I assume you're talking about AzureML Studio. It's a pretty neat UI-centric tool for building machine learning workflows! It's great if you're starting out with ML, but offers little in terms of customizability. For example, it only supports R and Python, has no GPU support, no CLI, no container support for managing reproducible environments, etc. I think these are deal breakers for doing deep learning :)
FWIW, I worked as a data scientist in Bing for 6 years and haven't seen/heard any other data scientist use it internally. We ended up building our own GPU clusters and going through the regular drill.
Curious how Microsoft's approach compares to what you have heard about Google's approach to ML as a service. My impression is that it looks kinda/sorta like its internal approach/infrastructure...if you squint.
Which also makes me curious about FloydHub's infrastructure. Any gory details?
Floyd's infrastructure runs entirely on Docker. That makes it backend-agnostic (our cloud offering currently runs on AWS). FloydHub uses nvidia-docker for the deep learning jobs that require GPUs. We also version the entire pipeline (code, data, params and environment) for exact reproducibility.
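One way to picture the versioning idea: fingerprint all four components of a run so that any change to code, data, hyperparameters, or the environment spec yields a different ID. This is a minimal sketch in Python; the function name and structure are my own illustration, not FloydHub's actual implementation:

```python
import hashlib
import json

def pipeline_fingerprint(code: bytes, data: bytes,
                         params: dict, environment: str) -> str:
    """Content-addressed ID over (code, data, params, environment).

    Two runs with the same fingerprint had byte-identical inputs,
    which is the precondition for exact reproducibility.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(code).digest())
    h.update(hashlib.sha256(data).digest())
    # Canonical JSON so dict key ordering doesn't change the hash.
    h.update(json.dumps(params, sort_keys=True).encode())
    # e.g. a pinned Docker image tag standing in for the environment.
    h.update(environment.encode())
    return h.hexdigest()

fp1 = pipeline_fingerprint(b"train.py v1", b"dataset v1",
                           {"lr": 0.01, "epochs": 10}, "nvidia/cuda:8.0")
fp2 = pipeline_fingerprint(b"train.py v1", b"dataset v1",
                           {"epochs": 10, "lr": 0.01}, "nvidia/cuda:8.0")
assert fp1 == fp2  # param ordering doesn't matter
```

In practice you'd hash file trees and dataset manifests rather than raw bytes, but the principle is the same: the fingerprint changes iff any input changes.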
GPU instances are really expensive, and one of our biggest challenges at the moment is reducing that cost, e.g. via Spot Instances and Spot Blocks. There are still some problems to be solved there.
We also want Floyd to be an end-to-end solution for building, training and deploying deep learning models. In that vein, we are investing in adding support for TensorFlow Serving, but it has been a rough ride so far. Getting a generic solution that can host any TensorFlow model has not been straightforward.
The ML-as-a-service offerings from many companies (Microsoft Cognitive Services, Google Cloud Prediction, IBM Watson, etc.) are fairly similar. They're great out of the box for some domains, say English speech recognition. For others (text/image), they're fairly easy to get started with (you don't need much training data, there's no infrastructure to manage, etc.). However, they are mostly black boxes and set a fairly low bar in terms of quality. Anyone doing serious AI will hit the limits of what they offer fairly quickly.
The DL community is awesome in its openness and contributions. Our goal with FloydHub, in contrast to the ML APIs, is to provide the tools for data scientists to effectively leverage this. We want to solve the engineering hurdles that come in the way of doing some cool science.
I like how easy it is to get up and running with FloydHub. What internal tools did you have to solve this at Microsoft? Were they any good? I heard Facebook has its own FBLearner Flow internally for managing ML workflows, and it's pretty neat.
I’ve heard FBLearner Flow is pretty cool for running/managing/sharing ML pipelines inside Facebook. Never seen or used it myself, but Microsoft had a similar internal tool called AEther that was very cool too. We’ve definitely taken inspiration from AEther in building Floyd.
Here’s an anecdotal story about how awesome AEther was (been a long time, so a little fuzzy on details): In 2011, Harry Shum was the VP of the Bing division at Microsoft. It was the early days of Bing (~10% market share, ~$2bn annual loss, etc.) - we had good talent, but were lagging behind Google in tech. In one of our all-hands meetings, Harry jokingly announced that if we beat Google in our core relevance metric (called NDCG), he’d take the entire Bing team, approx. 300 people strong, for a fully paid trip to Las Vegas.
Sure enough, a year later, Bing did beat Google in our core relevance metric (http://www.insideris.com/microsoft-bing-beats-google-in-the-...) and all 300 of us went to Vegas for a weekend as promised. (Spoiler: Google did eventually beat Bing back later)
The success and rapid acceleration in relevance gains was attributed in large part to the introduction of a new tool called AEther (in addition to improved ML tech and top-talent hiring). AEther was an experimentation platform for building and running data workflows. It allowed data scientists to build complex workflows and experiment in a massively parallel fashion, while abstracting away all the engineering concerns. I used it daily and loved it. The AEther team claimed it increased the experimentation productivity of researchers and engineers by almost 100x. Even now, when I ask ex-Bing data scientists working at other companies what they miss most from their time at Microsoft, AEther is almost always in the top 3 answers.
Having seen from the inside how awesome AEther was, one of our goals is to bring its benefits to the rest of the world as well. However, having talked to a few individual data scientists and researchers over the last month, their preference seems to be CLI over GUI (bigger companies like the GUI much better). Maybe it's one of those things you have to get used to, or maybe our implementation is clunky. So we're making the GUI an enterprise-only feature for now, while we continue to serve individual data scientists through our CLI.