This article echoes my experience as well. I was working on some core NLP models for a larger tech company and wanted to experiment with Keras. I had my models designed within a day and training done within another and had amazing model perf.
I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't. In fact I would argue that most workloads around the world are not Google scale, and neither are most Google workloads.
This attitude of "real deep learning engineers use Tensorflow" is an unhelpful way of saying "I agree that the API is unreadable but I've invested so much time in the ecosystem that I'll refuse to see its usability problems". Kind of reminds me of assembly programmers that thought C wasn't for l33t 10xx pwner programmers.
> Kind of reminds me of assembly programmers that thought C wasn't for l33t 10xx pwner programmers.
The problem with TensorFlow is mainly that you, as a user, have to build a data-dependency graph. This is something a C compiler can do very well, but Python is not so suitable for that.
So, in my view, TensorFlow chose the wrong substrate for their "more efficient" library. Instead, they should have developed their own language, where the whole data-flow graph determination could be implicit, and not a concern for the programmer.
However, computing a data-flow graph as-you-go (by the library, not the user), as (I think) is done in some libraries, is quite a good approach, since the overhead is quite small (percentage-wise) compared to the large tensor operations that can be performed in highly optimized code.
> So, in my view, TensorFlow chose the wrong substrate for their "more efficient" library. Instead, they should have developed their own language, where the whole data-flow graph determination could be implicit, and not a concern for the programmer.
Well, apparently they agree :) They say: "We believe that machine learning tools are so important that they deserve a first-class language and a compiler."
However, I'd like to see some numbers on how more efficient it is to build a graph in advance, given that the lion's share of the computations will be in tensor math anyway (which can be heavily optimized, and is independent of the graph).
The problem is that if you build the graph as-you-go, dataflow graph optimizations cannot be done efficiently (some high level optimizations such as data layout optimization, automatic data / model parallelism etc.). Swift can do all these because the compiler can extract the graph out ahead of time.
Or, rather, it is hard, but the difficulty comes from getting an intuition for which part of this weird multi-layer net is producing this weird behavior: is it an artefact or something interesting, is the connectivity complete, and should I change the learning rate and activation functions?
The real reason to use Tensorflow is the same reason you might use a Go framework instead of Rails: in your heart you have this hope that this thing will one day grow into a really large project and support lots of people and that will be easier with this scalable, optimized code.
It's not even that you'll hit Google scale, it's that you'll hit popular scale and still serve the whole thing out of your Digital Ocean droplet.
"Its not even that you'll hit Google scale, its that you'll hit popular scale and still serve the whole thing out of your Digital Ocean droplet."
Are you saying that model inference is slower or less efficient for a model built and trained in Keras, than the same model architecture built directly in tensorflow?
Actually, with Tensorflow as a Keras backend, I would expect them to be the same. I am not sure where the performance difference between TF and TF as a backend comes from.
I do think that pure TF would be easier to scale up over multiple servers etc., but that's only because I don't know how it would work in Keras. Maybe it's easy.
I would think the difference would be from the data input pipeline, efficiency in batching, updating online models. The inference itself would be the exact same.
> I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't.
Use the right tool for the job. Keras can get you to a working model faster. However, I am not sure what the current situation is, but in the past it was not possible to dump and freeze Keras' Tensorflow graphs. This can be a problem if you want to embed a model in a non-Python application.
> This attitude of "real deep learning engineers use Tensorflow"
Real engineers use whatever they need to use. But I think that you are overstating the difficulty of Tensorflow. Over the last 6 months, we have hired a couple of students for a research project. Since we standardized on Tensorflow, they had to implement new models in Tensorflow. All of them were up to speed in Tensorflow pretty quickly (they mostly do RNNs and seq2seq learning).
You can get a direct reference to the graph if you want, which will let you do anything Tensorflow lets you do. I think this is what you want:
    import tensorflow as tf
    from keras import backend as K

    # This assumes your model is ready to be called with .predict()
    sess = K.get_session()
    graph = sess.graph
    graph_def = graph.as_graph_def()
    # nodes_to_output: list of output node names, e.g. [model.output.op.name]
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess, graph_def, nodes_to_output)
    encoded_frozen_graph = frozen_graph.SerializeToString()
Even better... use Keras' MxNet backend. Training is ~30% faster, you get multi-GPU for free, and you can perform inference in MxNet easily.
Not to mention you can more easily use channels-first data, quantize to FP16/INT8 more easily, and export to ONNX for use w/ Tensor-RT and/or Intel Nervana.
> in the past it was not possible to dump and freeze Keras' Tensorflow graphs.
This was never true.
There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it. The inference API would be the TF API (i.e. you'd need to find the names of all your input and output tensors and use those with Session.run).
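For what it's worth, the shape of that route is roughly this (untested sketch; it assumes a single-input, single-output model on the TF backend, and x_batch is whatever NumPy batch you already have):

    import tensorflow as tf
    from keras import backend as K

    # `model` is a compiled Keras model built on the TF backend
    input_name = model.input.name    # e.g. "input_1:0"
    output_name = model.output.name  # e.g. "dense_2/Softmax:0"

    sess = K.get_session()
    tf.train.Saver().save(sess, "/tmp/keras_model.ckpt")  # save with the plain TF API

    # Inference through the TF API, no Keras involved at this point
    preds = sess.run(output_name, feed_dict={input_name: x_batch})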
Except that this was true. I do not remember the exact details, because this was the end of 2015 or beginning of 2016. But dumping/freezing definitely failed on some graphs constructed with Keras.
> There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it.
That was easy to figure out. Read the backend implementation and you can see how you can get the graph definition, etc.
Try to run multiple models/ensemble training on many computers with many GPUs to pick the best-performing model or combo. TensorFlow so far has probably the easiest approach for it. That might be the reason for the attitude "real deep learning engineers use Tensorflow": other approaches either don't scale that well, or you can't even model something you need for your bleeding-edge billion-$-making approach, despite other frameworks being much, much simpler/more natural and a joy to use.
Most of the tools that TensorFlow offers for multi-gpu and distributed model training will "just work" directly with Keras models too, or with really minor tweaks. You can even easily mix and match pure TensorFlow code (like explicitly setting the device with a device placement context manager) with Keras code.
See e.g. [0] and [1] linked below.
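The device placement mixing mentioned above looks roughly like this (untested sketch; the layer sizes and devices are made up):

    import tensorflow as tf
    from keras.layers import Input, Dense
    from keras.models import Model

    inputs = Input(shape=(128,))
    with tf.device('/gpu:0'):   # plain TF device placement...
        hidden = Dense(256, activation='relu')(inputs)
    with tf.device('/gpu:1'):   # ...wrapped around ordinary Keras layer calls
        outputs = Dense(10, activation='softmax')(hidden)

    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy')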
For model ensembling, it's even easier. After training, in Keras you could simply load your multiple models and create a new Model() object that does nothing but use a merge layer (with mode set to averaging) to average across multiple input models, even if the models share layers or have other crazy constraints. Writing that final ensemble is extremely easy in Keras.
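Something along these lines (untested sketch using the Keras 2 average merge; the older API used a Merge layer with mode='ave'; the file names and input shape are placeholders):

    from keras.layers import Input, average
    from keras.models import Model, load_model

    model_a = load_model('model_a.h5')
    model_b = load_model('model_b.h5')

    inputs = Input(shape=(224, 224, 3))                      # assumed shared input shape
    outputs = average([model_a(inputs), model_b(inputs)])    # averaging merge
    ensemble = Model(inputs, outputs)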
In my experience researching and productionizing very deep Keras models for an image processing use case that has moderately tight performance constraints, Keras has proved to scale extremely well and the code remains dead simple the whole time.
Thanks for the links! Do you know how to convert a generator from Keras to an input in an estimator, add class weights and custom loss functions, and plug in various Keras-based callbacks as well? I couldn't find any guide for that part.
What do you use to orchestrate distributed training in Keras?
> "how to convert a generator from Keras to an input in estimator"
This is a bit of a mistaken question, because you would not "convert" a DataGenerator into an estimator input. Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples. Input functions for Estimators are just functions that accept no arguments and produce a 2-tuple, with the first component a dictionary of named inputs and the second component the target value. You can write your own wrapper function that consumes from a DataGenerator and normalizes the output to that format. I'm sure there will be a helper function to do this automatically in the future, but it's about as easy as can be to just wrap with a function anyway.
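A rough, untested sketch of what I mean (the feature name 'x' and the dtypes are placeholders):

    import tensorflow as tf

    def make_input_fn(keras_generator):
        # keras_generator yields (x_batch, y_batch) NumPy pairs
        def input_fn():
            dataset = tf.data.Dataset.from_generator(
                lambda: (({'x': x}, y) for x, y in keras_generator),
                output_types=({'x': tf.float32}, tf.float32))
            return dataset   # Estimators accept a Dataset from input_fn
        return input_fn

    # estimator.train(input_fn=make_input_fn(train_gen), steps=1000)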
> "add class weights, custom loss functions"
This too seems mistaken, because this is part of the compiled Keras model, before ever converting anything to a TensorFlow Estimator. You can use whatever you want for this: the Keras Model.compile function accepts dictionaries for loss and loss_weights, as well as custom add_loss usage in your own layers (even pass-through layers that don't affect the computation graph).
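E.g. something like this (sketch; the output names 'main'/'aux' and the toy loss are made up):

    import keras.backend as K

    def my_loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true), axis=-1)

    # `model` is a two-output Keras model with outputs named 'main' and 'aux'
    model.compile(optimizer='adam',
                  loss={'main': my_loss, 'aux': 'binary_crossentropy'},
                  loss_weights={'main': 1.0, 'aux': 0.2})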
> "plug-in various Keras-based callbacks as well"
This is admittedly slightly harder, but I think it's a little bit of an unfair question because Keras offers far more functionality in its Callbacks than TensorFlow offers with predefined hooks. "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
Either way, this is also not too hard. For any Callback you want to use from Keras, you basically just write a tiny wrapper class that subclasses from session_run_hook.SessionRunHook from tensorflow, and then maps the TensorFlow naming conventions, like "begin" or "before_run" etc., to wrap the equivalent method from the Keras callback, like "on_train_begin", or "on_epoch_end".
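The wrapper is basically boilerplate, something like this (untested sketch; only a couple of the lifecycle methods are mapped):

    import tensorflow as tf

    class KerasCallbackHook(tf.train.SessionRunHook):
        # Adapts an existing Keras callback to the Estimator hook interface
        def __init__(self, keras_callback):
            self.callback = keras_callback

        def begin(self):
            self.callback.on_train_begin(logs={})

        def end(self, session):
            self.callback.on_train_end(logs={})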
The bigger point is that this headache is because of TensorFlow. Both because TF chose a really silly class design for the SessionRunHooks thing, making automatic conversion from Keras (which has the more established set of pre-existing callbacks) harder for no good reason, and also because TensorFlow lacks functionality that Keras gives you for free.
For orchestration, my team just uses a simple GPU cluster where the native device placement primitives with TensorFlow allow us to scale to as many GPUs as we've needed (max in the dozens).
For distributing and orchestrating over larger clusters, Keras provides some good alternatives right on its own FAQ page:
In the end, I would not claim you can immediately translate every complex feature of Keras, like deep custom callbacks or something, over to TensorFlow ... but that's usually not a big deal. Most times, you just want to port a fairly standard model to the Estimator API, and for this, it "just works" directly and is easy to use for local, small-ish clusters of GPUs.
When you have a much rarer problem that needs a huge GPU cluster, then use the other suggestions like dist-keras or Horovod, or write your own simple map-reduce-ish wrapper to put data on different nodes and deploy e.g. a containerized training application.
Also people need to definitely keep in mind that most of the limitations are TensorFlow's own fault for not designing things to be compatible with heavily used Keras features like Callbacks out of the box. TensorFlow has a history of doing this, and has been very developer-unfriendly in this way even when it has no downside or impact on performance or anything. The core TensorFlow designs suffer from an unfortunate "not invented here" kind of philosophy, even when dealing with Keras.
> Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples.
You probably know that those simple generators aren't recommended for use with Keras; instead keras.utils.Sequence is preferred due to (Keras docs): "Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators."
I couldn't see any equivalent of this for estimators, sadly, and wrapping it up in a naive generator seemed like a functionality downgrade.
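For reference, a Sequence is just this kind of object (minimal untested sketch):

    import numpy as np
    from keras.utils import Sequence

    class ArraySequence(Sequence):
        def __init__(self, x, y, batch_size=32):
            self.x, self.y, self.batch_size = x, y, batch_size

        def __len__(self):
            # number of batches per epoch
            return int(np.ceil(len(self.x) / float(self.batch_size)))

        def __getitem__(self, idx):
            sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
            return self.x[sl], self.y[sl]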
> because this is part of the compiled Keras model, before ever converting anything to TensorFlow Estimator
Right, you specify a loss function before compiling; however, if it is a custom one and you for some reason need to reload a model snapshot (e.g. resuming training), you need to provide it separately or the loading fails. I haven't found any docs on this. Imagine your training optimizer automatically generating loss functions by means of function composition, e.g. you put a mix of +-*/, log, exp, tanh etc. based on some past training experience of what helped in individual cases/literature, then take thousands of these loss functions and push them to a large cluster where they are scored on how well they performed, keeping only the best-performing ones.
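The workaround I mean is roughly this (sketch; my_loss stands for whatever custom function the model was compiled with):

    from keras.models import load_model

    # custom objects have to be passed back in explicitly when resuming training
    model = load_model('snapshot.h5', custom_objects={'my_loss': my_loss})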
Class weights are specified in fit_generator(), not at compile time; again, here I couldn't find any description of how to convert Keras' weight dictionary to what TensorFlow needs.
> Callbacks... "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
The thing here is that some of those callbacks are mandatory for training to converge, e.g. decreasing the learning rate, escaping plateau situations, computing various stats that aren't provided by Keras (outside loss/accuracy; you might want F1, Fleiss/Cohen's kappa, Matthews correlation coefficient, AUC ROC etc.) that might be decisive for keeping/discarding a model; then also multi-GPU callbacks; some people even use callbacks to perform the whole distributed computation as well. In my examples, if I remove any of those callbacks, my models won't achieve any kind of usability, but with those callbacks I match world-class results. I couldn't find any non-insane way to map them to TensorFlow prior to our conversation.
As I mentioned, I have a very large cluster, each node with multiple GPUs, so I need an orchestration on both hyperparameters/loss functions per node as well as within each node to run on multiple GPUs.
The page from Keras you mentioned was precisely my starting point, and from those options tf.Estimator seemed the least devops-intense way to go (Horovod needs MPI and CERNDB/Keras needs Spark).
I'll take a deeper look into SessionRunHook you mentioned - thanks! ;-)
For class weights, the easiest thing is to just generate that as another one of the items placed into the input function dictionary, e.g. when you wrap the DataGenerator. Then have a custom loss function that takes this input element and applies the weight for that training sample. Again, the need to do slight extra work is a limitation of TensorFlow here, not of Keras, but because Keras is so flexible, it's super easy to work around.
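Concretely, something along these lines (untested sketch; the shapes and names are made up):

    import keras.backend as K
    from keras.layers import Input, Dense
    from keras.models import Model

    features = Input(shape=(100,), name='features')
    sample_weight = Input(shape=(1,), name='sample_weight')  # fed as an extra input
    preds = Dense(1, activation='sigmoid')(features)

    def weighted_bce(y_true, y_pred):
        # the loss closes over the extra weight tensor and applies it per sample
        return K.mean(sample_weight * K.binary_crossentropy(y_true, y_pred))

    model = Model([features, sample_weight], preds)
    model.compile(optimizer='adam', loss=weighted_bce)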
> "computing various stats that aren't provided by Keras.."
It seems like you have this backward. Keras provides the easy interface to create the custom callbacks. That's why you can create extra convergence metrics, etc., that are far harder to use if implementing in pure TensorFlow. The part where TensorFlow is specifically lacking functionality is in its ability to handle these callbacks (both pre-built in Keras or user-defined). I've had good success with the solution I mentioned with SessionRunHooks, but still, it is a terrible design choice by the TensorFlow people to create this in a way that is not directly compatible with all the work Keras had done.
> "from those tf.Estimator seemed the last devops-intense way to go (Horovod needs MPI and CERNDB/Keras Spark)."
Just based on how poorly designed the tf.Estimator API is though, I'm not actually sure the other methods would require less devops or less investment. In some cases for standard models, yes. But if you've already committed to using Keras for very customized situations, then going back to the dark ages with native TensorFlow will often be much more work and more error prone than using the other solutions. The Horovod dependence on MPI in particular is fairly simple and needs little management. Most people having done ML / stats PhDs will already have managed far more difficult situations with MPI previously anyway, or at least have the Linux skills needed. The point is you have a fighting chance, whereas deciphering undocumented and badly designed corners of TensorFlow often leaves you with no fighting chance.
In my opinion: getting started with TensorFlow and having a model designed within a day and training within another is also possible.
This mostly depends on your model and your data, and (imho) not on the framework of choice.
For all of Keras/PyTorch/Tensorflow, you'll need to learn the API - but if you have any ML background, that should be straightforward.
Yes, for all practical problems data is the biggest challenge.
Though, debugging matters. In TF it is easy to get errors and spend a lot of time searching for them. In PyTorch it is straightforward. It matters the most when the network, or cost function, is not standard (think: YOLO architecture).
E.g. when I wanted to write some differentiable decision tree, it took me way longer in TF (which I already knew) than with PyTorch, having its tutorial open on another pane.
TensorFlow needs some Deep Learning-based assistant to identify common causes of errors you might see at the AST level. Cryptic errors are its weakness, and an AI trained to spot correlations between the Python AST and the error might be very helpful.
Tensorflow also does not work on my circa-2010 CPU without building it yourself. So I ended up trying PyTorch and I am glad I did. Liking it better than the esoteric errors.
I used Keras a few years ago, and really liked it and still recommend it to people, even contributing some code back to it, but I don't think it obviated the need to know TF, since eventually you want to do something that's not in the Keras toolbox, and then you need to understand what it's doing under the hood.
>> I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't. In fact I would argue that most workloads around the world are not Google scale, and neither are most Google workloads.
You can convert the Keras model to TF pretty easily if you need to, as long as you use the TF backend. I did this, and converted the string preprocessing in TF so the model could be used in TF serving taking only the string as input.
Yes, they're pretty ugly TBH. All they've done is provide some decent "canned" estimators but for anything custom you're still using the base tensorflow API. Not to mention feeding in something like numpy arrays > 2GB is a huge pain (their Dataset API doesn't fully work).
Author here - the article compares Keras and PyTorch as the first Deep Learning framework to learn. It explores the differences between the two in terms of ease of use, flexibility, debugging experience, popularity, and performance, among others.
If you have experience with learning, or teaching Deep Learning with PyTorch or Keras, we’d love to hear your thoughts about them.
My adviser decided (wisely) that we all needed to learn NN, and we settled on Tensorflow. That went... poorly. I've told this before: the Seq2Seq tutorial was designed for an older version of TF, and it triggered a bug that was not fixed because that way to do Seq2Seq was deprecated and a new tutorial was coming "soon". The "tutorial" was also just a code dump with barely any comments.
Eventually we had new people coming in with even less theoretic background than ours (we had read papers for at least 6 months), and that's when we realised it would not work at all. So we organised a 1-week hackathon with Pytorch, and we've been using it ever since.
Similar story here. I got bitten by that very seq2seq "tutorial", lost a lot of time with it, and haven't used TensorFlow ever since except for reproducing other people's experiments. It's Keras, Torch, DyNet or PyTorch for me.
I agree, I also use Keras for stable complex models (up to 1000 layers) in production and PyTorch for fun (DRL). However, if I want to run a distributed training optimization with minimum setup, whether I like it or not, the simplest way is to use TensorFlow's Estimator model and some pre-baked environment like SageMaker. Horovod or CERNDB/Keras require a bit more setup/devops work. The issue with estimators is that once you start using some bleeding-edge things in Keras, it might be very complicated to translate them back to estimators, despite conversion from Keras model to tf.Estimator being trivial.
Complex computer vision classification tasks based on DenseNet/ResNet approaches; those often could be reduced in depth by some Wide ResNet technique. Keras is super easy there, and you get world-class performance after 1 hour of coding and a week of training, when you know what you are doing.
I mentioned in another comment [0], but it is also useful here: most of TensorFlow's tools for distributed model training or multi-GPU training will work out of the box directly on Keras, and distributed training is not at all a reason to use TensorFlow directly over Keras. At worst, you have to add a tiny bit of TensorFlow code on top of the majority being in Keras, but you would still never need to write a significant amount directly in TensorFlow.
I also work on production systems built around deep ResNet architecture for computer vision tasks, and my team does this using solely Keras, including when we do distributed training.
Just adding this thought in case anyone mistakenly thinks you have to start out all-in using only TensorFlow because you might expect to need distributed training at some point.
I appreciate how you focus on just two, which are the state of the art in your opinion.
I find it less useful to see comparisons of "top 50 deep learning frameworks for 2018" which include esoteric stuff that is only there for sake of completeness.
This way a person branching out from Tensorflow (I assume it's Tensorflow) knows which two frameworks to try out, and what to look for.
That's what I did for my bachelor's thesis. I didn't take any advanced Maths classes, so I had to learn everything from scratch. Keras helped me to build an intuition for neural networks and made me more interested in learning about the formulas and how it works with TensorFlow in the background. I really enjoyed learning with this top-down approach.
Love Keras. If you like, you can also use the Keras API inside TensorFlow (as tf.keras). We recently published this guide w/ more info - https://www.tensorflow.org/versions/r1.9/programmers_guide/k... - and are working on a few more examples for the v1.9 release in a couple weeks.
Nice article, and I agree with the explanations of what makes Keras and TensorFlow best for specific use cases. Some history: I have used TensorFlow for years, and switched to coding against the Keras APIs about 8 months ago. I wish I had more experience with PyTorch, but I just don't have the time right now to do more than play with it.
One suggestion to the authors: the benchmark figures are interesting, but I wish you had shown CPU only results also. At work, I have all the GPU resources I need but for my home projects, which are all NLP deep learning experiments, I usually rent a many core large memory server with no GPUs (GPUs seem to speed up RNNs less than other model types).
For most applications you can probably use a TCN (temporal convolutional network) instead of an LSTM. TCNs are implemented in all major frameworks and work an order of magnitude faster because they are parallel.
No, TCN is similar to WaveNet (dilated convolutions + masking the future + residual connections). It's a plain convnet, not an LSTM with a twist. That's why it runs efficiently in parallel on GPUs, like image processing convnets.
Actually, yes, the QRNN has all of those features.
First figure from our paper: how the LSTM with a twist allows for speed equivalent to a plain convnet by running efficiently in parallel on GPUs, like image processing convnets. [1]
Best of all, as it's only an "LSTM with (these) twists", it's drop-in compatible with existing LSTMs but can get you a 2-17 times speed-up over NVIDIA's cuDNN LSTM - essentially speed equivalent to the TCN or WaveNet speed-up.
That's why Baidu implemented QRNN in their production Deep Voice 2 neural text-to-speech (TTS) system[3].
This isn't to say TCN or QRNN is better, simply that it's dangerous to flat out say _no_ if you're not actually certain or don't correctly recall the underlying information.
Disclaimer: I'm the co-author of the QRNN.
Double disclaimer: The TCN paper cites the QRNN but decides not to test against it. They also show results over one of my datasets.
I just like Tensorflow better. For building new models, the graph is complex and errors are unavoidable. There is a separate compile time for Tensorflow, and errors will be found before the data come in. Tried PyTorch before; the error messages are usually not helpful at all and often lead to clueless debugging for hours.
For trying out deep learning, or building on existing models, PyTorch or Keras may be easier to grasp. But when making new models that involve a lot of math, Theano/Tensorflow is more helpful IMO.
It is an interesting perspective, but my experience is exactly the opposite. In Theano debugging was awful. TF felt like a breeze until it didn't. When I jumped on PyTorch, TF started feeling confusing by comparison. Errors point exactly to the defective lines, and there is the possibility to print everywhere (or use any other kind of feedback / logging of intermediate results).
For using models it may not matter that much (though, again, read YOLO in TF and PyTorch and then decide which is cleaner :)).
For new models which go beyond a standard ConvNet/LSTM... well, PyTorch is heaven, Theano sounds like torture.
I am not sure what your programming style in PyTorch is. As people recommend, and as in most tutorials, I see the sequential approach, where a small mistake in data preprocessing leads to clueless errors in a completely irrelevant line.
YOLO is a quite standard feed-forward model in my opinion. I mean the math part, which I am more concerned with.
I have never used Theano before; my understanding is that Tensorflow followed its static graph approach.
If a model involves a lot of math, is it more helpful to be able to debug it? What tf lacks is an intuitive debugging tool. I think this is where pytorch excels.
You don't often need to debug, especially if the model can be checked at compile time. Think static types for programming. For Tensorflow all the data types and tensor dimensions are checked before loading any data; if the math is derived correctly then it is not necessary to even debug.
If data and model are mixed, you often resort to line-by-line debugging to zero in on the real problem, which often takes more time.
I don't do much ML (this kind of ML at least), so I know I'm not the target audience for these libraries. But I wish they were written in some language with static typing, for IDE help. The API interface/tweaks-to-be-done for some of them is enormous, and mostly undiscoverable.
I mean, just looking at the "getting started, 30 seconds to Keras"[0], there are so many magic strings and options. Of course, if one is well-versed in this domain, they make sense. But it's hard to grasp, and Keras is supposed to be the high-level one.
I've been recently following the course over at fast.ai and I'm having the same issue. The API is, as you mentioned, undiscoverable, and I don't really want to look at the source code, which is kind of unreadable anyway ([0]). The course itself is littered with poorly readable code ([1]) such as:
to_np(m.ib(V(topMovieIdx)))
Why, just why.
Despite all this, I wholeheartedly recommend this course, it demystified DL for me.
I kind of agree. I find the way the fastai library (which builds on PyTorch) is written does not match my personal preferences very well. That being said I still really enjoy working with it and the courses do guide one along quite well. I'd actually recommend it for anyone who just wants to get started and play around. It's fairly easy to get good results on a gaming box (say 1080 GTX) reasonably quickly. I also like the Jupyter Notebook approach that they use for the lectures. It encourages experimenting around and they also encourage you to dive into the library and read source code which is good. Alas I find the terse style complicates it a bit but that may very well be personal preference. It's also good to know that they try to implement interesting papers quickly.
I think overall if your goal is to use a preexisting architecture to get quick results fastai is a great point to start. If you want to build your own architecture, reach one level of abstraction lower.
Edit: I liked this PyTorch youtube series quite a bit: https://www.youtube.com/watch?list=PLlMkM4tgfjnJ3I-dbhO9JTw7...
The terseness is intentional[0], following the idea that "brevity facilitates reasoning". I don't 100% agree with it, but at least there's reasoning behind it instead of just laziness.
I fully agree with you. While I can't speak for Keras, after reading [1] it seems Tensorflow would very much benefit from strong types. You'd easily be able to catch most of the errors presented in the article with higher-kinded and linear types, and maybe you don't even need that much power.
I totally agree, the lack of documentation via types and lack of smart autocomplete (which I rely on very heavily for API discovery) is not only why I never got into TensorFlow, it's also why I never got into languages like Python. I even went so far as to use TypeScript instead of JavaScript.
I do believe C# has some machine learning libraries, but AFAIK they aren't anywhere near the level of TensorFlow or Keras.
Scala and Java both have access to really great ML libraries, MLLib in particular is supposed to be really good. I've used it a little but I'm not really a ML expert to judge.
I don't believe I would ever discourage anyone from using any particular framework. The skills learnt from one are highly transferable, so it doesn't matter too much which framework you start with.
Also, with eager execution, Tensorflow has become much more accessible to new users.
Having said that, the world would likely be a better place if everyone just used PyTorch :)
I don't know how many people external to Google know about tf.estimator, but it's where most people who aren't building complicated custom architectures should be starting. Keras is nice, it's easy to use, but I wouldn't use it to design, build and run a massive production pipeline. tf.estimator is just that.
BTW (from the author of the blog post and of this library): for super-simple live training plots in Jupyter Notebook for Keras (and PyTorch): https://github.com/stared/livelossplot
For more advanced training for business or Kaggle competitions (version controlling of code and results, advanced charts): https://neptune.ml/
Having used both plain TensorFlow and Keras for some very large image processing production services, Keras wins easily, and interoperates with sprinkling in low-level TensorFlow very well.
Even defining a custom deep CNN for multiple image prediction tasks (so, deep and custom architecture), Keras holds up well — and creating your own layers in Keras is very easy.
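E.g. a tiny custom layer is only about this much code (sketch against the Keras 2 API; just a learned elementwise scale):

    import keras.backend as K
    from keras.layers import Layer

    class Scale(Layer):
        # learns one multiplicative factor per feature
        def build(self, input_shape):
            self.alpha = self.add_weight(name='alpha',
                                         shape=(input_shape[-1],),
                                         initializer='ones',
                                         trainable=True)
            super(Scale, self).build(input_shape)

        def call(self, inputs):
            return inputs * self.alpha

        def compute_output_shape(self, input_shape):
            return input_shape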
Having used Torch (the Lua library) before, the comparison between the Sequential models seems very absurd. Even the PyTorch documentation gives an almost equivalent model definition method:
    # Example of using Sequential
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 20, 5),
        nn.ReLU(),
        nn.Conv2d(20, 64, 5),
        nn.ReLU()
    )
These sequential models are like Fibonacci function comparisons between programming languages.
They are simple and basic; whether it's 5 lines of code or 20 makes no difference. You spend very little time actually coding these layers. Understanding the model and the default parameters used underneath is more important.
It would be nice to see some examples with skip-layers, weight sharing etc. Do you have to drop the sequential model to do them or not?
(Another author here).
It is explicitly explained in the text. :)
tl;dr: not nearly as popular (which means: fewer tutorials, less documentation, fewer examples, less integration with other systems, less community support for development or discussions)
Sure, all frameworks do have some goal, and once one is confident in DL, any of them may be a good choice. As you see from the plots there, MXNet is very fast for some applications.
See the charts. Likely, but still, a 30% speed boost is not a factor for someone learning DL (there, debugging or training the wrong models can easily give an overhead of 5-20x).
Interesting article. I've been doing some production work with ML and I was wondering which tool would work better in my specific environment.
Currently I've been training a CNN model in Keras with good success, and using custom scripts to port it to a TensorFlow model. The .h5 file from Keras helps a lot with this step.
Next step is compiling a shared Tensorflow library so I can deploy the trained model in C++ (project requirement) and this has been a pain in the ass, regardless of framework...
PyTorch has the best API for understanding deep learning, and the PyTorch-based Pyro is also very good for probabilistic programming (a fresh take coming from Stan/PyMC3)
Speed is one thing, but the key value proposition of Keras for me, which rarely comes up in these comparisons, is Keras's native utility functions, including easy and correct text tokenization/padding, easy OHE of categorical variables without using sklearn, and easy model saving/loading from an .hdf5 file. (Although I am not an expert on PyTorch and not as familiar with the ETL pipeline for that)
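For anyone who hasn't seen them, I mean things like this (sketch; texts, labels and the model are assumed to already exist):

    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.utils import to_categorical
    from keras.models import load_model

    tok = Tokenizer(num_words=10000)
    tok.fit_on_texts(texts)                                   # texts: list of strings
    x = pad_sequences(tok.texts_to_sequences(texts), maxlen=200)
    y = to_categorical(labels)                                # labels: integer class ids

    model.save('model.hdf5')   # architecture + weights + optimizer state in one file
    model = load_model('model.hdf5')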
I agree strongly with the ease of model saving / reloading in Keras. I found this basic functionality to be exasperatingly difficult and cumbersome in TensorFlow.
For NNs, in my experience, out-of-memory and preprocessing issues tend to cause as many problems as the NN optimization itself, which TFRecords and streaming seem to solve. Are there similar data loading facilities in PyTorch? Though I have not specified models in Keras; since it is now part of TF, I presume the formats are compatible.
I was able to build a deep learning OCR using a CNN from scratch using Keras, running in an app via iOS's Core ML, in 2 months without prior experience. The hard part was actually getting a great data set; 80% of the time was data massaging. Keras saved me some time, although the results wouldn't be world class.
Can we please please please not have the kind of framework overproliferation and fragmentation in the Deep ML world that they have in the front-end web world? It's hard enough to learn ML concepts without also having to learn a new ML framework every year.
I would say keras or mxnet for speed and production. PyTorch for research. By this point there are hardly any cases when it’s worth it to descend to lower TensorFlow levels.
Keras only offers standard layers, but you can implement your own LSTM layer and use it with Keras. This way you can take advantage of all the other features.
Learning Python is way faster than learning Deep Learning, so it shouldn't be an issue.
I am planning to organize a TensorFlow.js bootcamp, but here it is more difficult (as data preprocessing, and debugging in general, is way more difficult in JS than in Python).