This article echoes my experience as well. I was working on some core NLP models for a larger tech company and wanted to experiment with Keras. I had my models designed within a day and training done within another and had amazing model perf.
I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't. In fact I would argue that most workloads around the world are not Google scale, and neither are most Google workloads.
This attitude of "real deep learning engineers use Tensorflow" is an unhelpful way of saying "I agree that the API is unreadable but I've invested so much time in the ecosystem that I'll refuse to see its usability problems". Kind of reminds me of assembly programmers that thought C wasn't for l33t 10xx pwner programmers.
> Kind of reminds me of assembly programmers that thought C wasn't for l33t 10xx pwner programmers.
The problem with TensorFlow is mainly that you, as a user, have to build a data-dependency graph. This is something a C compiler can do very well, but Python is not so suitable for that.
So, in my view, TensorFlow chose the wrong substrate for their "more efficient" library. Instead, they should have developed their own language, where the whole data-flow graph determination could be implicit, and not a concern for the programmer.
However, computing a data-flow graph as-you-go (by the library, not the user), as (I think) is done in some libraries, is quite a good approach, since the overhead is quite small (percentage-wise) compared to the large tensor operations that can be performed in highly optimized code.
> So, in my view, TensorFlow chose the wrong substrate for their "more efficient" library. Instead, they should have developed their own language, where the whole data-flow graph determination could be implicit, and not a concern for the programmer.
Well, apparently they agree :) They say: "We believe that machine learning tools are so important that they deserve a first-class language and a compiler."
However, I'd like to see some numbers on how more efficient it is to build a graph in advance, given that the lion's share of the computations will be in tensor math anyway (which can be heavily optimized, and is independent of the graph).
The problem is that if you build the graph as-you-go, dataflow graph optimizations cannot be done efficiently (some high level optimizations such as data layout optimization, automatic data / model parallelism etc.). Swift can do all these because the compiler can extract the graph out ahead of time.
Or, rather, it is hard, but the difficulty comes from getting an intuition for which part of this weird multi-layer net is producing this weird behavior: is it an artefact or something interesting, is the connectivity complete, and should I change the learning rate and activation functions?
The real reason to use Tensorflow is the same reason you might use a Go framework instead of Rails: in your heart you have this hope that this thing will one day grow into a really large project and support lots of people and that will be easier with this scalable, optimized code.
It's not even that you'll hit Google scale, it's that you'll hit popular scale and still serve the whole thing out of your Digital Ocean droplet.
"Its not even that you'll hit Google scale, its that you'll hit popular scale and still serve the whole thing out of your Digital Ocean droplet."
Are you saying that model inference is slower or less efficient for a model built and trained in Keras, than the same model architecture built directly in tensorflow?
Actually, with Tensorflow as a Keras backend, I would expect them to be the same. I am not sure where the performance difference between TF and TF as a backend comes from.
I do think that pure TF would be easier to scale up over multiple servers etc., but that's only because I don't know how it would work in Keras. Maybe it's easy.
I would think the difference would be from the data input pipeline, efficiency in batching, updating online models. The inference itself would be the exact same.
> I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't.
Use the right tool for the job. Keras can get you to a working model faster. However, I am not sure what the current situation is, but in the past it was not possible to dump and freeze Keras' Tensorflow graphs. This can be a problem if you want to embed a model in a non-Python application.
> This attitude of "real deep learning engineers use Tensorflow"
Real engineers use whatever they need to use. But I think that you are overstating the difficulty of Tensorflow. Over the last 6 months, we have hired a couple of students for a research project. Since we standardized on Tensorflow, they had to implement new models in Tensorflow. All of them were up to speed in Tensorflow pretty quickly (they mostly do RNNs and seq2seq learning).
You can get a direct reference to the graph if you want, which will let you do anything Tensorflow lets you do. I think this is what you want:
    import tensorflow as tf
    from keras import backend as K

    # This assumes your model is ready to be called with .predict()
    sess = K.get_session()
    graph = sess.graph
    graph_def = graph.as_graph_def()
    # nodes_to_output: list of output node names, e.g. [model.output.op.name]
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess, graph_def, nodes_to_output)
    encoded_frozen_graph = frozen_graph.SerializeToString()
Even better... use Keras' MxNet backend. Training is ~30% faster, you get multi-GPU for free, and you can perform inference in MxNet easily.
Not to mention you can more easily use channels-first data, quantize to FP16/INT8 more easily, and export to ONNX for use w/ Tensor-RT and/or Intel Nervana.
> in the past it was not possible to dump and freeze Keras' Tensorflow graphs.
This was never true.
There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it. The inference API would be the TF API (i.e. you'd need to find the names of all your input and output tensors and use those with Session.run).
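For what it's worth, the shape of that route is roughly this (untested sketch; it assumes a single-input, single-output model on the TF backend, and x_batch is whatever NumPy batch you already have):

    import tensorflow as tf
    from keras import backend as K

    # `model` is a compiled Keras model built on the TF backend
    input_name = model.input.name    # e.g. "input_1:0"
    output_name = model.output.name  # e.g. "dense_2/Softmax:0"

    sess = K.get_session()
    tf.train.Saver().save(sess, "/tmp/keras_model.ckpt")  # save with the plain TF API

    # Inference through the TF API, no Keras involved at this point
    preds = sess.run(output_name, feed_dict={input_name: x_batch})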
Except that this was true. I do not remember the exact details, because this was the end of 2015 or beginning of 2016. But dumping/freezing definitely failed on some graphs constructed with Keras.
> There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it.
That was easy to figure out. Read the backend implementation and you can see how you can get the graph definition, etc.
Try to run multiple models/ensemble training on many computers with many GPUs to pick the best-performing model or combo. TensorFlow so far has probably the easiest approach for it. That might be the reason for the attitude "real deep learning engineers use Tensorflow": other approaches either don't scale that well, or you can't even model something you need for your bleeding-edge billion-$-making approach, despite other frameworks being much, much simpler/more natural and a joy to use.
Most of the tools that TensorFlow offers for multi-gpu and distributed model training will "just work" directly with Keras models too, or with really minor tweaks. You can even easily mix and match pure TensorFlow code (like explicitly setting the device with a device placement context manager) with Keras code.
See e.g. [0] and [1] linked below.
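The device placement mixing mentioned above looks roughly like this (untested sketch; the layer sizes and devices are made up):

    import tensorflow as tf
    from keras.layers import Input, Dense
    from keras.models import Model

    inputs = Input(shape=(128,))
    with tf.device('/gpu:0'):   # plain TF device placement...
        hidden = Dense(256, activation='relu')(inputs)
    with tf.device('/gpu:1'):   # ...wrapped around ordinary Keras layer calls
        outputs = Dense(10, activation='softmax')(hidden)

    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy')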
For model ensembling, it's even easier. After training, in Keras you could simply load your multiple models and create a new Model() object that does nothing but use a merge layer (with mode set to averaging) to average across multiple input models, even if the models share layers or have other crazy constraints. Writing that final ensemble is extremely easy in Keras.
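Something along these lines (untested sketch using the Keras 2 average merge; the older API used a Merge layer with mode='ave'; the file names and input shape are placeholders):

    from keras.layers import Input, average
    from keras.models import Model, load_model

    model_a = load_model('model_a.h5')
    model_b = load_model('model_b.h5')

    inputs = Input(shape=(224, 224, 3))                      # assumed shared input shape
    outputs = average([model_a(inputs), model_b(inputs)])    # averaging merge
    ensemble = Model(inputs, outputs)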
In my experience researching and productionizing very deep Keras models for an image processing use case that has moderately tight performance constraints, Keras has proved to scale extremely well and the code remains dead simple the whole time.
Thanks for the links! Do you know how to convert a generator from Keras to an input in an estimator, add class weights and custom loss functions, and plug in various Keras-based callbacks as well? I couldn't find any guide for that part.
What do you use to orchestrate distributed training in Keras?
> "how to convert a generator from Keras to an input in estimator"
This is a bit of a mistaken question, because you would not "convert" a DataGenerator into an estimator input. Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples. Input functions for Estimators are just functions that accept no arguments and produce a 2-tuple, with the first component a dictionary of named inputs and the second component the target value. You can write your own wrapper function that consumes from a DataGenerator and normalizes the output to that format. I'm sure there will be a helper function to do this automatically in the future, but it's about as easy as can be to just wrap with a function anyway.
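A rough, untested sketch of what I mean (the feature name 'x' and the dtypes are placeholders):

    import tensorflow as tf

    def make_input_fn(keras_generator):
        # keras_generator yields (x_batch, y_batch) NumPy pairs
        def input_fn():
            dataset = tf.data.Dataset.from_generator(
                lambda: (({'x': x}, y) for x, y in keras_generator),
                output_types=({'x': tf.float32}, tf.float32))
            return dataset   # Estimators accept a Dataset from input_fn
        return input_fn

    # estimator.train(input_fn=make_input_fn(train_gen), steps=1000)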
> "add class weights, custom loss functions"
This too seems mistaken, because this is part of the compiled Keras model, before ever converting anything to a TensorFlow Estimator. You can use whatever you want for this: the Keras Model.compile function accepts dictionaries for loss and loss_weights, as well as custom add_loss usage in your own layers (even pass-through layers that don't affect the computation graph).
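E.g. something like this (sketch; the output names 'main'/'aux' and the toy loss are made up):

    import keras.backend as K

    def my_loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true), axis=-1)

    # `model` is a two-output Keras model with outputs named 'main' and 'aux'
    model.compile(optimizer='adam',
                  loss={'main': my_loss, 'aux': 'binary_crossentropy'},
                  loss_weights={'main': 1.0, 'aux': 0.2})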
> "plug-in various Keras-based callbacks as well"
This is admittedly slightly harder, but I think it's a little bit of an unfair question because Keras offers far more functionality in its Callbacks than TensorFlow offers with predefined hooks. "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
Either way, this is also not too hard. For any Callback you want to use from Keras, you basically just write a tiny wrapper class that subclasses from session_run_hook.SessionRunHook from tensorflow, and then maps the TensorFlow naming conventions, like "begin" or "before_run" etc., to wrap the equivalent method from the Keras callback, like "on_train_begin", or "on_epoch_end".
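The wrapper is basically boilerplate, something like this (untested sketch; only a couple of the lifecycle methods are mapped):

    import tensorflow as tf

    class KerasCallbackHook(tf.train.SessionRunHook):
        # Adapts an existing Keras callback to the Estimator hook interface
        def __init__(self, keras_callback):
            self.callback = keras_callback

        def begin(self):
            self.callback.on_train_begin(logs={})

        def end(self, session):
            self.callback.on_train_end(logs={})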
The bigger point is that this headache is because of TensorFlow. Both because TF chose a really silly class design for the SessionRunHooks thing, making automatic conversion from Keras (which has the more established set of pre-existing callbacks) harder for no good reason, and also because TensorFlow lacks functionality that Keras gives you for free.
For orchestration, my team just uses a simple GPU cluster where the native device placement primitives with TensorFlow allow us to scale to as many GPUs as we've needed (max in the dozens).
For distributing and orchestrating over larger clusters, Keras provides some good alternatives right on its own FAQ page:
In the end, I would not claim you can immediately translate every complex feature of Keras, like deep custom callbacks or something, over to TensorFlow ... but that's usually not a big deal. Most times, you just want to port a fairly standard model to the Estimator API, and for this, it "just works" directly and is easy to use for local, small-ish clusters of GPUs.
When you have a much rarer problem that needs a huge GPU cluster, then use the other suggestions like dist-keras or Horovod, or write your own simple map-reduce-ish wrapper to put data on different nodes and deploy e.g. a containerized training application.
Also people need to definitely keep in mind that most of the limitations are TensorFlow's own fault for not designing things to be compatible with heavily used Keras features like Callbacks out of the box. TensorFlow has a history of doing this, and has been very developer-unfriendly in this way even when it has no downside or impact on performance or anything. The core TensorFlow designs suffer from an unfortunate "not invented here" kind of philosophy, even when dealing with Keras.
> Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples.
You probably know that those simple generators aren't recommended for use with Keras; instead keras.utils.Sequence is preferred due to (Keras docs): "Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators."
I couldn't see any equivalent of this for estimators, sadly, and wrapping it up in a naive generator seemed like a functionality downgrade.
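For reference, a Sequence is just this kind of object (minimal untested sketch):

    import numpy as np
    from keras.utils import Sequence

    class ArraySequence(Sequence):
        def __init__(self, x, y, batch_size=32):
            self.x, self.y, self.batch_size = x, y, batch_size

        def __len__(self):
            # number of batches per epoch
            return int(np.ceil(len(self.x) / float(self.batch_size)))

        def __getitem__(self, idx):
            sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
            return self.x[sl], self.y[sl]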
> because this is part of the compiled Keras model, before ever converting anything to TensorFlow Estimator
Right, you specify a loss function before compiling; however, if it is a custom one and you for some reason need to reload a model snapshot (e.g. resuming training), you need to provide it separately or the loading fails. I haven't found any docs on this. Imagine your training optimizer automatically generating loss functions by means of function composition, e.g. you put a mix of +-*/, log, exp, tanh etc. based on some past training experience of what helped in individual cases/literature, then take thousands of these loss functions and push them to a large cluster where they are scored on how well they performed, keeping only the best-performing ones.
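The workaround I mean is roughly this (sketch; my_loss stands for whatever custom function the model was compiled with):

    from keras.models import load_model

    # custom objects have to be passed back in explicitly when resuming training
    model = load_model('snapshot.h5', custom_objects={'my_loss': my_loss})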
Class weights are specified in fit_generator(), not at compile time; again, here I couldn't find any description of how to convert Keras' weight dictionary to what TensorFlow needs.
> Callbacks... "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
The thing here is that some of those callbacks are mandatory for training to converge, e.g. decreasing the learning rate, escaping plateau situations, computing various stats that aren't provided by Keras (outside loss/accuracy; you might want F1, Fleiss/Cohen's kappa, Matthews correlation coefficient, AUC ROC etc.) that might be decisive for keeping/discarding a model; then also multi-GPU callbacks; some people even use callbacks to perform the whole distributed computation as well. In my examples, if I remove any of those callbacks, my models won't achieve any kind of usability, but with those callbacks I match world-class results. I couldn't find any non-insane way to map them to TensorFlow prior to our conversation.
As I mentioned, I have a very large cluster, each node with multiple GPUs, so I need an orchestration on both hyperparameters/loss functions per node as well as within each node to run on multiple GPUs.
The page from Keras you mentioned was precisely my starting point, and from those options tf.Estimator seemed the least devops-intense way to go (Horovod needs MPI and CERNDB/Keras needs Spark).
I'll take a deeper look into SessionRunHook you mentioned - thanks! ;-)
For class weights, the easiest thing is to just generate that as another one of the items placed into the input function dictionary, e.g. when you wrap the DataGenerator. Then have a custom loss function that takes this input element and applies the weight for that training sample. Again, the need to do slight extra work is a limitation of TensorFlow here, not of Keras, but because Keras is so flexible, it's super easy to work around.
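Concretely, something along these lines (untested sketch; the shapes and names are made up):

    import keras.backend as K
    from keras.layers import Input, Dense
    from keras.models import Model

    features = Input(shape=(100,), name='features')
    sample_weight = Input(shape=(1,), name='sample_weight')  # fed as an extra input
    preds = Dense(1, activation='sigmoid')(features)

    def weighted_bce(y_true, y_pred):
        # the loss closes over the extra weight tensor and applies it per sample
        return K.mean(sample_weight * K.binary_crossentropy(y_true, y_pred))

    model = Model([features, sample_weight], preds)
    model.compile(optimizer='adam', loss=weighted_bce)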
> "computing various stats that aren't provided by Keras.."
It seems like you have this backward. Keras provides the easy interface to create the custom callbacks. That's why you can create extra convergence metrics, etc., that are far harder to use if implementing in pure TensorFlow. The part where TensorFlow is specifically lacking functionality is in its ability to handle these callbacks (both pre-built in Keras or user-defined). I've had good success with the solution I mentioned with SessionRunHooks, but still, it is a terrible design choice by the TensorFlow people to create this in a way that is not directly compatible with all the work Keras had done.
> "from those tf.Estimator seemed the last devops-intense way to go (Horovod needs MPI and CERNDB/Keras Spark)."
Just based on how poorly designed the tf.Estimator API is though, I'm not actually sure the other methods would require less devops or less investment. In some cases for standard models, yes. But if you've already committed to using Keras for very customized situations, then going back to the dark ages with native TensorFlow will often be much more work and more error prone than using the other solutions. The Horovod dependence on MPI in particular is fairly simple and needs little management. Most people having done ML / stats PhDs will already have managed far more difficult situations with MPI previously anyway, or at least have the Linux skills needed. The point is you have a fighting chance, whereas deciphering undocumented and badly designed corners of TensorFlow often leaves you with no fighting chance.
In my opinion: getting started with TensorFlow and having a model designed within a day and training within another is also possible.
This mostly depends on your model and your data, and (imho) not on the framework of choice.
For all of Keras/PyTorch/Tensorflow, you'll need to learn the API - but if you have any ML background, that should be straightforward.
Yes, for all practical problems data is the biggest challenge.
Though, debugging matters. In TF it is easy to get errors and spend a lot of time searching for them. In PyTorch it is straightforward. It matters the most when the network, or cost function, is not standard (think: YOLO architecture).
E.g. when I wanted to write some differentiable decision tree, it took me way longer in TF (which I already knew) than with PyTorch, having its tutorial open on another pane.
TensorFlow needs some Deep Learning-based assistant to identify common causes of errors you might see at the AST level. Cryptic errors are its weakness, and an AI trained to spot correlations between the Python AST and the error might be very helpful.
Tensorflow also does not work on my circa-2010 CPU without building it yourself. So I ended up trying PyTorch and I am glad I did. Liking it better than the esoteric errors.
I used Keras a few years ago, and really liked it and still recommend it to people, even contributing some code back to it, but I don't think it obviated the need to know TF, since eventually you want to do something that's not in the Keras toolbox, and then you need to understand what it's doing under the hood.
>> I was also told that doing it the real way using Tensorflow would be the way to go, and I agree with that sentiment if my problem was Google scale, which it wasn't. In fact I would argue that most workloads around the world are not Google scale, and neither are most Google workloads.
You can convert the Keras model to TF pretty easily if you need to, as long as you use the TF backend. I did this, and converted the string preprocessing in TF so the model could be used in TF serving taking only the string as input.
Yes, they're pretty ugly TBH. All they've done is provide some decent "canned" estimators but for anything custom you're still using the base tensorflow API. Not to mention feeding in something like numpy arrays > 2GB is a huge pain (their Dataset API doesn't fully work).
Author here - the article compares Keras and PyTorch as the first Deep Learning framework to learn. It explores the differences between the two in terms of ease of use, flexibility, debugging experience, popularity, and performance, among others.
If you have experience with learning, or teaching Deep Learning with PyTorch or Keras, we’d love to hear your thoughts about them.
My adviser decided (wisely) that we all needed to learn NN, and we settled on Tensorflow. That went... poorly. I've told this before: the Seq2Seq tutorial was designed for an older version of TF, and it triggered a bug that was not fixed because that way to do Seq2Seq was deprecated and a new tutorial was coming "soon". The "tutorial" was also just a code dump with barely any comments.
Eventually we had new people coming in with even less theoretic background than ours (we had read papers for at least 6 months), and that's when we realised it would not work at all. So we organised a 1-week hackathon with Pytorch, and we've been using it ever since.
Similar story here. I got bitten by that very seq2seq "tutorial", lost a lot of time with it, and haven't used TensorFlow ever since except for reproducing other people's experiments. It's Keras, Torch, DyNet or PyTorch for me.
I agree, I also use Keras for stable complex models (up to 1000 layers) in production and PyTorch for fun (DRL). However, if I want to run a distributed training optimization with minimum setup, whether I like it or not, the simplest way is to use TensorFlow's Estimator model and some pre-baked environment like SageMaker. Horovod or CERNDB/Keras require a bit more setup/devops work. The issue with estimators is that once you start using some bleeding-edge things in Keras, it might be very complicated to translate them back to estimators, despite conversion from Keras model to tf.Estimator being trivial.
Complex computer vision classification tasks based on DenseNet/ResNet approaches; those often could be reduced in depth by some Wide ResNet technique. Keras is super easy there, and you get world-class performance after 1 hour of coding and a week of training, when you know what you are doing.
I mentioned in another comment [0], but it is also useful here: most of TensorFlow's tools for distributed model training or multi-GPU training will work out of the box directly on Keras, and distributed training is not at all a reason to use TensorFlow directly over Keras. At worst, you have to add a tiny bit of TensorFlow code on top of the majority being in Keras, but you would still never need to write a significant amount directly in TensorFlow.
I also work on production systems built around deep ResNet architecture for computer vision tasks, and my team does this using solely Keras, including when we do distributed training.
Just adding this thought in case anyone mistakenly thinks you have to start out all-in using only TensorFlow because you might expect to need distributed training at some point.
I appreciate how you focus on just two, which are the state of the art in your opinion.
I find it less useful to see comparisons of "top 50 deep learning frameworks for 2018" which include esoteric stuff that is only there for sake of completeness.
This way a person branching out from Tensorflow (I assume it's Tensorflow) knows which two frameworks to try out, and what to look for.
That's what I did for my bachelor's thesis. I didn't take any advanced Maths classes, so I had to learn everything from scratch. Keras helped me to build an intuition for neural networks and made me more interested in learning about the formulas and how it works with TensorFlow in the background. I really enjoyed learning with this top-down approach.
Love Keras. If you like, you can also use the Keras API inside TensorFlow (as tf.keras). We recently published this guide w/ more info - https://www.tensorflow.org/versions/r1.9/programmers_guide/k... - and are working on a few more examples for the v1.9 release in a couple weeks.
Nice article, and I agree with the explanations of what makes Keras and TensorFlow best for specific use cases. Some history: I have used TensorFlow for years, and switched to coding against the Keras APIs about 8 months ago. I wish I had more experience with PyTorch, but I just don't have the time right now to do more than play with it.
One suggestion to the authors: the benchmark figures are interesting, but I wish you had shown CPU only results also. At work, I have all the GPU resources I need but for my home projects, which are all NLP deep learning experiments, I usually rent a many core large memory server with no GPUs (GPUs seem to speed up RNNs less than other model types).
For most applications you can probably use a TCN (temporal convolutional network) instead of an LSTM. TCNs are implemented in all major frameworks and work an order of magnitude faster because they are parallel.
No, TCN is similar to WaveNet (dilated convolutions + masking the future + residual connections). It's a plain convnet, not an LSTM with a twist. That's why it runs efficiently in parallel on GPUs, like image processing convnets.
Actually, yes, the QRNN has all of those features.
First figure from our paper: how the LSTM with a twist allows for speed equivalent to a plain convnet by running efficiently in parallel on GPUs, like image processing convnets. [1]
Best of all, as it's only an "LSTM with (these) twists", it's drop-in compatible with existing LSTMs but can get you a 2-17 times speed-up over NVIDIA's cuDNN LSTM - essentially speed equivalent to the TCN or WaveNet speed-up.
That's why Baidu implemented QRNN in their production Deep Voice 2 neural text-to-speech (TTS) system[3].
This isn't to say TCN or QRNN is better, simply that it's dangerous to flat out say _no_ if you're not actually certain or don't correctly recall the underlying information.
Disclaimer: I'm the co-author of the QRNN.
Double disclaimer: The TCN paper cites the QRNN but decides not to test against it. They also show results over one of my datasets.
I just like Tensorflow better. For building new models, the graph is complex and errors are unavoidable. There is a separate compile time for Tensorflow, and errors will be found before the data come in. Tried PyTorch before; the error messages are usually not helpful at all and often lead to clueless debugging for hours.
For trying out deep learning, or building on existing models, PyTorch or Keras may be easier to grasp. But when making new models that involve a lot of math, Theano/Tensorflow is more helpful IMO.
It is an interesting perspective, but my experience is exactly the opposite. In Theano debugging was awful. TF felt like a breeze until it didn't. When I jumped on PyTorch, TF started feeling confusing by comparison. Errors point exactly to the defective lines, and there is the possibility to print everywhere (or use any other kind of feedback / logging of intermediate results).
For using models it may not matter that much (though, again, read YOLO in TF and PyTorch and then decide which is cleaner :)).
For new models which go beyond a standard ConvNet/LSTM... well, PyTorch is heaven, Theano sounds like torture.
I am not sure what your programming style in PyTorch is. As people recommend, and as in most tutorials, I see the sequential approach, where a small mistake in data preprocessing leads to clueless errors in a completely irrelevant line.
YOLO is a quite standard feed-forward model in my opinion. I mean the math part, which I am more concerned with.
I have never used Theano before; my understanding is that Tensorflow followed its static graph approach.
If a model involves a lot of math, is it more helpful to be able to debug it? What tf lacks is an intuitive debugging tool. I think this is where pytorch excels.
You don't often need to debug, especially if the model can be checked at compile time. Think static types for programming. For Tensorflow all the data types and tensor dimensions are checked before loading any data; if the math is derived correctly then it is not necessary to even debug.
If data and model are mixed, you often resort to line-by-line debugging to zero in on the real problem, which often takes more time.
I don't do much ML (this kind of ML at least), so I know I'm not the target audience for these libraries. But I wish they were written in some language with static typing, for IDE help. The API interface/tweaks-to-be-done for some of them is enormous, and mostly undiscoverable.
I mean, just looking at the "getting started, 30 seconds to Keras"[0], there are so many magic strings and options. Of course, if one is well-versed in this domain, they make sense. But it's hard to grasp, and Keras is supposed to be the high-level one.
I've been recently following the course over at fast.ai and I'm having the same issue. The API is, as you mentioned, undiscoverable, and I don't really want to look at the source code, which is kind of unreadable anyway ([0]). The course itself is littered with poorly readable code ([1]) such as:
to_np(m.ib(V(topMovieIdx)))
Why, just why.
Despite all this, I wholeheartedly recommend this course, it demystified DL for me.
I kind of agree. I find the way the fastai library (which builds on PyTorch) is written does not match my personal preferences very well. That being said I still really enjoy working with it and the courses do guide one along quite well. I'd actually recommend it for anyone who just wants to get started and play around. It's fairly easy to get good results on a gaming box (say 1080 GTX) reasonably quickly. I also like the Jupyter Notebook approach that they use for the lectures. It encourages experimenting around and they also encourage you to dive into the library and read source code which is good. Alas I find the terse style complicates it a bit but that may very well be personal preference. It's also good to know that they try to implement interesting papers quickly.
I think overall if your goal is to use a preexisting architecture to get quick results fastai is a great point to start. If you want to build your own architecture, reach one level of abstraction lower.
Edit: I liked this PyTorch youtube series quite a bit: https://www.youtube.com/watch?list=PLlMkM4tgfjnJ3I-dbhO9JTw7...
The terseness is intentional[0], following the idea that "brevity facilitates reasoning". I don't 100% agree with it, but at least there's reasoning behind it instead of just laziness.
I fully agree with you. While I can't speak for Keras, after reading [1] it seems Tensorflow would very much benefit from strong types. You'd easily be able to catch most of the errors presented in the article with higher-kinded and linear types, and maybe you don't even need that much power.
I totally agree, the lack of documentation via types and lack of smart autocomplete (which I rely on very heavily for API discovery) is not only why I never got into TensorFlow, it's also why I never got into languages like Python. I even went so far as to use TypeScript instead of JavaScript.
I do believe C# has some machine learning libraries, but AFAIK they aren't anywhere near the level of TensorFlow or Keras.
Scala and Java both have access to really great ML libraries, MLLib in particular is supposed to be really good. I've used it a little but I'm not really a ML expert to judge.
I don't believe I would ever discourage anyone from using any particular framework. The skills learnt from one are highly transferable, so it doesn't matter too much which framework you start with.
Also, with eager execution, Tensorflow has become much more accessible to new users.
Having said that, the world would likely be a better place if everyone just used PyTorch :)
I don't know how many people external to Google know about tf.estimator, but it's where most people who aren't building complicated custom architectures should be starting. Keras is nice, it's easy to use, but I wouldn't use it to design, build and run a massive production pipeline. tf.estimator is just that.
BTW (from the author of the blog post and of this library): for super-simple live training plots in Jupyter Notebook for Keras (and PyTorch): https://github.com/stared/livelossplot
For more advanced training for business or Kaggle competitions (version controlling of code and results, advanced charts): https://neptune.ml/
Having used both plain TensorFlow and Keras for some very large image processing production services, Keras wins easily, and interoperates with sprinkling in low-level TensorFlow very well.
Even defining a custom deep CNN for multiple image prediction tasks (so, deep and custom architecture), Keras holds up well — and creating your own layers in Keras is very easy.
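E.g. a tiny custom layer is only about this much code (sketch against the Keras 2 API; just a learned elementwise scale):

    import keras.backend as K
    from keras.layers import Layer

    class Scale(Layer):
        # learns one multiplicative factor per feature
        def build(self, input_shape):
            self.alpha = self.add_weight(name='alpha',
                                         shape=(input_shape[-1],),
                                         initializer='ones',
                                         trainable=True)
            super(Scale, self).build(input_shape)

        def call(self, inputs):
            return inputs * self.alpha

        def compute_output_shape(self, input_shape):
            return input_shape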
Having used Torch (the Lua library) before, the comparison between the Sequential models seems very absurd. Even the PyTorch documentation gives an almost equivalent model definition method:
    # Example of using Sequential
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 20, 5),
        nn.ReLU(),
        nn.Conv2d(20, 64, 5),
        nn.ReLU()
    )
These sequential models are like Fibonacci function comparisons between programming languages.
They are simple and basic; whether it's 5 lines of code or 20 makes no difference. You spend very little time actually coding these layers. Understanding the model and the default parameters used underneath is more important.
It would be nice to see some examples with skip-layers, weight sharing etc. Do you have to drop the sequential model to do them or not?
(Another author here).
It is explicitly explained in the text. :)
tl;dr: not nearly as popular (which means: fewer tutorials, less documentation, fewer examples, less integration with other systems, less community support for development or discussions)
Sure, all frameworks do have some goal, and once one is confident in DL, any of them may be a good choice. As you see from the plots there, MXNet is very fast for some applications.
See the charts. Likely, but still, a 30% speed boost is not a factor for someone learning DL (there, debugging or training the wrong models can easily give an overhead of 5-20x).
Interesting article. I've been doing some production work with ML and I was wondering which tool would work better in my specific environment.
Currently I've been training a CNN model in Keras with good success, and using custom scripts to port it to a TensorFlow model. The .h5 file from Keras helps a lot with this step.
Next step is compiling a shared Tensorflow library so I can deploy the trained model in C++ (project requirement) and this has been a pain in the ass, regardless of framework...
PyTorch has the best API for understanding deep learning, and the PyTorch-based Pyro is also very good for probabilistic programming (a fresh take coming from Stan/PyMC3)
Speed is one thing, but the key value proposition of Keras for me, which rarely comes up in these comparisons, is Keras's native utility functions, including easy and correct text tokenization/padding, easy OHE of categorical variables without using sklearn, and easy model saving/loading from an .hdf5 file. (Although I am not an expert on PyTorch and not as familiar with the ETL pipeline for that)
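For anyone who hasn't seen them, I mean things like this (sketch; texts, labels and the model are assumed to already exist):

    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.utils import to_categorical
    from keras.models import load_model

    tok = Tokenizer(num_words=10000)
    tok.fit_on_texts(texts)                                   # texts: list of strings
    x = pad_sequences(tok.texts_to_sequences(texts), maxlen=200)
    y = to_categorical(labels)                                # labels: integer class ids

    model.save('model.hdf5')   # architecture + weights + optimizer state in one file
    model = load_model('model.hdf5')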
I agree strongly with the ease of model saving / reloading in Keras. I found this basic functionality to be exasperatingly difficult and cumbersome in TensorFlow.
For NNs, in my experience, out-of-memory and preprocessing issues tend to cause as many problems as the NN optimization itself, which TFRecords and streaming seem to solve. Are there similar data loading facilities in PyTorch? Though I have not specified models in Keras; since it is now part of TF, I presume the formats are compatible.
I was able to build a deep learning OCR using a CNN from scratch using Keras, running in an app via iOS's Core ML, in 2 months without prior experience. The hard part was actually getting a great data set; 80% of the time was data massaging. Keras saved me some time, although the results wouldn't be world class.
Can we please please please not have the kind of framework overproliferation and fragmentation in the Deep ML world that they have in the front-end web world? It's hard enough to learn ML concepts without also having to learn a new ML framework every year.
I would say keras or mxnet for speed and production. PyTorch for research. By this point there are hardly any cases when it’s worth it to descend to lower TensorFlow levels.
Keras only offers standard layers, but you can implement your own LSTM layer and use it with Keras. This way you can take advantage of all the other features.
Learning Python is way faster than learning Deep Learning, so it shouldn't be an issue.
I am planning to organize a TensorFlow.js bootcamp, but here it is more difficult (as data preprocessing, and debugging in general, is way more difficult in JS than in Python).