Will Keras2 support PyTorch as backend, in the future? Answer: [0] No, there are...

_ntka · on March 15, 2017

To put this quote in context: this isn't specifically about PyTorch. Every couple of months since mid-2015, a new deep learning framework gets released. In the following week, someone inevitably asks "will X get added as a Keras backend?".

Supporting several backends is a strong positive. But chasing every new framework as a backend is a quick way to kill Keras, via bloat, support issues and general technical debt. We should only support a backend that is considered mature, and we should stay away from the hype surrounding the release of every new framework. There will be another hyped up framework next quarter anyway. And the one after.

It is in fact possible that Keras will eventually support PyTorch. But if it ever happens, it would be at least 1-2 years in the future. When PyTorch becomes "uncool", just like Keras :)

jph00 · on March 15, 2017

Yay, making deep learning uncool again! ;) (http://www.fast.ai/about/)

But seriously - does it even make sense to have a "define by run" dynamic framework as a backend? It seems to me that Keras is particularly suited to wrapping frameworks that define and run a computation graph.

_ntka · on March 15, 2017

With the functional API of Keras, it would definitely make sense. In fact I do think that imperative model definition would be great to have at some point in the future. We'll see :)

jph00 · on March 15, 2017

I'm intrigued!... The kernel calling overhead and the lack of any GPU while/scan/map/etc for PyTorch seem like limitations, but I guess on second thought you can still do all the Keras fit/predict stuff and the automatic connecting up of layers.

apaszke · on March 15, 2017

These ops are just not needed in PyTorch. while is just a Python while loop. Scan is a for loop, map is a list comprehension that applies modules. No need for anything fancy.
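A minimal sketch of that point, using plain Python floats as stand-ins for tensors and plain callables as stand-ins for modules so it runs without PyTorch installed. The helper names (`scan`, `apply_all`, `halve_until_below`) are made up for illustration and are not part of any library:

```python
# Plain-Python stand-ins: floats play the role of tensors,
# callables play the role of modules. All names here are hypothetical.

def scan(step, xs, init):
    """Scan as an ordinary for loop: carry a state forward and
    collect every intermediate state."""
    state = init
    outputs = []
    for x in xs:
        state = step(state, x)
        outputs.append(state)
    return outputs


def apply_all(modules, x):
    """Map as a list comprehension that applies each module to the input."""
    return [m(x) for m in modules]


def halve_until_below(x, threshold):
    """A data-dependent while loop: the number of iterations
    depends on the value of x, which is only known at run time."""
    steps = 0
    while x >= threshold:
        x = x / 2
        steps += 1
    return x, steps


running_sums = scan(lambda s, x: s + x, [1.0, 2.0, 3.0], 0.0)
mapped = apply_all([lambda v: 2 * v, lambda v: v * v], 3.0)
final, steps = halve_until_below(8.0, 1.0)

print(running_sums)   # [1.0, 3.0, 6.0]
print(mapped)         # [6.0, 9.0]
print(final, steps)   # 0.5 4
```

In a define-by-run framework the same control flow would operate on tensors and modules directly, since the graph is whatever the Python code happens to execute.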

jph00 · on March 15, 2017

Sure - but on pytorch they suffer the kernel launch overhead each time through the loop, whereas on tensorflow and theano they do not. Which really impacts the kinds of algorithms that work well on each platform. Does that seem like a reasonable assessment to you?

smhx · on March 15, 2017

Currently not many frameworks have actual fusion of kernels (to avoid launching many GPU kernels). If you look underneath a theano.scan or TF.scan, GPU kernels are still being launched individually (but are likely stream-overlapped where appropriate).

With TF's XLA compiler, they are slowly getting towards kernel fusion, which will then reduce launch overheads.
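To make the launch-overhead argument concrete, here is a toy model in plain Python. It is only an illustration of the idea, not how XLA or any real framework is implemented: `LaunchCounter` is a hypothetical class, and each call to `kernel` stands for one GPU kernel launch over a whole array.

```python
class LaunchCounter:
    """Hypothetical stand-in for a device: counts how many
    'kernels' (whole-array passes) get launched."""

    def __init__(self):
        self.launches = 0

    def kernel(self, fn, xs):
        # One launch applies fn elementwise to the whole array.
        self.launches += 1
        return [fn(x) for x in xs]


def unfused(device, xs):
    """Three elementwise ops, each launched as its own kernel."""
    a = device.kernel(lambda x: x * 2.0, xs)
    b = device.kernel(lambda x: x + 1.0, a)
    return device.kernel(lambda x: max(x, 0.0), b)


def fused(device, xs):
    """The same three ops combined into a single kernel."""
    return device.kernel(lambda x: max(x * 2.0 + 1.0, 0.0), xs)


data = [-1.0, 0.5, 2.0]

dev_a = LaunchCounter()
out_a = unfused(dev_a, data)

dev_b = LaunchCounter()
out_b = fused(dev_b, data)

print(out_a == out_b)   # True
print(dev_a.launches)   # 3
print(dev_b.launches)   # 1
```

The results are identical, but the fused version pays the per-launch cost once instead of three times. Inside a loop that runs many iterations, that difference is multiplied by the iteration count.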

We have similar things in the works for pytorch: to quickly JIT at runtime the dynamic graph that is getting executed. More news on this will come when time-appropriate.

whyrt12 · on March 15, 2017

I WANT to use PyTorch, but there is no Bayesian learning or stochastic nodes like in Edward. Any chance there are plans for a compatibility layer with Edward, or to roll your own Bayesian stuff?

Also, have you looked at Numba to do the jitting? Probably best not to have yet another separately maintained python JIT.

smhx · on March 15, 2017

As core devs, we don't plan to build in something like Edward. However, folks in the community are brewing something:

https://discuss.pytorch.org/t/bayesian-computation-in-pytorc... https://discuss.pytorch.org/t/distribution-implementations/4...

apaszke · on March 15, 2017

To not have the kernel launch overhead you'd need to stop launching GPU kernels, but that's not how things work in any framework ;)

evernflow · on March 15, 2017

This is surprising since keras basically started as a rip off of the Torch API in Python

HelloNurse · on March 15, 2017

Layering Keras on top of another framework, such as Theano, is useful because it gains compatibility with code using that other framework. If Keras and PyTorch are both similar (in spirit and API) to Torch, integrating PyTorch-based code as is into a Keras project would be very low-value compared to a presumably easy translation to Keras.