Copying my rare product endorsement from the previous submission:
Keras is so good that it is effectively cheating in machine learning: even TensorFlow tutorials can be replaced with a single line of code (which is important for iteration; Keras layers are effectively Lego blocks). A simple read of the Keras examples (https://github.com/fchollet/keras/tree/master/examples) and documentation (https://keras.io/getting-started/functional-api-guide/) will let you reverse-engineer most of the revolutionary Deep Learning clickbait thought pieces.
It's good to see that backward compatibility is a priority in 2.0, since it sounds like a lot has changed.
Trying to do something simple in TF is a pain; in the code there are conflicting examples, and snippets that "train" a network just to print a loss number on the screen but actually do nothing besides that.
Keras is easy to use and better if you're running CPU only
A couple notes on backwards compatibility I ran into yesterday when upgrading auto_ml to use the latest:
Keras now throws errors when trying to use some of the metrics they've deprecated. If you run into `UnboundLocalError: local variable 'class_name' referenced before assignment`, check that all your metrics are supported.
Keras also now ignores `nb_epoch` in favor of `epochs`. I must have misread the blog post, because I thought it would support `nb_epoch` for a bit and just use that in place of `epochs`.
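For anyone hitting the same two issues, here is a minimal sketch of the Keras 2 style for both points; the removed metric shown ('fmeasure') and the stand-in data are just for illustration, not a claim about which metric tripped your particular error:

```python
# Keras 2 style: stick to still-supported metrics and use `epochs` instead of `nb_epoch`.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(100, 5)
y = (X.sum(axis=1) > 2.5).astype(int)

model = Sequential([Dense(1, activation='sigmoid', input_shape=(5,))])

# Keras 1.x:  model.compile(..., metrics=['accuracy', 'fmeasure'])
#             model.fit(X, y, nb_epoch=10)
# Keras 2.x:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=16)
```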
> Keras is so good that it is effectively cheating in machine learning, where even Tensorflow tutorials can be replaced with a single line of code.
That's the way it "felt" to me when it was introduced in one of the lessons (then later used in a project) in the first term of the Udacity Self-Driving Car Engineer nanodegree I am enrolled in (my cohort just finished the first term; second term for us starts on the 23rd).
We first played around with a simple NN library built from scratch in Python named "Miniflow"; basically, it was a relatively simple backprop library based around NumPy (to eliminate having to implement the vector math part of things). It gave a good overview of how a neural network is developed and implemented at a lower level, and how it actually worked. I say that having a similar level of knowledge from taking the CS373 course at Udacity (2012) as well as the original (2011) "ML Class" from Andrew Ng - where in both a simple NN was developed using Python and Octave (respectively).
That gave us students the foundation; Tensorflow and how to use it with Python was then introduced (I also took the time and initiative to get my system prepped to use my 750 Ti GPU with TF). That was a revelation - TF made implementing a NN seemingly dead-simple by comparison. It felt almost like plug-n-play!
Then we learned about Keras: I thought that TF made things simple, but Keras proved that wrong. Your comment about it being "Lego blocks" is spot on. It was really simple to implement the NVidia End-to-End CNN to control a virtual car on a track, once it was given proper training from "camera" views.
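For anyone curious what the "Lego blocks" look like in practice, here is a rough sketch of that kind of end-to-end steering model in Keras 2; the input shape and layer sizes roughly follow the NVIDIA paper, but treat it as an illustration rather than the exact project code:

```python
# Rough sketch of an NVIDIA-style end-to-end steering CNN in Keras 2.
from keras.models import Sequential
from keras.layers import Lambda, Conv2D, Flatten, Dense

model = Sequential()
model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 200, 3)))  # normalize pixels
model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='relu'))
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu'))
model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))  # predicted steering angle
model.compile(optimizer='adam', loss='mse')
```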
All that said, though - without having that lower-level foundation of "Miniflow" - you can't appreciate exactly what it is that Keras gives you, nor can you easily grok what is actually happening under the hood, so to speak. I know that our simplified NN library only scratches the surface of what Tensorflow provides, but it does give a foundation on which to experiment and understand things further, IMHO.
Which is why just jumping into Keras without going back to "the roots" of neural networks and machine learning may be doing a disservice to self-learners on this topic. We are still at the point in the process where having the fundamental understanding can help to inform the implementer of solutions in Keras. It's kind of like knowing how to program in Java without understanding what a stack or a linked list is, or how they work. While it is certainly possible to do so (and produce properly working code), for certain problems that understanding may be necessary.
Even when it isn't, though, it still may be a worthwhile thing to know in the end. That's just my opinion, though.
Will Keras 2 support PyTorch as a backend in the future?
Answer: [0]
No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.
To put this quote in context: this isn't specifically about PyTorch. Every couple of months since mid-2015, a new deep learning framework gets released. In the following week, someone inevitably asks "will X get added as a Keras backend?".
Supporting several backends is a strong positive. But chasing every new framework as a backend is a quick way to kill Keras, via bloat, support issues and general technical debt. We should only support a backend that is considered mature, and we should stay away from the hype surrounding the release of every new framework. There will be another hyped up framework next quarter anyway. And the one after.
It is in fact possible that Keras will eventually support PyTorch. But if it ever happens, it would be at least 1-2 years in the future. When PyTorch becomes "uncool", just like Keras :)
But seriously - does it even make sense to have a "define by run" dynamic framework as a backend? It seems to me that Keras is particularly suited to wrapping frameworks that define and then run a computation graph.
With the functional API of Keras, it would definitely make sense. In fact I do think that imperative model definition would be great to have at some point in the future. We'll see :)
I'm intrigued!... The kernel-launch overhead and lack of any GPU while/scan/map/etc. for PyTorch seems like a limitation, but I guess on second thought you can still do all the Keras fit/predict stuff and the auto-connecting of the layers.
These ops are just not needed in PyTorch. while is just a Python while loop. Scan is a for loop, map is a list comprehension that applies modules. No need for anything fancy.
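A minimal sketch of what that looks like in practice (recent PyTorch, with stand-in sizes and data):

```python
# "Scan" over time steps is just a Python for loop in PyTorch.
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=32, hidden_size=64)
x = torch.randn(10, 8, 32)      # (time, batch, features), stand-in data
h = torch.zeros(8, 64)

outputs = []
for t in range(x.size(0)):      # dynamic control flow, no scan/while ops needed
    h = cell(x[t], h)
    outputs.append(h)
outputs = torch.stack(outputs)  # (time, batch, hidden)
```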
Sure - but on pytorch they suffer the kernel launch overhead each time through the loop, whereas on tensorflow and theano they do not. Which really impacts the kinds of algorithms that work well on each platform. Does that seem like a reasonable assessment to you?
Currently not many frameworks have actual fusion of kernels (to avoid launching many GPU kernels). If you look underneath a theano.scan or TF.scan, GPU kernels are still being launched individually (but are likely stream-overlapped where appropriate).
With TF's XLA compiler, they are slowly getting towards kernel fusion, which will then reduce launch overheads.
We have similar things in the works for pytorch: to quickly JIT at runtime the dynamic graph that is getting executed. More news on this will come when time-appropriate.
I WANT to use PyTorch, but there's no Bayesian learning or stochastic nodes like in Edward. Any chance there are plans for a compatibility layer with Edward, or to roll your own Bayesian stuff?
Also, have you looked at Numba to do the jitting? Probably best not to have yet another separately maintained python JIT.
Layering Keras on top of another framework, such as Theano, is useful because it gains compatibility with code using that other framework.
If Keras and PyTorch are both similar (in spirit and API) to Torch, integrating PyTorch-based code as-is into a Keras project would be very low value compared to a presumably easy translation to Keras.
I'm definitely going to give this a shot, thanks for the link. Approaching ML at a higher level is exactly what I need to develop a better interest in it. I realize that the underpinnings are important, but waiting 30 minutes for MNIST to process on my localhost is just unbearably boring.
If you try the course, be sure to make use of the forums for it too: http://forums.fast.ai . As you'll see, they're extremely active and helpful for all deep learning students (and all practitioners in general).
Disclaimer: I teach the course. Although it is free and ad-free... :)
Regarding AWS, one participant has created a system that lets you use spot instances for the course. It's published on the forum. Great way to save $$$ (400% or more...)
Keras is fantastic. Not the tightest analogy and probably unoriginal, but I think of it as the Python to TensorFlow's C. It's easy to drop into TensorFlow when needed, but you can probably get away with Keras for a long time. Also, Francois helped us when we DM'd him on Twitter, which was incredible.
Thank you so much Francois! I'm incredibly excited about this release!
I'm only starting with all that machine-learning, NN stuff and, like many others, I want to ask for some guidance/resources/learning material. What I feel is especially lacking is something very broad and generic, some overview of existing techniques (but not as naïve as Ng's ML course, I assume). There exist a lot of estimators and classifiers, a lot of techniques and tricks to train models, and a lot of details on how to design a NN architecture. So how, for instance, do I even decide that Random Forest is not enough for this task and I want to build some specific kind of neural net? Or maybe I don't actually need any of these fancy famous techniques, but rather there exists some very well-defined statistical method to do what I want?
What should I read to start grokking this kind of thing? I feel quite ready to go full "DIY math PhD" mode and consume some heavy reading if necessary, but where do I even start?
> So how, for instance, do I even decide that Random Forest is not enough for this task and I want to build some specific kind of neural net?
The problem here is that it's really hard to give generic advice. As an analogy this is like asking "how do I know if Rails is enough for this task".
The answer is usually "yes", but the specifics matter a lot.
So in this specific case (and I realize you aren't looking for specific advice here, but I think the principles are useful):
Random Forests are very powerful, and work really well for hundreds, maybe thousands of features, on large but not huge amounts of data and are fairly easy to train.
There are a large number of types of neural networks. One of the big advantages of deep neural networks is that they can reduce the need for manual feature engineering. For example, convolutional neural networks extract features from images that work better than any human-engineered features, and LSTMs (and variations) work well at extracting features from text. The problem with deep neural networks is that they (generally) need a lot of data to train.
So, as usual the answer is "it depends".
In industry though, 90% of the time the question isn't "what classifier should I use". It's "how do I get the data" / "how do I extract features", and then "let's try all the classifiers and see what works best".
"Try 'em all" is not just an answer, but the only answer.
The No Free Lunch Theorem says that averaged across all possible problems, no single classifier is the best; in fact, they're all equivalent.
However, you probably don't care about all possible problems, but a specific one. Over the last decade or so, we've discovered that deep learning works really well on certain classes of problems, particularly those that may have some kind of nested structure, as in object or speech recognition. If your problem resembles one of those, a deep neural network might be a good place to start.
To paraphrase the learnings of thousands of data scientists over years of Kaggle competitions:
- A quick and dirty model for a baseline: Random Forest (see the sketch after this list)
- Structured data: use a boosted tree algorithm (specifically the XGBoost implementation of gradient boosting), ensembled with maybe Extra Trees, Random Forests and MLPs
- Some kind of time component on large datasets: FTRL regression, XGB
- Binary data (images or sound): deep neural nets
- Text: try LSTMs, but this will often be beaten by manual feature engineering and Word2Vec-derived features fed into XGB.
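As referenced above, here is a minimal baseline sketch along those lines, assuming scikit-learn and xgboost are installed; the built-in dataset is just a stand-in, not a recommendation:

```python
# Quick-and-dirty baselines: Random Forest vs. XGBoost on a toy structured dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1).fit(X_train, y_train)

print("Random Forest accuracy:", rf.score(X_test, y_test))
print("XGBoost accuracy:      ", xgb.score(X_test, y_test))
```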
LightGBM (https://github.com/Microsoft/LightGBM) is shaping up to beat XGBoost; it has near API parity, and it was already winning benchmarks before v2 introduced a new algorithm.
I tried LightGBM for a Kaggle. I couldn't get anywhere near XGB.
I was using the LambdaRank stuff. Given the boasting the LightGBM team had done I had assumed it would be close to XGB out-of-the-box for a ranking problem (since XGB only does pairwise ranking). It was far enough away that I had to ask if I was misinterpreting the output[1].
That was 6 months ago now, so maybe it has improved. I know they made big claims.
I don't think anyone has successfully used it for a high result in a Kaggle yet, which - for all its faults - is a good way to see what the maximum performance of a software package seems to be.
LibFFM is the other thing I should have mentioned previously as being worth trying.
Another less-recognised point is that in industry, you also need to ask "how can I maintain this?" and "what can go wrong with my algorithm?".
In one use case, a "blip" in your algorithm might mean showing the wrong kind of advertisement to a user. Not great, but ultimately no big deal. In another, it might mean automatically buying billions of dollars' worth of pumpkin futures (cf. Knight Capital).
In the latter case you need a much greater penalty on model complexity, and much more emphasis on interpretability.
While I agree with your point (and often use this in interview questions) that wasn't what caused the Knight Capital problem.
That was bad software engineering and deployment practices, and it had nothing to do with the interpretability of the model (actually it had little to do with the model at all). They repurposed a feature toggle, then misdeployed the code: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lo...
I understand that this was an example, but I'm sure someone will misread it as what happened in that case.
Generally for structured data (i.e. each column represents a distinct type of information, such as 'revenue' or 'color') you'll want random forest or GBM.
For unstructured data, where you'd otherwise need lots of complex feature engineering, you'll generally want to let the model learn those features - so use deep learning. E.g. images, natural language, audio...
I've won competitions with random forests and teach deep learning - both definitely have their place, but they are generally for quite different types of data. (This may change in the future, however, with deep learning showing that it has the potential to work well for structured data too.)
(Don't worry about the No Free Lunch theorem - it has little to do with predictive modeling in the real world. Recent research shows that a random forest will give amongst the best results for the vast majority of real world datasets.)
Thanks, I'll try that as well. But then again, this is specifically about deep learning. I'm asking more about some generic, systematic overview that would help me know that I'm using a specific technique for a reason, and not because "deep learning is cool". Something that would include the very basic, "manual" statistics approach as well as an intro to NNs. I mean, I probably know that I need a CNN when I'm presented with a picture, and sometimes I might guess that I want to use an RNN if I'm presented with text I don't know how to parse, but when I want to predict something given a bunch of numbers and stuff, it is not at all obvious exactly which approach is likely to be "the right one" and which one is probably "because fashion".
Even though you specifically say you are willing to go full blown PhD and are interested in digging deep on algorithms etc. I strongly recommend working through "Practical Deep Learning for Coders" course at fast.ai
It's free :)
It gives you an excellent feel for what is possible and they are very focused on solving interesting and practical problems right away. They explicitly try to take the "requires a math PhD" out of deep learning. Once you're through with the course you have a very solid practical overview and understanding and can solve tons of real world problems (it's almost a startup idea generator tbh.) and once you're at that stage it becomes tons easier to dive deep into specific algorithms and optimizations.
tl;dr: Take the course (they also walk you through setting up an AWS GPU server, so no fancy hardware required) and you'll be able to solve real-world problems with state-of-the-art algorithms.
I'd definitely watch the first few episodes of Ng's stuff, up to and including logistic regression (unless you know all of that already, in which case: read papers and do practice projects for yourself--or compete in kaggle if you don't have any application ideas)
The most common way to apply machine learning is supervised classification. The basic formula is: we learn a model (set of weights) to approximately map data (a matrix X) to corresponding labels (a matrix Y). Where you can use logistic regression to learn a set of weights, you can use a keras-based neural network.
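To make that mapping concrete, here is a minimal sketch of logistic regression expressed as a one-layer Keras model; the data is a random stand-in for whatever X and Y you actually have:

```python
# Logistic regression as a single Dense layer with a sigmoid activation.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(1000, 20)           # stand-in data matrix
y = (X.sum(axis=1) > 10).astype(int)   # stand-in 0/1 labels

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(20,)))  # == logistic regression
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=32)
```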
If all of that makes sense to you already, I think you're well prepared to read Keras' documentation.
It surely does make sense to me, but I seriously think (maybe hope, even?) that the "hacking-driven" approach here is significantly overvalued, for sociological reasons. After all, these are all mathematical problems, and while I'm aware that NNs are pretty much unexplored space, there surely must exist some quite significant amount of knowledge at the level below NNs that can actually be systematically learned. All these various statistical methods the R community is buzzing about that I'm not even aware of, some rationale about "why a NN and not just a regression", etc. You know, the math.
If you just pick up a math book, you'll learn lots of stuff that you don't need to know. That's fine, but it strikes me as a good way to avoid actually doing anything and gaining practical experience.
If you hit a wall in practice because you don't understand the math, you'll usually have enough of an idea of the problem to ask more intelligent questions about what kind of math you need. That will, incidentally, help you understand the math better because you're coming to it out of an actual need rather than just seeing it mixed into a bunch of chapters.
Unless you're going to write a machine learning framework or be a researcher, the required math isn't too bad and it sounds like you might have enough of a background already. So don't be afraid to dive into something practical (like a kaggle competition).
FWIW this is a really good blog for insight into the math and intuition behind deep learning: http://colah.github.io/ (I'm not sure if it's quite what you're looking for, though)
The mathematician in me has kept me from jumping into deep learning before I understand the mathematical and statistical underpinnings of the algorithms involved. Looking forward to reading through the latest book out by mit press and giving things a whirl with Keras which I've heard so much about.
The only "math" in deep learning is given by reverse mode AD (or if you're into fancy stuff, "efficient computing of pullbacks"). The rest of it plain old hacking, and empirical tricks with the occasional variational doodads.
You clearly also haven't read many of the papers it cites.
I would say one weakness of the book is that parts of it are too much like a survey of the papers in a subfield. Another is that it is very heavy on theory and light on practice (e.g., no exercises.)
Pray tell me, oh self-conceited one, what I missed that is both in actual use and in that book? For things outside this set, you'd not read this book anyway; nor would such things be called "deep learning" (other than maybe RBMs).
I love Keras but I think this update broke more things than you realized. For example it's no longer possible to get the validation set score (val_acc) during training which renders early stopping impossible. This was a documented feature on your FAQ.
Is the old documentation still available? I'd like to wait before I upgrade.
You can try opening an issue on Github. `val_acc` is definitely still accessible by callbacks, and the `EarlyStopping` callback, which relies on it, is fully unit-tested.
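For reference, here is a minimal sketch of early stopping on `val_acc` with the Keras 2 API; the model and data are stand-ins, nothing specific to your setup:

```python
# Early stopping on validation accuracy: `val_acc` is produced when a validation
# split is used and 'accuracy' is among the compiled metrics.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

X = np.random.rand(500, 10)
y = (X.sum(axis=1) > 5).astype(int)

model = Sequential([Dense(16, activation='relu', input_shape=(10,)),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_acc', patience=3)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[early_stop])
```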
1. Still no support for multiple losses. Models like VAEs cannot be idiomatically implemented. The second loss has to be 'hacked' in. Notice how in the official example for VAE, the kl_loss is computed using variables which are NOT available via the loss function (https://github.com/fchollet/keras/blob/master/examples/varia...)
2. It's still an input->output paradigm, rather than a {input, output}->loss paradigm which gives more flexibility.
These two issues are the main reason why I stick to slightly lower level APIs, even though I _want_ to use Keras.
- You can use a Keras model to compute some tensor(s), turn that into a loss, and manually add that loss to the model via `add_loss` (it just needs to only depend on the model's inputs).
- Not all of your model outputs have to have a loss associated with them. So you can do both {input, output}->loss and input->output in your workflow, as you wish. Effectively, losses and outputs are decoupled.
The VAE example hasn't yet been updated to use the `add_loss` feature, but it should be.
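Until the example is updated, here is a minimal sketch of the `add_loss` pattern described above; the extra loss is a made-up regularization term on an intermediate tensor, not the actual VAE KL term:

```python
# Adding a second loss term that depends only on the model's own tensors.
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(32,))
h = Dense(16, activation='relu')(x)
out = Dense(1)(h)

model = Model(x, out)
extra_loss = 0.01 * K.mean(K.square(h))  # illustrative penalty on the hidden activations
model.add_loss(extra_loss)

# 'mse' applies to `out`; the added loss is summed into the total objective.
model.compile(optimizer='adam', loss='mse')
```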
I will look into updating the VAE example, as I've ported the example to the keras 2.0 API recently. There is currently no documentation on add_loss as far as I can see, so I will have to try a few things.
Strongly seconded. There are a lot of things I am tinkering with where I would love to have more fine-tuned control over loss, and have to resort to various hackery to get a mediocre approximation of my real idea.
Copying a comment I made in another thread where one response recommended Keras:
I currently have a small pet project where I think some simple ML would be cool but I don't know where to start.
Basically my use case is that I have a bunch of 64x64 images (16 colors) which I manually label as "good", "neutral" or "bad". I want to input this dataset and train the network to categorize new 64x64 images of the same type.
But it's still too hard to understand exactly how I can create my own dataset and how to set it up efficiently (the example is using 32x32 but I also want to factor in that it's only 16 colors; will that give it some performance advantages?).
If you don't know how to set up a dataset, it's probably too early for you to worry about performance and efficiency.
If you haven't already, I'd suggest learning some general machine learning, including how to use logistic regression, random forests and SVMs.
Keras is certainly capable of what you want to do, at least from your description.
One way is to interpret the colors as a grayscale image; that would be the fastest option. If, however, the 16 colors are actually from a palette, it may be better to convert the image to three channels, R/G/B. And if the 16 colors are 16 entirely different things, like 0 - water, 1 - sand, 2 - earth and so on, you could even turn one 16-color image into 16 images with two colors (1 bit each), and get a better model.
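Here is a minimal sketch of that "one channel per palette colour" idea, with a tiny CNN for good/neutral/bad classification; the data is a random stand-in and the layer sizes are arbitrary:

```python
# One-hot palette indices into 16 channels, then a small 3-class CNN.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

# pretend dataset: 200 images of 64x64 palette indices in [0, 15], labels in {0, 1, 2}
images = np.random.randint(0, 16, size=(200, 64, 64))
labels = np.random.randint(0, 3, size=(200,))

X = np.eye(16)[images]          # one-hot palette index -> shape (200, 64, 64, 16)
y = to_categorical(labels, 3)

model = Sequential()
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 16)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, validation_split=0.2)
```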
Again, getting into machine learning or deep learning is not as easy as reading the Keras documentation. You need to understand the basics first.
But you need to know the fundamentals of TF and NNs (RNNs, LSTMs, etc.). Keras makes it easier to build on those concepts with less programming. I've found TFLearn to be slightly complicated. Both Keras and TFLearn make it simpler to deal with TF.
Creating a good train-test dataset is a general ML problem. Keras doesn't solve that and isn't meant to.
However, Keras (and TFLearn too) makes it easy to throw a statistically bad dataset at an NN, add multiple layers, and then let TF take over and derive an inefficient model in a few hours. The amazing part is that the inefficient NN (driven by TF) might still return a slightly acceptable accuracy. This is awesome because you may be an amateur and yet have some okay results to start with. Later you can improve the dataset to improve the accuracy.
In general, throwing NNs at everything isn't good. They result in hard-to-interpret black-box models. If NNs give you good classification, you could also try the same with other classifiers. You could also start looking into scikit-learn algos and see if those could be used in your case.
Go with Keras. I believe this recent release was partially motivated by Google deciding to fold Keras into tensorflow. Therefore, I would expect keras to supersede tflearn in the areas where they overlap.
Keras is a great wrapper library built upon two fantastic frameworks -- theano and tensorflow. I'm glad to see it is moving forward, and kudos to everyone involved in all these libraries!
Slightly off-topic but curious question about the analytics for 7-day (34K), 14-day and 30-day active users. I'm running a similar site, so: could it be that a lot of users reading documentation are using ad/tracking blockers, so that the actual active-user count is higher than it appears in GA? Documentation users tend to read quite a lot of pages per session. If I'm right, then they should see fewer page views per user than expected.
For most websites I am familiar with, 20-30% of users use adblockers or other privacy plugins. On content with a target audience of developers I have sometimes seen 60%. Most of these users are not recorded in Google Analytics, so the real number of unique users is higher than reported by analytics.
Since no data from these users is sent at all (no user data and no pageview data), page views per user is not directly influenced, because you are missing both users and pageviews in your reporting. It could be influenced if users with an adblocker behave differently than users without. To analyse this you would have to look at your server-generated web logs.
The data given in the image is often not used to find the total number of unique users on your webpage. It is used for computing engagement: monthly active users vs. daily active users. In this example we only have 7-day active users and no daily active users, but it basically works like this: 34738/107942 = 0.32. At a value of 1 (the maximum) you have high engagement; in simple terms, each user would come back every week of the month. 0.32 is quite low. Around 0.25 would be the lower bound, because we have 4 weeks in a month.
Awesome.
Yet "codebases written in Keras 2 next month should still run many years from now" given that deep learning is no new, how can they that confident that this API will remain relevant years down the line?
What does one thing have to do with the other? Regardless of how "relevant" Keras 2 stays years from now, code written with it now should still be capable of running then, that's the thing they claim.