Multi-Armed Bandits, Conjugate Models and Bayesian Reinforcement Learning

heydenberk · on Aug 31, 2018

I spent two years[0] designing, building and maintaining a system which used contextual multi-armed bandits at large scale. A couple pieces of advice relating to this post and this subject:

1. Thompson sampling is great. It's intuitive and computationally tractable. The literature is full of other strategies, specifically semi-uniform strategies, but I strongly recommend using Thompson sampling if it works for your problem.

2. This is broadly true about ML, but for contextual bandits, most of the engineering work will probably be the feature engineering, not algorithm implementation. Plan accordingly. Choosing the right inputs in the first place makes a big difference. The hashing trick (a la sklearn's dictvectorizer) can make a huge difference.

3. It can be difficult to obtain organizational alignment on the intention of using reinforcement learning. Tell stakeholders early and often that you're using bandit algos to produce some kind of outcome — say, clicks or conversions — and not to do science which will uncover deep truths.

[0] along with an excellent data scientist and a team of excellent engineers, of course :)

antidesitter · on Sept 1, 2018

> Thompson sampling is great. It's intuitive and computationally tractable. The literature is full of other strategies, specifically semi-uniform strategies, but I strongly recommend using Thompson sampling if it works for your problem.

Spot on. For more information about the advantages of Thompson sampling over other approaches, see Why is Posterior Sampling Better than Optimism for Reinforcement Learning? [1] by Osband and Van Roy.

[1] http://proceedings.mlr.press/v70/osband17a/osband17a.pdf

ma2rten · on Sept 1, 2018

The hashing trick (a la sklearn's dictvectorizer) can make a huge difference.

I'm not a huge fan of that. Hash collisions can lead to unexpected behaviors in production and make feature attribution for debugging harder.

It's slightly more effort to implement, but with a trie data structure you can store even the biggest feature mapping in memory.

achompas · on Sept 1, 2018

Does the fact that sklearn reverses signs for features that collide change your opinion?

lsorber · on Sept 1, 2018

How are you compressing the feature space in that case, by truncating the trie?

ma2rten · on Sept 1, 2018

You could use clustering, dimensionality reduce or feature selection.

However, the way I have seen the hashing trick being used is not to compress the feature space. For most problems it would be a bad idea to just lump your most discriminative features together with some other random ones. Instead people just choose a very large feature space which makes collisions unlikely. For model implementations using sparse matrices it doesn't matter if the feature space is very large. The main advantage of this is that you don't have to keep an expensive hash map of your vocabulary in memory (hence my suggestion to use a trie).

_eigenfoo · on Aug 31, 2018

Incredible! Thanks for sharing the advice :)

heydenberk · on Aug 31, 2018

Thank you for the post!

gwulf · on Aug 31, 2018

+1 for thompson sampling

atrudeau · on Aug 31, 2018

Though mentioned it the article, I'll add it here for posterity: https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf

Great tutorial on Thompson sampling.

mopierotti · on Aug 31, 2018

I would strongly recommend the post he cited. It is the same style but features interactive visualizations: https://dataorigami.net/blogs/napkin-folding/79031811-multi-...

I implemented something like this for my company and found the latter article quite helpful in explaining the concept to people who understood the basics of probability but not programming.

pengstrom · on Aug 31, 2018

I want to recommend the book http://camdavidsonpilon.github.io/Probabilistic-Programming-... a nice little book about probabilistic programming in Python.

melling · on Aug 31, 2018

There was a free Bandits algorithm book discussed on HN about a month ago.

https://news.ycombinator.com/item?id=17642564

_eigenfoo · on Aug 31, 2018

Yes! I haven't read all of it yet, but from what I've seen so far, the book spends a lot of time rigorously proving mathematical properties/bounds of various bandit algorithms. I love rigor as much as the next guy, but I also like seeing code :)

atrudeau · on Aug 31, 2018

The math isn't rendering for me...

_eigenfoo · on Aug 31, 2018

Right! Sorry about that, forgot to add mathjax. Should be fixed soon...

atrudeau · on Aug 31, 2018

I was about to send you an email! :)

tegansnyder · on Aug 31, 2018

This is an excellent well written post. Thanks for sharing.