"This book has an unusual development design. The content is open-sourced, meaning anyone can be an author. Authors submit content or revisions using the GitHub interface."
What a wonderful thing -- this book looks fantastic, but the approach to making it really takes the cake. I really hope interactive notebooks (IPython-based or otherwise) and multiple authors collaborating on GitHub will become widespread.
There are both Wikibooks and Wikiversity. The last time I looked at Wikibooks (several years ago), there were actually some very good open-sourced physiology and biology textbooks. I'm not sure whether development has tapered off, but it seemed like a pretty robust community for a while.
It sounds like a perfect fit for the kind of fields that have grown so vast you need an expert for every sub-section. Like physiology and biology, basically.
I should like to give a contrarian comment about this, because it is at the top of the front page and seems to be being received positively. This book is probably not a good way to learn about statistical inference. It has quite confused explanations of both Bayesian and frequentist approaches. The preface seems to imply that programmers, by virtue of being able to use computers, don't need to take a rigorous mathematical course in Bayesian methods. However, the text actually uses mathematical notation throughout, and as far as I could tell it is often not explained. I noticed at least one case where a probability distribution (gamma) was described only through plots, i.e. without specifying its pdf or how you could derive the pdf. I think the kind of discourse that this book exemplifies is halfway to cargo cult 'statistics'.
I've got exactly the same feeling. Could you suggest a good introductory textbook on MCMC? They left it as a mysterious black box, and I'm very uncomfortable using mathematical black boxes I don't understand.
MCMC methods are covered near the end of the book. That should be enough to familiarize yourself with the basic concepts of MCMC and understand them. Anything more in-depth will require a strong mathematical background.
BTW: There are probably a ton of books out there that cover MCMC; that's just one I liked and which is freely downloadable.
I have some background (grad student in CFD, thinking about switching to some sort of data analysis later on), but my measure theory and probability skills are rusty (on the other hand, my numerical linear algebra, functional calculus and complex analysis are superb). What would be a good book for my level?
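For anyone who wants a peek inside the black box before picking up a book: the simplest MCMC method, random-walk Metropolis, fits in a few lines. This is a minimal sketch in plain Python (standard library only), not PyMC's implementation; the function name `metropolis` and the default step size are my own choices for illustration. The key idea is that only ratios of target densities are ever used, so the posterior's normalizing constant never has to be computed.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target density.

    log_target: log of an unnormalized target density. Only differences of
    log densities (i.e. density ratios) are used, so the normalizing
    constant is never needed.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction term cancels.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # On rejection the chain stays put and the current x is recorded again.
        samples.append(x)
    return samples


# Example: sample a standard normal from its unnormalized log-density -x^2 / 2.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
kept = draws[2000:]  # discard the first draws as burn-in
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

With enough draws the sample mean and variance should land near 0 and 1. Most of what a real library adds (step-size tuning, multiple chains, convergence diagnostics) is built around this accept/reject loop.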
http://www.amazon.com/Data-Analysis-Bayesian-Tutorial-Public... is short and reputable on Bayesian statistics. On MCMC specifically, I don't know, but MCMC is really a kind of algorithm that lets you find the answer to a mathematical question (so I think understanding the math is the right thing to start with).
PS. There is a second edition of that book, but I've heard that the first edition is better, because the second edition added a different author and expanded the book.
Well, no, because I don't know any good reasons for using Bayesian methods (except when prior probability distributions can be found objectively through some previous experiment etc).
How do you reconcile "I don't know any good reasons for using Bayesian methods" with the fact that Bayesian methods revolutionized spam filtering? (Or maybe you disagree that they did?)
Naive Bayes revolutionized spam filtering because it is incredibly easy to implement and understand, and was reasonably effective against early spam, not because it was the best model for detecting spam. There's a reason we started seeing ads for "v1agra" and snippets of prose -- it's pretty easy to game Naive Bayes.
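To make the "easy to implement" and "easy to game" points concrete, here's a toy multinomial Naive Bayes classifier in plain Python. It's a sketch, not any real filter's code: the four training messages are made up, and it uses add-one (Laplace) smoothing and simply ignores words it has never seen, which is a common but not universal choice.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")


def train(docs):
    """docs: list of (words, label) pairs, with label in LABELS."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for words, label in docs:
        word_counts[label].update(words)
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def log_posterior(words, label, model):
    """Unnormalized log P(label | words) under the naive independence assumption."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))  # log prior
    total = sum(word_counts[label].values())
    for w in words:
        if w not in vocab:
            continue  # unseen words (e.g. "v1agra") carry no evidence at all
        # Add-one smoothing so a word never seen with this label doesn't zero the score.
        score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return score


def classify(words, model):
    return max(LABELS, key=lambda label: log_posterior(words, label, model))


model = train([
    ("buy viagra now".split(), "spam"),
    ("cheap viagra offer".split(), "spam"),
    ("meeting notes attached".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
])
print(classify("cheap viagra".split(), model))          # spam
print(classify("v1agra meeting notes".split(), model))  # ham: misspelling + innocent prose
```

Misspelling the trigger word hides it from the model, and padding the message with ordinary prose tips the per-word likelihoods toward ham.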
On the other hand, the GP's assertion that there is seldom a need for using Bayesian methods is also unwarranted; they are the basis for so many machine learning algorithms in common use -- particle filters, for example.
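As an illustration of the particle-filter point, here's a bootstrap particle filter in plain Python for a deliberately simple model, assumed only for the example: a 1-D Gaussian random walk observed with Gaussian noise. The Bayesian part is the update step, where each particle is reweighted by the likelihood of the new observation, i.e. Bayes' theorem applied to a sample-based prior. Function and parameter names are my own, and for this linear-Gaussian model a Kalman filter would give the exact answer, so treat it purely as a sketch.

```python
import math
import random


def particle_filter(observations, n_particles=1000, process_sd=1.0, obs_sd=1.0, seed=0):
    """Bootstrap particle filter for an assumed toy model:

        x_t = x_{t-1} + Normal(0, process_sd^2)   (hidden state)
        y_t = x_t     + Normal(0, obs_sd^2)       (observation)

    Returns the posterior-mean estimate of x_t after each observation y_t.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # initial prior
    estimates = []
    for y in observations:
        # Predict: move each particle through the transition model (prior for x_t).
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # Update (Bayes' theorem): weight each particle by the likelihood p(y_t | x_t).
        # Work in log space and subtract the max to avoid underflow.
        log_w = [-0.5 * ((y - p) / obs_sd) ** 2 for p in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample in proportion to the weights so particles concentrate on the posterior.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Simulate a hidden random walk and noisy observations of it, then filter.
sim = random.Random(1)
truth, obs, x = [], [], 0.0
for _ in range(500):
    x += sim.gauss(0.0, 1.0)
    truth.append(x)
    obs.append(x + sim.gauss(0.0, 1.0))
est = particle_filter(obs)
```

The filtered estimates should track the hidden state more closely than the raw observations do, since each estimate combines the new measurement with everything seen before.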
That's a good question, and I was asking myself that after I wrote that comment. I think my objection is more to the 'Bayesian' and less to the 'methods', if that makes sense. That is, I think constructing and updating models using Bayes' theorem can be (as people doing spam filtering have shown) a good way of making predictions, but that it is the frequentist properties of the models that actually matter (cf. the ubiquity of cross-validation: 'the proof of the pudding is in the eating'), not the fact that they let you maintain a probability distribution over parameters.
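For readers who haven't met it, the cross-validation idea mentioned above is simple: fit on part of the data, score on the held-out part, and rotate. Below is a minimal k-fold sketch in plain Python; the helper names and the toy data (a noisy straight line) are made up for illustration, and in practice you'd likely use a library such as scikit-learn instead.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, fit, predict, k=5):
    """Mean squared error on held-out points, each fold scored by a model fit on the rest."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((predict(model, xs[i]) - ys[i]) ** 2 for i in held_out)
    return sum(errors) / len(errors)


def fit_mean(xs, ys):
    return sum(ys) / len(ys)  # ignores x entirely


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0.0, 0.1) for x in xs]

mse_mean = cross_val_mse(xs, ys, fit_mean, lambda m, x: m)
mse_line = cross_val_mse(xs, ys, fit_line, lambda m, x: m[0] * x + m[1])
print(f"constant model: {mse_mean:.3f}, straight line: {mse_line:.3f}")
```

The score judges each model only by how well it predicts points it wasn't fit on, regardless of how the model was constructed.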
>without specifying its pdf or how you could derive the pdf
Would you or anyone happen to know of a good book that discusses the derivation of various advanced probability distributions? It is quite frustrating that every ML or stats book I come across runs through various distributions without giving the reader any sort of motivation or intuition behind them. Without that intuition, how am I supposed to have any idea when to apply one vs. another?
I honestly can't recommend a book for this. The best resource I've found is MathWorld. I've picked up a bunch of very helpful intuitions from it, including:
- Cauchy: the horizontal distance from the origin at which an arrow shot at a random angle from a point below the origin hits the x-axis
- Gamma: how long you have to wait for the nth event in a Poisson process
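Both intuitions are easy to check by simulation. A minimal plain-Python sketch follows; the shooting point (0, -1) and the parameters n = 3, rate = 2 are arbitrary choices for the example. For reference, the waiting time for the nth event of a Poisson process with rate λ has the gamma pdf f(t) = λ^n t^(n-1) e^(-λt) / (n-1)!, with mean n/λ and variance n/λ², and the arrow's landing point follows the standard Cauchy distribution, whose quartiles are -1 and +1.

```python
import math
import random

rng = random.Random(42)


def arrow_landing_point():
    """Shoot from (0, -1) at an angle theta from the vertical, uniform on (-pi/2, pi/2).

    The arrow crosses the x-axis at x = tan(theta), a standard Cauchy draw.
    """
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return math.tan(theta)


def wait_for_nth_event(n, rate):
    """Time until the n-th event of a Poisson process: a sum of n exponential gaps."""
    return sum(rng.expovariate(rate) for _ in range(n))


def gamma_pdf(t, n, rate):
    """Gamma density with integer shape n and the given rate (the Erlang case)."""
    return rate ** n * t ** (n - 1) * math.exp(-rate * t) / math.factorial(n - 1)


N = 20000
landings = [arrow_landing_point() for _ in range(N)]
inside = sum(abs(x) < 1 for x in landings) / N  # should be near 0.5

waits = [wait_for_nth_event(3, 2.0) for _ in range(N)]
mean_wait = sum(waits) / N                               # theory: n / rate = 1.5
var_wait = sum((w - mean_wait) ** 2 for w in waits) / N  # theory: n / rate**2 = 0.75

# Compare the fraction of simulated waits in [1, 2] with the pdf's mass there.
h = 0.001
pdf_mass = sum(gamma_pdf(1 + (i + 0.5) * h, 3, 2.0) for i in range(1000)) * h
sim_mass = sum(1 <= w <= 2 for w in waits) / N
print(inside, mean_wait, var_wait, pdf_mass, sim_mass)
```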
It's amazing how popular the term "Bayesian" is amongst people who don't really know what it means or quite where it fits in the context of other statistical paradigms.
I had a heck of a time getting scipy working in a virtualenv on Mac OS X Mountain Lion. If you're looking for an easy install script, I whipped one up here:
I read this book a few weeks ago. I loved the IPython notebook format. Being able to edit the code for each figure and play around with parameters while reading was a treat. As a bonus, I learned as much about making nice figures as I did about Bayesian methods. There were quite a few typos, but it wasn't much of a context switch to fix them in the source as I found them and then submit patches when I was done. I probably wouldn't have taken the time if it hadn't been so convenient.
I love IPython: it's definitely worth it if you want to regularly use a REPL to explore things like your Django objects, filter for things, or test scripts with test data. I really like it for exploring a codebase which I am not 100% familiar with, as its tab completion is excellent.
Run it with ipython --pprint, though, so you get automatic pretty printing. I also recommend using the qtconsole plugin, as it is Much Nicer.
Basically, if you like bpython, this is better in every way I can think of. If you like the plain python REPL, give this a shot anyway. :) You may be pleasantly surprised.
You can also use the IPython notebook instead of qtconsole if for some reason you refuse to install Qt on your system (I have a GTK-using friend who so refuses).
"PDF versions are coming. PDFs are the least-preferred method to read the book, as PDFs are static and non-interactive. If PDFs are desired, they can be created dynamically using Chrome's built-in print-to-pdf feature."
While I agree PDFs are antiquated, I still like them for casual, off-the-grid reading. Opening many different pages and printing each one to PDF isn't practical, and the results are hard to organize once they're on my iPad. All the same, I'll check this out.
I hate reading substantial things on any kind of backlit screen. Somehow my attention seems to wander. But I find that I can focus somewhat better on pdfs than webpages. I suppose it's some sort of philistinic nostalgia for the ultimate static and non-interactive medium that is the paper book.
That said, this idea looks awesome. I'd still appreciate a pdf to supplement this, though :).
I prefer PDF simply because PDFs usually have better typography than web browsers for reading sessions. Frankly, the chapters and fonts on the GitHub page look... well... bad, in Chrome anyway.
There are extensions for Cython, Octave and C too. And there's a generic mechanism for scripting languages (%%script), but that only captures stdout, rather than moving objects between different languages.
I'm on the first chapter, but I can tell this is the perfect book for me. Very well written. Easy to follow for a non-mathematician and, overall, a perfect introduction to a field I'm very interested in learning. I've worked through books such as Programming Collective Intelligence, but this one closes the gap between blindly following along and actually understanding the fundamentals, which that book lacks.
The controversy of this survey [1] could have been avoided if the survey authors had used the coin flip algorithm in Chapter 2. (Navigate to Chapter 2 on GitHub and Ctrl+F for "Example: Cheating" without quotes. Maybe someone should submit a pull request to add anchors to the HTML output.)
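For anyone who doesn't want to dig through the chapter: the Chapter 2 example is a randomized-response scheme. Each student privately flips a coin; on heads they answer honestly, and on tails they flip again and say "yes" on heads and "no" on tails. Any individual "yes" is deniable, yet the overall rate is recoverable, because P(yes) = p/2 + 1/4 for a true cheating rate p. The sketch below is plain Python with a made-up true rate of 0.3, and it uses the simple moment estimate p = 2·P(yes) - 1/2; the book itself does the inference with PyMC and gets a full posterior instead.

```python
import random


def private_answer(cheated, rng):
    """One student's answer under the two-coin randomized-response protocol."""
    if rng.random() < 0.5:      # first flip heads: answer truthfully
        return cheated
    return rng.random() < 0.5   # tails: answer "yes" iff a second flip is heads


def estimate_cheating_rate(answers):
    """Invert P(yes) = p / 2 + 1 / 4 to get the moment estimate p = 2 * P(yes) - 1 / 2."""
    p_yes = sum(answers) / len(answers)
    return 2 * p_yes - 0.5


rng = random.Random(0)
true_rate = 0.3  # hypothetical, for the simulation only
students = [rng.random() < true_rate for _ in range(100_000)]
answers = [private_answer(c, rng) for c in students]
print(f"estimated cheating rate: {estimate_cheating_rate(answers):.3f}")
```

With small samples this moment estimate can even fall outside [0, 1], which is one reason a Bayesian treatment with a prior on p is attractive here.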
View each chapter in the browser and save as web page - complete. Then use Calibre (http://calibre-ebook.com/) to convert the html to your ebook format of choice. It's kind of a pain, but I've done this for my Kindle with a bunch of web material.
I've only briefly skimmed each chapter, but I'm a pretty big fan of PyMC and am extremely impressed by what I saw regarding its use and how to think about optimization problems from a Bayesian framework. Chapter 5 and the Dark World example were particularly interesting.
Good book, but I'd like to see a more detailed explanation of MCMC's inner workings. I'm very uncomfortable using mathematical black boxes I don't understand. Can anyone suggest a good introductory textbook on MCMC?
It has much more than MCMC, but the "Probabilistic Graphical Models" textbook by Daphne Koller (of Coursera) and Nir Friedman is a thorough introduction to the subject. It includes a discussion of MCMC that will leave you with a deeper understanding than the typical shallow treatment.
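To give a flavour of what a deeper treatment covers: Gibbs sampling, one of the MCMC methods commonly discussed in graphical-model texts, updates one variable at a time by drawing it from its full conditional distribution. For a standard bivariate normal with correlation ρ, those conditionals are themselves normal, x | y ~ N(ρy, 1 - ρ²), which makes it a convenient toy case. This is a plain-Python sketch, not code from the book; ρ = 0.8 and the sample sizes are arbitrary.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho * y, 1 - rho**2) and y | x ~ N(rho * x, 1 - rho**2).
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # draw x given the current y
        y = rng.gauss(rho * x, sd)  # draw y given the freshly drawn x
        samples.append((x, y))
    return samples


draws = gibbs_bivariate_normal(rho=0.8, n_samples=50000)[1000:]  # drop burn-in
n = len(draws)
mx = sum(x for x, _ in draws) / n
my = sum(y for _, y in draws) / n
cov = sum((x - mx) * (y - my) for x, y in draws) / n
sx = math.sqrt(sum((x - mx) ** 2 for x, _ in draws) / n)
sy = math.sqrt(sum((y - my) ** 2 for _, y in draws) / n)
print(f"sample correlation ~ {cov / (sx * sy):.3f}")
```

As ρ approaches 1 the conditional steps shrink and the chain mixes slowly, which is one of the practical issues such books explain.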
As an alternative, you can try watching about 90 minutes of lecture starting here:
There should really be a standard interactive-ebook file format (something like a PDF + IPython, or an open-source CDF). Low-power devices could easily degrade to text only, with good typography.
I couldn't agree more. The content has been relevant from day one, and it has only looked better since. I've passed it on in a few lectures already, and people really love it :-)