I am the author of this tutorial. If you are interested in contributing a section to this tutorial, please get in touch. Some suggested topics: survival analysis, mixture models, classification, time series models... Twitter @markdregan
It would be nice if random variables were denoted by capital letters to distinguish them from particular observations.
In the section about regression you should have the conditional mean of Y equal to \beta X, rather than the overall mean.
Of course, this doesn't really matter too much since the substance of the tutorial is correct. I thought I'd mention it though because sloppy notation can undermine your credibility in some circles, and the equations are the most notable object when you're just skimming the page.
More (very) pedantic notes:
Usually, if you're going to express a sum with \cdots you put A+B+\cdots+Z (with plus signs on both sides of the ellipsis for clarity).
The probability operator is usually denoted with a capital P.
The log function shouldn't be italicized; you can just type \log to avoid this.
Distribution names in statements like y~Pois(\mu) aren't usually italicized.
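To make the notes above concrete, here is a rough sketch of how they might look when applied together. The specific equations (a Poisson model with a log link and k covariates) are only an assumed example for illustration, not the tutorial's actual formulas.

```latex
% Capital letters for random variables, lowercase for observed values;
% upright distribution name via \mathrm
Y_i \sim \mathrm{Pois}(\mu_i)

% Conditional mean of Y given X, rather than the overall mean
\mathrm{E}[Y \mid X] = \beta X

% Upright log via \log, and a plus sign on both sides of \cdots
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}

% Capital P for the probability operator
P(Y_i = y_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}
```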
Looks pretty nice, good job. I think survival analysis is a very underrated tool. I'm working in UX now and there are a lot of test setups where survival analysis makes a lot of sense but isn't used (mostly because people don't know it).
However, I think the introduction could be improved by briefly describing the "why/what" of Bayesian modeling before you get into the first Hangouts example.
I have a few suggestions. Maybe I missed it, but a prerequisites section would be useful, covering both the assumed knowledge and the platform, software, etc.
I am new to Python and believe this tutorial would be great for me. However, novice users like me spend a lot of time getting the environment right rather than understanding the code.
For example, after downloading and installing Anaconda, Jupyter and seaborn, I stumbled on the message "C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:89: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)"
And here I am stuck. My next step, had it not been for this post, would have been to investigate syntax changes in Python.
Maybe that's not the correct way to address the problem, but that is mainly my point. If the tutorial is targeted at beginners like me, a few more pointers on common errors when setting up the environment would be helpful!
Thank you for an otherwise great tutorial and keep up the good work!
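Regarding the FutureWarning quoted above: it comes from pandas rather than from Python itself, and a FutureWarning on its own normally does not stop the notebook from running. A minimal sketch of the change the message asks for is below. The data frame and its column names are made up for illustration and are not taken from the tutorial.

```python
import pandas as pd

# Hypothetical data frame; the column names are illustrative only.
df = pd.DataFrame({
    "sender": ["a", "b", "c"],
    "response_time": [12, 3, 7],
})

# Deprecated form that triggers the FutureWarning (and is gone in later pandas):
# df.sort(columns="response_time")

# Replacement named in the warning message:
sorted_df = df.sort_values(by="response_time")
print(sorted_df)
```

If the warning points at a line inside the notebook, swapping `sort(columns=...)` for `sort_values(by=...)` on that line should be enough.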
This is great work. I don't have any substantive comments (need to read it in depth for that). I did miss having "next" links, though - not sure if there is a Jupyter-native way to do that.
I don't have any Google Hangouts chat messages to run the first example in Jupyter. I know that you are not going to share your data, but it would be handy if some fake conversations could be included. People like me like to install the applications first and then run them to see whether they work as claimed. I installed the conda distribution and the Jupyter notebook works correctly. (I installed conda on Ubuntu and then seaborn, PyMC3 and pandas; PyMC3 and seaborn with pip, since conda installs version 2.3 of PyMC3.) It works.
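In the meantime, something like the following could generate stand-in data for trying the notebooks. This is only a sketch: the field names (`timestamp`, `sender`, `response_time`), the sender names and the delay distribution are all assumptions, and the tutorial's real data format may well differ.

```python
import csv
import random
from datetime import datetime, timedelta

FIELDS = ["timestamp", "sender", "response_time"]  # hypothetical schema


def make_fake_messages(n=200, seed=0):
    """Return n fake chat rows as dicts; every field name is hypothetical."""
    rng = random.Random(seed)
    senders = ["alice", "bob", "carol"]  # made-up participants
    t = datetime(2016, 1, 1, 9, 0, 0)
    rows = []
    for _ in range(n):
        # Whole-second delay of at least 1 s, roughly exponential (mean about 20 s).
        delay = int(rng.expovariate(1 / 20)) + 1
        t += timedelta(seconds=delay)
        rows.append({
            "timestamp": t.isoformat(),
            "sender": rng.choice(senders),
            "response_time": delay,
        })
    return rows


def write_csv(rows, path):
    """Write the fake rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


rows = make_fake_messages()
print(rows[0])
```

Calling `write_csv(rows, "fake_hangout_chats.csv")` (the file name is just an example) would save the rows for loading into a notebook.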
I should say that the first step is to clone:
cd where_you_want_the_data_to_be_copied
git clone ....
# and now start jupyter notebook with
jupyter notebook
# go to File/open/ and select the first section.
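Before opening the first section, a quick check of which packages are installed can save some guessing. The sketch below uses only the standard library (Python 3.8+). The package list is my assumption based on what is mentioned in this thread, not an official requirements list.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("numpy", "pandas", "seaborn", "pymc3")):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in report_versions().items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```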
I see that I can edit the markdown. I translated the introduction to Section 0 into Spanish; it is rendered in English here. Thanks for this tutorial. The graphics are nice.
### Section 0: Introduction
Welcome to "Bayesian Modelling in Python" - a tutorial
for people interested in Bayesian statistical techniques with Python. The list of tutorial sections can be found on the project's [homepage](https://github.com/markdregan/Hangout-with-PyMC3).
Statistics is a subject I never enjoyed during my university years. The frequentist techniques we were taught (p-values, etc.) seemed contrived, and in the end I turned my back on a subject I had no interest in.
That changed when I discovered Bayesian statistics - a branch of statistics quite different from the frequentist statistics usually taught at most universities. My learning was inspired by numerous publications, blogs and videos. To those starting out in Bayesian statistics, I would strongly recommend the following:
I created this tutorial in the hope that others will find it useful and that it will help them learn Bayesian techniques in the same way those resources helped me. Any input from the community (correction/comment/contribution) is welcome.