Recommended Self-Study Path for Statistics
70 points by tamiddlemanager on Oct 13, 2016 | 24 comments
I'm a middle manager at a large tech company who recently took responsibility for a few engineering teams that do some stats-heavy work. Each produces forecasts where we talk about accuracy of predictions, and some use machine learning & the like.

I have a fairly shallow understanding. I took a basic stats class in undergrad 20 years ago. Over the years, I’ve seen various analyses, so I’m not totally lost when people talk about p-values (but I also have a lot of gaps where some of the details are lost on me).

I’d like to strengthen my understanding so I can better understand/appreciate/represent what the teams are doing. Any recommendations on a course of study?

FWIW, I’ve considered just buying a textbook or hiring a tutor. I looked at classes at the local community colleges, but the syllabi suggest a fairly slow pace (and mostly cover things I feel comfortable with), and a previous class I took (iOS development) was so dang slow in pacing.




Hello, I studied economics and statistics in college and grad school, and worked as a teaching assistant for undergraduate statistics courses. Here is a short, annotated bibliography of my favorite statistics books.

1. Ayres, Ian (2007) Super Crunchers: Why Thinking by Numbers is the New Way to Be Smart

[Good introductory summary of the main concepts in statistics with many real-world examples]

2. Bernstein, Peter (1996) Against the Gods: Remarkable Story of Risk

[Intellectual history of statistics, accessible to beginning students.]

3. Healey, Joseph (2005) Statistics: A Tool for Social Research, 7E

[This is the textbook that was used in the undergraduate statistics courses while I was working as a teaching assistant at UC Santa Cruz.]

4. Kahneman, Daniel (2011) Thinking, Fast and Slow

[Kahneman combines cognitive psychology with statistical concepts; highly recommended]

5. Silver, Nate (2012) Signal and the Noise: Why So Many Predictions Fail, but Some Don't

[Silver's book offers an excellent summary of major concepts in statistics and how they are applied to real-world problems]

6. Taleb, Nassim Nicholas (2005) Fooled by Randomness, 2E

_________ (2010) Black Swan: Impact of the Highly Improbable, 2E

[Important critique of statistics and how it is mis-used and mis-applied, particularly in econometrics]

Hope this helps. Shoot me an email if you have any questions. Good luck. mitchelldeacon9@gmail.com


These books are more for casual reading, except for Book 3. If I may, do you have more recommendations similar to Book 3? In addition to my personal interest, my read of the OP's request was that he is looking for more technical details. For myself, what books are good for a second or third course in stats? I have a finance background, so I'm familiar with the intro stuff in Book 3.

As a recommendation to the OP, "Collective Intelligence" by Toby Segaran is amazing.


The only advanced textbook on statistics that I would recommend is: Kennedy, Peter (2008) Guide to Econometrics, 6E


I encounter this a lot at work. People needing more advanced stats skills for a new role and not having much training in it or if they did it was years ago. (I work in finance which has become increasingly quant and stats heavy - faster than training in it has).

My advice: Figure out exactly what type of stats work your teams are doing. Make a list of those topics. Random example: are those Kolmogorov–Smirnov or Mann–Whitney tests? Then hire a tutor who knows that stuff - maybe a grad student somewhere, can be remote over skype even.
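
To give a flavor of what one of those names means: the Mann–Whitney test asks whether values from one group tend to be larger than values from another, without assuming a bell curve. Below is a minimal sketch in plain Python, not a production implementation. The function name and the sample data are hypothetical, and the p-value uses the large-sample normal approximation with no tie or continuity correction, so treat it as rough for samples this small.

```python
from statistics import NormalDist


def mann_whitney_u(xs, ys):
    """Return (U, approximate two-sided p-value) for two samples.

    U counts the pairs (x, y) where x > y, with ties counted as 0.5.
    The p-value uses the normal approximation without tie correction.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(xs), len(ys)
    mu = n1 * n2 / 2                              # expected U if the groups do not differ
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # std dev of U under that assumption
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical absolute forecast errors from two models (made-up numbers).
model_a_errors = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1]
model_b_errors = [3.9, 4.4, 5.2, 3.6, 4.8, 5.0]

u_stat, p_value = mann_whitney_u(model_a_errors, model_b_errors)
print(f"U = {u_stat}, approximate p = {p_value:.3f}")
```

A small U here means model A's errors are mostly smaller than model B's. In real work a stats package would do this for you; the point is only to see what the test is counting.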

If you are not 100% sure what you are looking at at work and what to put on this list of topics...hire a tutor and show them stuff from work (if the work is proprietary/confidential, recreate it with dummy data or just give rough examples) and ask what topics would be needed to nail one's understanding of this work.

Statistics is a huge subject and if you buy a textbook you may spend a ton of time on stuff that's just not relevant when you could be going a bit deeper into a sub-topic that is very relevant to your work. Also a lot of what looks like statistics is actually found under applied math books/courses not statistics.

Lastly, in case this needs to be said, after you get the basics on a stats topic, the most important question to ask a stats tutor is "where do people usually fuck up when doing this?"

Stats in practice is often more about not making errors than it is about accuracy. Find out where people often fuck it up, especially as a manager, and doubly so since it sounds like your teams are not statisticians either.


"Stats in practice is often more about not making errors than it is about accuracy."

What's the nuance? (Serious question)


TL;DR What to focus on? Some statistics advice: Don't be a liar. Don't be a biased idiot. Don't fuck up. The software should handle the rest.

Nuance is a fair serious question. And this could easily turn into a debate of semantics or philosophy (will add links at bottom tho^).

But what I meant was statistics in practice isn't about proof of some truth but about chance of disproof. An analogy in jurisprudence: there is a difference between "not guilty" and "innocent".

Someone may or may not be "innocent". There's even presumption of innocence. But then in practice, lawyers give evidence to a jury to decide beyond a reasonable doubt if someone is "guilty" or "not guilty".

What's the focus? It sure looks like the work is more focused on "not guilty" vs "innocent".

Furthermore, in statistics there are errors...eg statistical errors, random errors, systematic errors, type 1 errors, non-sampling errors...lots of errors. You can't eliminate them. But you can be aware of them and reduce them where possible.
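
To make one of these concrete: a type 1 error is a false positive, meaning the test says "there is a difference" when there is none. The sketch below simulates that in plain Python. It assumes a two-sided z-test with a known standard deviation of 1, and the function name and parameters are hypothetical, chosen only for illustration. Both samples are drawn from the same distribution, so every "significant" result is a false alarm.

```python
import random
from statistics import NormalDist, mean


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of trials where a z-test wrongly reports a difference.

    Both groups come from the same normal distribution (mean 0, sd 1),
    so any rejection of "no difference" is a type 1 error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    se = (2 / n) ** 0.5  # std error of the difference in means when sd = 1
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (mean(a) - mean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"false positive rate: {rate:.3f}")
```

The rate should land near alpha (0.05 here). That is the "can't eliminate them" part: run enough tests on pure noise and some will look significant.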

Now, statistical software deals with errors to the extent statistics techniques exist and the technology can handle the process. Sort of like spellcheck.

But software can't fix everything. Most importantly it can't fix if the person using software is an idiot.

Too many times I have looked like an idiot for sending an email where spellcheck put the wrong word. What to do? I could write a new algo to make spellcheck better or I can just double check the email next time.

^Links to semantics and philosophy stuff: Some fields try to have precise, official definitions for words like "error" and "accuracy".

See ISO 5725 or longer list of examples on wikipedia: https://en.wikipedia.org/wiki/Accuracy_and_precision

Of course, philosophy also addresses the nuances. Way more fun to read than ISO technical documentation.

Short list of philosophy of statistics issues on wiki: https://en.wikipedia.org/wiki/Philosophy_of_statistics.

Better, longer list, which is worth reading as it includes more interesting and broader philosophy of science issues: http://plato.stanford.edu/entries/statistics/

If lists of philosophies are overwhelming and you want one random example of it...What is the probability the sun rises tomorrow?
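
One classic answer to that question is Laplace's rule of succession: after seeing s successes in n trials, estimate the chance of success on the next trial as (s + 1) / (n + 2). A minimal sketch, where the function name and the day count are hypothetical and the formula assumes a uniform prior over the unknown probability:

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of the probability that the next trial succeeds."""
    return Fraction(successes + 1, trials + 2)


# Hypothetical: the sun has risen on every one of 10,000 observed days.
days_observed = 10_000
estimate = rule_of_succession(days_observed, days_observed)
print(f"P(sun rises tomorrow) is about {float(estimate):.6f}")
```

The estimate never quite reaches 1 no matter how many sunrises you have seen, which is exactly what the philosophers argue about.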

Long post. Lastly, a joke: 'A physicist, an engineer, and a statistician go duck hunting. They spot a duck in the distance and the physicist takes the first shot, but just misses left. The engineer shoots next, but just misses right. The statistician yells, “we got it!”'.

[Edit] At this point I might as well add Buffett's 2 rules for investing: "Rule No. 1: Never lose money. Rule No. 2: Never forget rule No. 1."


Thanks. Any idea how to go about finding a tutor?


Where are you located? Likely a local tutoring company can find one for you at the skill level you need.

Others on here might have input on websites that do this.

We use pretty advanced stats at work, so I cold emailed a professor at a local university and he was able to connect me to grad students (they get these types of emails more than one thinks).

If interested: if I remember right, it was ~$120 an hour, and the tutor already had a PhD and was doing a postdoc. It took 3 hours a week with the tutor, combined with self study at home, for about 2 months before I felt "ready".

The self study never stops of course.


PS. If you think it would help and if you are able to post more info about the type of work your guys are doing, I'd be happy to help answer questions you may have.


Get your own consigliere. Hire a consultant to advise/mentor you individually.

You might find one on Hourly Nerd > https://hourlynerd.com/your-matches/information-technology-a...


I agree with the above recommendation.

I have read John Cook's blog for some time and think he might be a good fit as a consultant for bringing you up to speed and reviewing your current teams' work.

http://www.johndcook.com/blog/

I suspect it would be worth your time to get a consultant like him to come in for a week, review all the work going on in your org, and maybe lay out a learning plan or crash course in developing a safety net for you.


Allen Downey offers many free and excellent textbooks online from his Green Tea Press [0].

You can learn statistics, Bayesian reasoning, and a bunch of other stuff.

Sites like Coursera, edx, and Udacity all have courses for other presentations and applications of statistics at pretty much every degree of difficulty.

[0] http://greenteapress.com/wp/


You can take free online courses to refresh your Statistics knowledge.

Coursera https://www.coursera.org/browse/data-science/probability-and...

Coursera - Making sense of Data http://academictorrents.com/details/a0cbaf3e03e0893085b6fbdc...

MIT 6.041 Probabilistic Systems Analysis and Applied Probability https://www.youtube.com/playlist?list=PLUl4u3cNGP61MdtwGTqZA...

Statistics 110: Probability - Harvard https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6...

Udacity also has a few courses on Statistics.


Old but gold. Kennedy's 'A Guide to Econometrics'. Not so much a textbook as a book that ties theory to practice and explains common pitfalls and intuitions. It's one-upmanship in statistical practice.

You shouldn't try to learn stat on par with your teams. Learn to ask the right questions.

If you prefer learning by doing, then The Elements of Statistical Learning would give you some modern skills plus add good questions (model testing and prediction are imho more important than base skills, and central to the work of Tibshirani et al.) to your book.

I think the coaching approach in the other response thread is worthwhile as well. If you weren't really into stats before and haven't read up when it wasn't part of your day job, the route of learning the skillset seems a detour. Possible if motivated ofc, but you need advice on managing stat-heavy teams. That is a different, though related, ballpark.


I'm working my way through these videos as a way of refreshing my basic knowledge of stats.

https://www.youtube.com/playlist?list=PL5102DFDC6790F3D0

Very well explained.


I'm currently addressing this need where I work and posting my training materials at http://RForecasting.com as they are developed. Also I like https://www.amazon.com/Data-Matters-Conceptual-Statistics-Ra..., but you can see I'm biased on that.


CMU's self-paced and free Open Learning Initiative classes on statistics might be helpful early on:

Empirical Research Methods: http://oli.cmu.edu/courses/future/empirical-research-methods...

Probability and Statistics: http://oli.cmu.edu/courses/free-open/statistics-course-detai...

Statistical Reasoning: http://oli.cmu.edu/courses/free-open/statistical-reasoning-c...


This free online textbook using R is what I used when I last taught statistics in a college setting: https://www.openintro.org/stat/textbook.php


Discovering Statistics Using R by Andy Field is a fantastic and entertaining book.


Andy Field also posted some great lectures that pair well with Discovering Statistics here: https://www.youtube.com/user/ProfAndyField/videos


This + the book look like a great recommendation. Thank you both!


Why not ask your developers how they studied this field?


I've heard good things about Naked Statistics by Charles Wheelan, but I've never actually read it myself.


Statistics 110 Harvard University video lectures by Joe Blitzstein is a good place to get you started.



