The Elements of Statistical Learning [pdf] (stanford.edu)
282 points by happy-go-lucky on Dec 31, 2020 | 59 comments



To echo some of the other comments here, this is a text that's really only appropriate for those who already have a graduate-level grasp of statistics. I love that it's freely available, but ESL is not an introductory text.

For example, here's a screenshot from the introductory chapter (pg. 26): [1]. The authors expect you to already be familiar with matrix analysis applied to statistics.

An Introduction to Statistical Learning (ISL) [2] is aimed at those with a high school level of math.

[1] https://imgur.com/q0NeqdR

[2] https://statlearning.com/book.html


I found that even Introduction to Statistical Learning made a few too many assumptions when I tried to work through it. I recently finished Jim Hefferon's Linear Algebra [1] and now I'm working through Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares [2] (along with a Python companion [3]). The two texts overlap, but I've found the repetition more helpful than redundant; it's nice to hear different angles on the same topic. I'm planning to focus on statistics next with Blitzstein and Hwang's Introduction to Probability [4] before returning to ISLR.

[1] http://joshua.smcvt.edu/linearalgebra/

[2] http://vmls-book.stanford.edu/

[3] https://ses.library.usyd.edu.au/handle/2123/21370

[4] https://projects.iq.harvard.edu/stat110/home


> I found that even Introduction to Statistical Learning made a few too many assumptions when I tried to work through it.

Not at all surprising. From the preface:

> One of the reasons for ESL's popularity is its relatively accessible style. But ESL is intended for individuals with advanced training in the mathematical sciences.

> ... [ISL] is appropriate for advanced undergraduates or master's students in statistics or related quantitative fields or for individuals in other disciplines who wish to use statistical learning tools to analyze their data.

So by that reading, the authors simplified ESL's material from "advanced training in the mathematical sciences" down to "advanced undergraduates or master's students in statistics or related quantitative fields". I think that tells you all you need to know about how difficult even ISL should be expected to be.

Given that even ISL expects you to be partway through a university education in math and stats, if it's been a while, or if you never studied linear algebra, statistics, or probability at that level in the first place, you won't be ready. That's probably why it irks me that ESL gets brought up so often as the starting point for a lot of folks. It's a good starting point for a Ph.D. from another field, but not for, like, a random software developer who's got an interest in ML. It's just setting them up for failure when the SIMPLIFIED version expects them to be partway through a relevant degree.

> I'm planning to focus on statistics next with Blitzstein and Hwang's Introduction to Probability [4] before returning to ISLR.

I think your references form a really solid sequence of prerequisites. I'll again plug what I've been plugging in a few other comments: [1]. In that one, you could probably get through ISL after the Hogg text. But yours is totally fine as well.

One other thing I'll add: I found stat110 and its companion book to focus a little too much on the "challenging" problems. It's like Blitzstein reveled in tricking you with the unintuitive parts of probability. Maybe because of his background in competition math? IDK. I like the novelty of the challenging problems, but I wish they weren't so front and center in his presentation. (I also found the whole story-proof concept a little strange.) Still, the fact that so much is online for free -- including video lectures -- makes it a great resource.

[1] https://www.reddit.com/r/learnmachinelearning/comments/ggpzk...


Whether this is "graduate level" depends on your undergrad program. Most undergrad seniors in math or stats departments that I know of should be able to handle this material.


So by "graduate level", I don't mean that at no point could any undergrad follow the material. I'd wager that you're correct, most seniors, and many juniors or sophomores with the right sequencing, can follow the material. But that same thing can be said about _tons_ of graduate course work. From my experience, many graduate classes have a few advanced undergrads in the class.

By graduate course work, I mean "builds on an undergraduate-level understanding of the material". And yes I'm being a little hand-wavy about what demarcates grad from undergrad (which probably isn't well defined anyway), but I hope the gist of my meaning is clear.


Upper-level physics undergrads too. There’s a pretty fuzzy line between something appropriate for the end of undergrad versus the beginning of grad school.

If you have had vector calc, basic probability and stats, and linear algebra, the book is accessible. Especially if you had a numerical methods course somewhere along the way.


> There’s a pretty fuzzy line between something appropriate for the end of undergrad versus the beginning of grad school.

I agree. I've come across countless math textbooks claiming to be aimed at "beginning grad students and advanced undergrads." There's a lot of variability in that cross section of readers. Thinking back to my own senior year as an undergrad, some of my peers were extremely bright and bound for top grad school programs, and some were just barely managing to graduate by the skin of their teeth.


I have a relatively weak maths background but have gotten plenty of use out of ESL anyway. The authors are clear about which techniques they use at all points, so if there's something missing from my toolbox (and there is!) I can fill it in on my own.

It would be a much bigger problem if they assumed not only that you can do these things, but also that you know which things to do!


I’m up to Chapter 6 in ISLR:

https://github.com/melling/ISLR

Would Elements of Statistical Learning be my next book?

I’ve seen the Bishop book highly recommended too, and it has been mentioned in this post.

https://www.amazon.com/Pattern-Recognition-Learning-Informat...


I'm a recent graduate from undergrad doing work in deep learning. While by no means an expert, I'm inclined to respectfully disagree with the assessment of /u/scythmic_waves. The Reddit roadmap is total overkill as a pre-req for ESL and Bishop (Analysis, Topology, and proof-based Linear Algebra are certainly not needed for them).

I think the only hard pre-req would be a solid understanding of non-axiomatic probability up through the Law of Large Numbers. For the rest, I'm of the, perhaps naive, school of thought that one ought to jump into the deep end and consult a variety of sources as needed. IIRC, most of Munkres and of Hoffman & Kunze is not needed for these books. Granted, you might find yourself picking those texts up as your focus narrows, but for ESL and Bishop, you don't need them.

With that out of the way, I'd highly, highly recommend Bishop as reading after ISLR.

Edit: In response to your other comment, I also disagree: proofs, especially for regression problems, are important for understanding why we use them.


> While by no means an expert, I'm inclined to respectfully disagree with the assessment of /u/scythmic_waves.

Not a problem! These are all just my opinions.

> Analysis, Topology, and proof-based Linear Algebra are certainly not needed for them

Although this is explained in the prose of the document, I should have highlighted it myself: only the nodes in blue are required. The orange nodes (Analysis, Topology, Functional Analysis, etc.) are extra. They aren't required for ESL.

Honestly, as long as you get up to the level of the Casella & Berger text, you'll probably be fine. And a lot of C&B can be skipped (like the focus on ANOVA or experiment design). But I also like that roadmap because after C&B, there's additional emphasis on Linear models which is helpful for ESL.

> For the rest, I'm of the, perhaps naive, school of thought that one ought to jump in the deep end, and consult a variety of sources as need be.

And I suppose this is where you and I differ. I find it discouraging to need to stop partway through a text and go learn a whole new subject area before continuing. Instead, I find building up the foundation first and then working through a text to be a more enjoyable experience, because it's just building on what I know.

But to each their own!


I just wanted to say that I appreciate the courteous reply! :)


> Would Elements of Statistical Learning be my next book?

Honestly, no, I don't think so. ESL is likely too advanced.

I would use that screenshot I posted above as a litmus test. Do you understand that notation? The `E` with the subscript? And why they're using `trace[]`? If you do, then you can likely follow ESL. If not -- which would be understandable, because even early undergrads likely can't -- then I'd say you shouldn't try to follow ISL up directly with ESL. It really is a graduate text.
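
For anyone without the screenshot handy: if I'm remembering pg. 26 right, the expression in question is the expected prediction error of least squares (ESL eqs. 2.27-2.28). Reproduced from memory, so treat as approximate:

    % ESL eqs. 2.27-2.28, reproduced from memory; approximate
    \mathrm{EPE}(x_0) = \sigma^2
      + \mathrm{E}_{\mathcal{T}}\left[ x_0^T (\mathbf{X}^T \mathbf{X})^{-1} x_0 \right] \sigma^2
    % and, averaging over the test point x_0 for large N:
    \mathrm{E}_{x_0} \mathrm{EPE}(x_0)
      \approx \mathrm{trace}\left[ \mathrm{Cov}(X)^{-1} \mathrm{Cov}(x_0) \right] \sigma^2 / N + \sigma^2
      = \sigma^2 (p/N) + \sigma^2

The subscripted E is an expectation over the named random quantity (the training set T, or a new test point x_0), and the trace shows up via the identity E[x^T A x] = trace[A Cov(x)] for zero-mean x. If that reads naturally to you, ESL is within reach.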

> I’ve seen the Bishop book highly recommended too, and it has been mentioned in this post.

Bishop has a similar problem: [1]. I had to scroll to chapter 2 for this screenshot (pg. 83), but it really is par for the course.

So, and this is totally my opinion here so YMMV, recommendations for foundational ML info tend to be wildly too advanced for the people seeking them out. I'm a math-y person. I really like learning about the math foundations of ML. But ML builds on a lot of other concepts and you can't just jump into the deep end. In my opinion, ML foundations should come at the end of a lengthy sequence of math and statistics courses. Students will just be too lost without them.

I don't mean to be discouraging here. I think nearly anyone who's willing to put in the time can learn this stuff! But here's a more reasonable sequence I found on reddit a while back that would set someone up nicely for being able to follow ESL: [2]. Without the proper foundation, it's just too difficult to follow ESL or Bishop IMO.

Last, I'll note that you don't need to understand the nitty-gritty of ML math to be an ML practitioner. In fact, I'd argue that making the effort would be distracting, because 1) a basic understanding (like you'd get from working through ISL) is probably good enough to start messing with libraries, and 2) practitioners need a whole bunch of other knowledge (like general software skills and how to maintain ML datasets) that they also have to take the time to learn.

[1] https://imgur.com/uXWZ6Bv

[2] https://www.reddit.com/r/learnmachinelearning/comments/ggpzk...


It has been a while, but I do understand Σ, e^x, ln, matrices, vectors, etc.

However, as you mentioned, you don't need to work through the proofs to understand logistic regression, lasso, ridge regression, and bootstrapping, for example.
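
In that spirit, here's a minimal sketch with scikit-learn showing how far you can get without touching a proof (the synthetic data and hyperparameter choices are mine, purely for illustration):

    # Minimal sketch: using the methods named above via scikit-learn.
    # Synthetic data and arbitrary hyperparameters, purely illustrative.
    import numpy as np
    from sklearn.datasets import make_classification, make_regression
    from sklearn.linear_model import Lasso, LogisticRegression, Ridge
    from sklearn.utils import resample

    # Logistic regression on a synthetic classification problem
    Xc, yc = make_classification(n_samples=200, n_features=10, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Xc, yc)

    # Lasso and ridge on a synthetic regression problem
    Xr, yr = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
    lasso = Lasso(alpha=0.1).fit(Xr, yr)
    ridge = Ridge(alpha=1.0).fit(Xr, yr)

    # Bootstrap: refit on resampled data to gauge coefficient variability
    boot = [Ridge(alpha=1.0).fit(*resample(Xr, yr, random_state=s)).coef_
            for s in range(100)]
    print(np.std(boot, axis=0))  # bootstrap standard errors of the ridge coefficients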


What is your background and what is your goal for learning the methods?

This is somewhat long and there is a disclaimer towards the end, but hopefully some of this is helpful.

Working through a book can mean reading what’s on the pages and being able to recall names of techniques or methods. Or using pen and paper to work through the examples and be able to solve problems. This could even be deriving what’s in the book from first principles.

This will depend on what you want to do with the material. If you want to apply it using pre-made R packages, you probably don’t need to recreate everything from scratch and you can probably get away with ISL. If you want to be creating new methods or going beyond pre-made R packages, then you probably need to work up to ESL and solve things from first principles.

ISL is used in an undergrad elective course at my uni. The prerequisite stat material covers Devore's probability and stats for engineers and Montgomery's intro to linear regression. ISL would be a third course in stats (see the bottom for the 4-course math background). There are entire courses dedicated to the topics in ISL, so I really think ISL is most useful for bringing previously studied topics together.

ESL is used in a second year MSc course. This assumes knowledge of mathematical statistics (Casella and Berger Statistical Inference + Wasserman All of Statistics), computational statistics (topics: bootstrap, MCMC, EM algorithm, numerical analysis methods, optimization, and matrix decomposition) and courses on linear regression and the general linear model. So it’s a “capstone” of sorts that ties all of the material together. I haven’t taken any of these courses, so I can’t comment on what’s really necessary.

Disclaimers follow: as others have mentioned, someone's background and preparation may be different from, and more advanced than, what is outlined. Above I outlined the course sequences for ISL and ESL at my uni. We do not require a course on real analysis and we do not do measure-theoretic probability (PhD students do, but ESL is covered in the MSc that is required for PhD admissions). Of course, not every chapter in a textbook is covered in each course, and I'm sure there is some minimal coverage of topics that would let you get to ISL or ESL more efficiently. What that is, I am unable to comment on.

Yes, there are people admitted to the MSc program without a stats BSc degree. Examples are physics, math, and computer science majors, from what I have seen. Usually they have to make up the missing BSc math stats courses.

The undergrad-level math background assumes calculus, including multivariable calculus (Stewart's Calculus, omitting the chapters on vector calculus): partial derivatives, Lagrange multipliers, multiple integrals. Also linear algebra: matrix multiplication, determinants, eigenvalues, trace (Linear Algebra and Its Applications by Lay).


Here is a direct link to a free download of ISL: https://statlearning.com/ISLR%20Seventh%20Printing.pdf


The authors made a free video course for ISL. It's on YouTube.


Of interest to people who like the look of this book is Bishop's Pattern Recognition and Machine Learning, also available freely and legally online: https://www.microsoft.com/en-us/research/people/cmbishop/


There's also a newer book by Hastie (and Efron!), Computer Age Statistical Inference, which I very much prefer to The Elements of Statistical Learning.

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf

It's really well motivated and, unlike ESL, discusses many different schools, including classical inference, empirical Bayes, and deep learning. Without these different perspectives, newcomers often find statistics very obscure, as it just looks like a bag of tricks.


And to close the triad of machine learning bibles, don't forget Murphy's, which will apparently be extended soon! https://probml.github.io/pml-book/


I also like Alpaydin's Intro to ML even though it's not as famous: https://mitpress.mit.edu/books/introduction-machine-learning...


Another hidden gem is Webb's Statistical Pattern Recognition https://www.wiley.com/en-us/Statistical+Pattern+Recognition%...


Why do you think it's a hidden gem?


So much of that book just goes over my head, whereas I didn't have that problem with ESL. I don't know if it's Murphy's writing style or just the way he approaches the topic, but I found his book significantly more difficult to process.


On the contrary, I like Murphy and cannot stand ESL. It probably boils down to what statistical camp you are more comfortable in.


My problem is more with understanding, not necessarily with liking or disliking either book.


Yes! They are just like the Bible. Concise, applicable, to the point, and you can learn so many useful (and true!) things from them.


I love this book so much. It takes a strong Bayesian point of view that makes things very clear to me. It's well written and well structured. It starts with a summary chapter on ML which, honestly, by itself gets you to a very good place in understanding the basics of ML.


I remember putting this book in my pile almost a decade ago; is it still relevant?


This collection of Jupyter notebooks that reproduces graphics and implements algorithms from the book could be a nice supplementary resource. https://github.com/maitbayev/the-elements-of-statistical-lea...


Dang, I was really hoping to find examples of the MCMC methods in Ch. 8.

A strong point of "An Introduction to Statistical Learning" by the same authors is that each chapter ends with example programs in R (albeit with a fair number of typos).
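
For anyone else looking, here's a toy Gibbs sampler to give the flavor (my own illustrative example, a bivariate normal, not one of the book's Ch. 8 examples):

    # Toy Gibbs sampler for a standard bivariate normal with correlation rho.
    # Not from ESL Ch. 8; just the smallest example of the idea.
    import numpy as np

    rng = np.random.default_rng(0)
    rho, n_iter = 0.8, 10_000
    samples = np.empty((n_iter, 2))
    x = y = 0.0
    for i in range(n_iter):
        # Each full conditional is N(rho * other_coordinate, 1 - rho^2)
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        samples[i] = (x, y)

    # After burn-in, the empirical correlation should be close to rho
    print(np.corrcoef(samples[1000:].T)[0, 1])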


This is a great reference, but it's pretty terse... it would be hard (IMO) to learn something for the first time from EoSL, though it's great for having all the details and derivations. It's just a very technical book, and though it is complete, it is also quite inaccessible.


That's why they wrote Introduction to Statistical Learning[0] and also a video series for the same book[1]. Both books and the video classes are a must for anyone working with Machine Learning and/or Statistics.

[0] http://faculty.marshall.usc.edu/gareth-james/ISL/ [1] https://www.youtube.com/watch?v=5N9V07EIfIg&list=PLOg0ngHtcq...


And a second edition of ISLR is "coming soon" [1].

[1] https://twitter.com/daniela_witten/status/126169362443927961...


Definitely. And hell, EoSL is a _wide_ survey, and some of the content is only adjacent to ML. E.g., MCMC for Bayesian models: there are good applications in ML, and a short bit in EoSL about the Gibbs sampler, but there’s a massive parallel literature in the statistical inference world.


[0] returned a 404, but this appears to be a substitute: https://statlearning.com/


Hmm, weird, it works for me. Thanks for the alternative link.


For a more digestible alternative, see https://news.ycombinator.com/item?id=25592296


Not quite finished yet but coming soon: Speech & Language Processing (3rd ed.) https://web.stanford.edu/~jurafsky/slp3/


Probably the most recommended stats book out there, and for a reason. It's certainly not an intro book, but at some point anyone interested in stats or ML should go through it once.


Is there some survey/list somewhere of which of these topics are still relevant today, which might have a comeback in the future, and which are definitely a thing of the past?

A lot of these topics were mentioned in passing in some of my ML courses at university, but my professors never really bothered to put them into a bigger picture.


What is the equivalent book for Deep Learning?


I really like Aggarwal's "Neural Networks and Deep Learning" and recommend it highly. (At least as of this summer, Springer was giving away a PDF; not sure if that's still true.)

To me, Goodfellow et al. spend the first hundred and fifty pages on material that is important but covered better elsewhere (e.g., probability theory, numerical methods) and didn't belong in their book at all. Simultaneously, I didn't get that much out of the "core" chapters on RNNs, CNNs, etc., relative to what I got out of other books. I think the book is somewhat overrated, frankly, but YMMV!


Aggarwal has a good explanation of backpropagation.
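
For a taste of what such an explanation boils down to, here's a from-scratch sketch of my own (not Aggarwal's presentation): one hidden layer, with the gradients derived by hand.

    # Minimal backpropagation sketch (mine, not Aggarwal's): a one-hidden-layer
    # net with tanh activation and squared-error loss, trained on toy data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                # toy inputs
    y = np.sin(X.sum(axis=1, keepdims=True))     # toy targets
    W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(8)
    W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
    lr = 0.05

    for _ in range(500):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        # Backward pass: apply the chain rule layer by layer
        d_pred = 2 * (pred - y) / len(X)         # dLoss/dpred for mean squared error
        dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
        d_h = (d_pred @ W2.T) * (1 - h**2)       # tanh'(z) = 1 - tanh(z)^2
        dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
        # Gradient descent step
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2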


Goodfellow's [1] comes closest, I'd say, but since deep learning is still moving very fast and lacks a solid formal grounding (unlike most other machine learning methods), it's not so easy to find a comprehensive book yet.

[1] https://www.deeplearningbook.org/


Will that teach me transformers too?


No (at least, not the version I read a few years ago). But transformers are a specific neural network architecture, so I’d still recommend the book for the fundamentals around backpropagation, activation & loss functions, etc.

Once you’re comfortable with neural networks (and the notation), the “Attention is all you need” paper is fairly accessible.
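
For what it's worth, the heart of the paper's section 3 is just scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A single-head numpy sketch (mine, heavily simplified; real transformers add multi-head projections, masking, residual connections, and layer norm):

    # Scaled dot-product attention (single head), the core of the transformer.
    # Simplified sketch; omits multi-head projections, masking, residuals, etc.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # how much each query attends to each key
        weights = softmax(scores)        # each row is a probability distribution
        return weights @ V               # weighted average of the value vectors

    rng = np.random.default_rng(0)
    seq_len, d_k = 5, 16
    Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
    print(attention(Q, K, V).shape)      # (5, 16): one output vector per position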


> Once you’re comfortable with neural networks (and the notation)

I think I am, actually. And I read "Attention Is All You Need" (well, half of it) and it didn't seem to delve into how they work.


It really does; read section 3 carefully. As for why transformers work so well, that's still an open question.


For transformers, try this (also links to a video introduction): http://jalammar.github.io/illustrated-transformer/


Covered in "Dive Into Deep Learning," which another commenter has recommended.

https://d2l.ai/chapter_attention-mechanisms/transformer.html


Maybe "Dive into Deep Learning"

https://d2l.ai/d2l-en.pdf



The authors of this book also offer an excellent free online course that covers some of the same material. I found it incredibly well presented and easy to follow, without being superficial.

https://online.stanford.edu/courses/sohs-ystatslearning-stat...


I have the paper version of this, but it goes way over my head. What should I read first to make sense of it?


"An Introduction to Statistical Learning" was written by two of the same authors, and is explicitly meant to be a lower-level introduction to the same ideas: https://statlearning.com/ISLR%20Seventh%20Printing.pdf


Thank you!


Rather than a lower-level intro, I'd recommend you bone up on linear algebra and multidimensional/vector calculus. The ideas in this book are not especially difficult to grasp, but the math is right up front all the way through. If you can't actually read the equations and visualize what is being discussed, it will be quite difficult to ever make real progress with this book.


Try "Computer Age Statistical Inference" by Efron and Hastie.

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf



