The intro rings true and brings back a lot of frustrating memories:
>We may have all heard the saying “use it or lose it”. We experience it when we feel rusty in a foreign language or sports that we have not practised in a while. Practice is important to maintain skills but it is also key when learning new ones. This is a reason why many textbooks and courses feature exercises. However, the solutions to the exercises feel often overly brief, or are sometimes not available at all. Rather than an opportunity to practice the new skills, the exercises then become a source of frustration and are ignored.
Typical exercise solution: How to draw an owl. 1. Draw some circles. 2. Draw the rest of the $@#% owl.
> We may have all heard the saying “use it or lose it”. We experience it when we feel rusty in a foreign language or sports that we have not practised in a while
Another sad thing is that we lose our memory in a step function. We remember something for a long time without using it, and then all of a sudden it's gone from our memory. That may explain why many lifers at Google interviewed so badly. They joined Google when there was no leetcode and Google still had amazingly high standards, asking math and algorithm puzzles and novel systems-design questions. So they are really good. They are also confident, because they achieved a lot and led amazing projects at Google. Yet when they started to answer interview questions, they struggled with basic facts.
> Another sad thing is that we lose our memory in a step function. We remember something for a long time without using it, and then all of a sudden it's gone from our memory.
Not sure memory loss is quite that binary. Heading for mid-40s here, and often things are still there, but it's higher latency to fully recall it. A bit like it's stored off in Amazon Glacier. Or often I can get the first byte quickly ("I know that person's surname starts with a P") but retrieving the full answer takes longer and some brute force iterations. It's as if I've hit a hash-table collision and need to binary-compare the results, or I've reached a node in a tree-structure that has many branches ("Is it Pfeffle? Piper? No, Pfeiffer!")
That is, drawing things on paper, no formulae. I used to do similar exercises with the printed Iris dataset, giving people a transparency to draw the classifier on, and then another sheet of paper with the validation dataset. The exercise was loved by everyone from high-school students to managers.
Some good stuff here! I'm surprised they didn't show many analytical results for neural networks. For example, I like having candidates for deep learning research positions derive backpropagation. You can show a wide variety of interesting results in single-neuron models as well.
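For concreteness, here is the kind of single-neuron derivation I have in mind (a sketch in my own notation, not an exercise from the book): with input x, weight vector w, bias b, activation σ, and a loss ℓ comparing the output a to the target y,

```latex
z = w^\top x + b, \qquad a = \sigma(z), \qquad L = \ell(a, y),
\qquad
\frac{\partial L}{\partial w} = \frac{\partial \ell}{\partial a}\,\sigma'(z)\,x, \qquad
\frac{\partial L}{\partial b} = \frac{\partial \ell}{\partial a}\,\sigma'(z).
```

With a logistic σ and cross-entropy ℓ this collapses to the familiar (a − y) x, and backprop through a deeper network is just this chain rule applied layer by layer.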
In a lot of academic fields it's assumed that a researcher really understands their stuff from first principles. I think being able to derive backprop is a really straightforward exercise, and you should definitely expect a researcher in the area to be able to do it off the cuff. I think it's akin to fizz buzz: it'll weed out people who really don't know what they're doing, but it won't tell you too much about those who do it without trouble.
I thought so too, until my friend told me that a CIT PhD working as an applied research scientist at a FAANG company derived an incremental Gaussian mixture model without using the properties of GMMs at all, and another CIT PhD on the same team defended the algorithm by saying something like "but the intuition is correct". I couldn't believe my ears.
But most researchers don't use first principles day to day. Like others in this thread have said, if you don't use it you lose it. Researchers don't use the details of backprop in their day-to-day work, so expecting an off-the-cuff derivation isn't a fair assessment of what makes a good researcher.
Researchers specialize, and as such even in a "single field" they work at very different levels of abstraction: one subfield will care about building up some novel construction from first principles, while another subfield will want to use that construction as a basic axiomatic building block and abstract away from the details. E.g. optimizing the execution performance of a known formula is orthogonal to developing a better formula; we want people working on both aspects, and these are going to be different people who each build on their own subfield's first principles, which don't overlap much with the first principles of the other researcher.
In my experience analytical results have very little impact on practical deep learning. Examples:
- Everyone “knows” that the Adam optimizer’s proof is incorrect, but we still use Adam because we don’t want to redo hyperparameter search with a different optimizer that’s proven to converge but probably performs worse.
- Everyone “knows” that the Wasserstein loss for GANs has a better convergence proof, but nobody uses it because the generated images look like crap compared to what you get from stylegan* with their default config.
It’d be nice if ML proofs led to better performance, but that’s not often the case. I see far more progress from better data preprocessing and from bringing in knowledge from other fields like signal processing.
That seems like a very basic thing that would be on a quals exam. A good researcher is a combination of smart/creative with expert knowledge of their field down to the fundamentals.
As an EE PhD, can you derive basic control theory or electromagnetic relationships?
I still have yet to use graphical models (in the traditional sense, not including the new age variational inference style neural networks as graphical models) in real life. Am I just completely missing something? Where do people generally find compelling uses for graphical models?
I use graphical models all over the place, typically for problems that have more structure than simple statistics calculations, but don't need the huge capacity of machine learning models. For example I work with bio folks measuring "EC50" values, basically parameters of titration curves in a wet lab. It seems like a simple curve fitting problem with say 5 parameters. But then these wet lab scientists measure hundreds of curves at once, so we want to put hierarchical structure among the curves -- that is all the curves should look pretty similar with only a few degrees of freedom. Graphical models are a great framework for expressing prior knowledge about the dependencies between these curve parameters. But yeah I then do inference in graphical models using variational inference in PyTorch.
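For readers wondering what that looks like in code, here is a minimal sketch of a hierarchical EC50 model, assuming Pyro (a PyTorch-based probabilistic programming library the commenter did not name); the curve parameterization, priors, data shapes, and names are all illustrative:

```python
# Minimal sketch of a hierarchical dose-response (EC50) model, assuming Pyro.
# Shapes, priors, and names are illustrative, not the commenter's actual setup.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal

def hill_curve(log_dose, log_ec50, slope=1.0, bottom=0.0, top=1.0):
    # Standard sigmoidal titration curve parameterized by its midpoint (EC50).
    return bottom + (top - bottom) * torch.sigmoid(slope * (log_dose - log_ec50))

def model(log_dose, response):
    # log_dose, response: tensors of shape (n_curves, n_points)
    n_curves = log_dose.shape[0]
    # Population-level parameters shared by all curves.
    mu = pyro.sample("mu_log_ec50", dist.Normal(0.0, 2.0))
    tau = pyro.sample("tau_log_ec50", dist.HalfNormal(1.0))
    sigma = pyro.sample("sigma_obs", dist.HalfNormal(0.5))
    with pyro.plate("curves", n_curves):
        # Per-curve EC50s are drawn from the shared prior: this is the
        # "all curves should look pretty similar" hierarchical structure.
        log_ec50 = pyro.sample("log_ec50", dist.Normal(mu, tau))
        mean = hill_curve(log_dose, log_ec50.unsqueeze(-1))
        pyro.sample("obs", dist.Normal(mean, sigma).to_event(1), obs=response)

# Variational inference, as in the comment ("variational inference in PyTorch").
guide = AutoNormal(model)
svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.01}), loss=Trace_ELBO())
# for step in range(2000):
#     svi.step(log_dose, response)
```

The shared prior on log_ec50 is what encodes "the curves should only have a few degrees of freedom": each curve borrows strength from the others instead of being fit independently.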
Is there a resource you would recommend for learning about applying GMs and related techniques to data like those you described (similar structures with N DoF)? As a hobbyist I've been toying with symbolic regression, and this feels like a wall I've been running up against.
They are used quite a bit in computational genomics. Indeed, genomics is full of latent variable problems where one has a good model for the underlying phenomena but often not a lot of labeled data.
Though, there are other ways to derive it. My personal opinion is that the vector and matrix calculus derivations in the book are too verbose, though this style may be more comfortable for some readers. The semidefinite and cone optimization communities have more concise ways of deriving these kinds of derivatives and relationships; this can be seen, for example, in Boyd and Vandenberghe's Convex Optimization or Ben-Tal and Nemirovski's Lectures on Modern Convex Optimization.
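As an example of the more concise, coordinate-free style I mean (a generic illustration of mine, not taken from either book): for f(x) = xᵀAx, working with differentials gives

```latex
df = dx^\top A x + x^\top A\, dx = x^\top (A + A^\top)\, dx
\quad\Longrightarrow\quad
\nabla f(x) = (A + A^\top)\, x .
```

Working with differentials like this avoids tracking individual indices, which is where much of the verbosity in elementwise derivations comes from.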
I almost bounced off this link because none of the tabs have any content in them. I clicked through to the GitHub and it also doesn't tell you where to get started. Finally I came back and found the PDF. Maybe the title could be "Pen and paper exercises in machine learning (PDF)" to help you know what you're looking for?
Direct links to arXiv PDFs are discouraged: if the link pins a specific version and the author uploads a v2 to correct an error, your link will still point to the old version.
Even if it was a link for a specific version, I'd still post it as arxiv's landing page isn't designed for those who need to jump to the content right away. It's extremely cluttered and hard to navigate. I couldn't care less if the PDF got some arbitrary fixes in an unknown future. That'd be arxiv's problem.
I would say it depends a lot on your background. The whole thing is very detailed, but ideas can be lost in detail-oriented proofs.
Reading the first few sections, it seems that the ideas are there, especially in the proofs: plenty of motivating ideas, and the kind of "raw index crunching" that the paper begins with gives way to bigger ideas later on. Doubters might read section 1.6 about the power method for finding the largest eigenvalue. It convinced me that the ideas were worth reading.
It's so cool to see why this works (as an engineer I learned about the power method with the hand-waving explanation "it works in the limit", but I never knew why it works).
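The standard argument is short (sketched here in generic notation, not quoted from the book, assuming A is diagonalizable with |λ_1| > |λ_2| ≥ … and a start vector with a nonzero u_1 component):

```latex
x_0 = \sum_i c_i u_i
\quad\Longrightarrow\quad
A^k x_0 = \sum_i c_i \lambda_i^k u_i
        = \lambda_1^k \Big( c_1 u_1 + \sum_{i \ge 2} c_i \big(\tfrac{\lambda_i}{\lambda_1}\big)^k u_i \Big).
```

Since |λ_i/λ_1| < 1 for i ≥ 2, the trailing terms decay geometrically, so the normalized iterates line up with u_1.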
So what do we do if we want u_2, the eigenvector that corresponds to lambda_2? MathOverflow says we can just subtract the u_1 subspace from A [1] and then repeat, but would that be numerically stable? (i.e. will that work with floats?)
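For what it's worth, here's a quick NumPy sketch of that deflation idea for a symmetric A (the matrix, names, and the re-orthogonalization remark are my own illustration, not from the linked answer):

```python
# Power iteration plus Hotelling deflation for a symmetric matrix.
import numpy as np

def power_iteration(A, num_iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    lam = v @ A @ v  # Rayleigh quotient of the converged vector
    return lam, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam1, u1 = power_iteration(A)

# Deflation: subtract the u_1 component, then run power iteration again.
A_deflated = A - lam1 * np.outer(u1, u1)
lam2, u2 = power_iteration(A_deflated)

print(lam1, lam2)
print(np.sort(np.linalg.eigvalsh(A)))  # compare against a direct solver
```

Deflation works in exact arithmetic, but in floating point errors in u_1 leak back in, so a common safeguard is to also re-orthogonalize the iterate against the eigenvectors already found at each step (v -= (u1 @ v) * u1).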
I am at a loss to understand how these constitute _machine_ learning. The preface says "the exercises are ideally paired with computer exercises...", but I struggle to imagine what such computer exercises would look like. Somebody ELI(at least 10)?
These exercises are writing mathematical proofs that basic machine learning algorithms behave correctly. They are "pen and paper" not because you are manually solving a large equation that a machine would normally solve, but because we don't have automated theorem provers capable of proving interesting machine learning theorems. I would expect a typical 1st year grad student to be using a resource like this.
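As for what the paired computer exercises might look like: a typical pattern (my guess, not something the book specifies) is to derive a gradient or estimator on paper and then verify it numerically, e.g. a finite-difference check of a hand-derived logistic-regression gradient:

```python
# Hypothetical "computer exercise" paired with a pen-and-paper derivation:
# derive the logistic-regression gradient by hand, then check it numerically.
import numpy as np

def loss(w, X, y):
    # Mean negative log-likelihood of logistic regression.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, X, y):
    # The hand-derived gradient: X^T (p - y) / n.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.standard_normal(3)

# Central finite differences as an independent check of the derivation.
eps = 1e-6
num_grad = np.array([
    (loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(num_grad - grad(w, X, y))))  # should be tiny (~1e-8 or less)
```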
If you don't understand the purpose of proofs, then this resource is not aimed at you.
Your question is a fair one and does not deserve to be downvoted. I'm just curious: from what sources did you learn about machine learning? Even in this world of using libraries to get things done, I have yet to see machine learning books or courses that don't touch math at all.