The paper is interesting because they calculate the theoretically optimal difficulty for a specific class of learning algorithms: https://dx.doi.org/10.1038/s41467-019-12552-4 (I think the method might be applicable for scheduling flashcards better than the rule-of-thumb spacing of Anki et al.)
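To make the flashcard idea concrete: a scheduler could aim each review at the moment predicted recall drops to roughly the paper's sweet spot, instead of using fixed multipliers. A minimal sketch, assuming an exponential forgetting curve and a per-card "stability" estimate (both are my assumptions, not something the paper provides):

    import math

    def next_review_in_days(stability_days: float, target_recall: float = 0.85) -> float:
        """Days until predicted recall decays to target_recall, assuming an
        exponential forgetting curve R(t) = exp(-t / stability)."""
        return -stability_days * math.log(target_recall)

    print(next_review_in_days(10.0))   # ~1.6 days for a card with 10-day stability
    print(next_review_in_days(60.0))   # ~9.8 days for a well-learned card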
The Independent article is unfounded speculation about this applying to the way humans learn, without any discussion of whether the model is actually applicable. (Most things humans are trying to learn aren't binary classification tasks.)
The paper includes the Law & Gold model of (monkey) perceptual learning as one of its cases, and has a bit of discussion about how stochastic gradient descent in ML is quantitatively similar to observed characteristics of human learning. I don't think it's unfounded speculation, except perhaps on the part of the paper.
While human learning might be similar, is it similar enough? Humans seem to require very few data points to learn something. Just a few examples are enough for a human to grasp a pattern. As far as I know, we haven't been able to do the same in ML.
I’m not so sure about that. Adult humans can learn new things with a remarkably small number of additional data points, but they’re drawing on a lifetime of human experience which is also data. Children tend to need more help.
Humans have always related the latest technology to their own functioning, particularly that of the brain. It's fascinating to watch people take pistons and mechanical automatons as metaphors for the mind, then sequential programming, and now ML. Humans have an incredible ability to restructure their ideas about their own brains around new machines and adapt to new ideas, in a way that would break a machine. Not to mention the rest of human biology that contributes to our functioning minds outside of the brain.
I don't know how to separate learning something through immersion from absorbing the broader world view that those ideas are nestled within.
Yes, Anki’s goal of “let’s quickly hide all the cards you know and only show you cards you are forgetting or on the edge of forgetting” has never seemed overly useful. The end result is the following algorithm: “let’s present you with the toughest things you can’t remember over and over again, and ignore all the facts you’ve done so well at learning (until you have almost, or completely, forgotten those facts)”
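For reference, the core of the SM-2-style update that Anki's scheduler descends from looks roughly like this (a simplified sketch; the real scheduler adds learning steps, fuzz, lapse handling, and so on):

    def sm2_review(quality: int, reps: int, interval: float, ease: float):
        """One SM-2 review. quality: 0-5 self-grade; reps: consecutive successful
        reviews; interval: current gap in days; ease: easiness factor (starts at 2.5)."""
        if quality < 3:                       # failed: the card starts over
            return 0, 1.0, ease
        if reps == 0:
            interval = 1.0
        elif reps == 1:
            interval = 6.0
        else:
            interval = interval * ease        # gap grows geometrically while you succeed
        # ease drifts down on hard recalls, up on easy ones, floored at 1.3
        ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
        return reps + 1, interval, ease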
Throwing up cards you can answer in 1 second is not wasting anyone’s time. It is more likely to encourage people by reminding them of what they already know.
> The end result is the following algorithm: “let’s present you with the toughest things you can’t remember over and over again, and ignore all the facts you’ve done so well at learning (until you have almost, or completely, forgotten those facts)”
In my understanding, that's closer to optimal. Effortful retrieval is much more effective at strengthening future retrieval.
I haven’t used either Anki or wanikani (I use SuperMemo), but from what I’ve heard wanikani has no leech management, meaning you end up stuck repeating the kanji you keep failing, which makes the experience miserable. Leech management is really important for any decent SRS.
Sometimes you'll have much more difficulty with certain items than others. No matter what you do, even after repeated attempts to learn, you just can't get them to stick.
These are "leeches". They burn productive study and review time.
I have a theory that these items have lower adjacency to past experience or knowledge and it's difficult to form mnemonics or other connections. Or they're less novel and don't cause our brain to take interest. That's where all of my leeches lie -- in the realm of things I don't particularly care about.
A good leech management algorithm will back-burner unproductive items so you can focus on the rest of the concept population. There are different types of leeches too -- things you don't get during introduction, or things that you can commit to short term memory but won't stick for long. A good algorithm will identify all of them and block them.
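Anki does ship a blunt version of this: once a card has lapsed more than a threshold number of times (8 by default, if I remember right) it gets tagged as a leech and can be auto-suspended. Roughly:

    from dataclasses import dataclass, field

    @dataclass
    class Card:
        lapses: int = 0
        suspended: bool = False
        tags: set = field(default_factory=set)

    LEECH_THRESHOLD = 8   # lapses before a card is flagged (Anki's default, IIRC)

    def record_failure(card: Card) -> None:
        card.lapses += 1
        if card.lapses >= LEECH_THRESHOLD:
            card.tags.add("leech")
            card.suspended = True   # back-burner it; rewrite or drop the item later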
That's really interesting, I'll have to give supermemo a try. I definitely remember those items in WaniKani; a lot of the time they were mnemonics based on pop culture references that I just didn't get, or the mnemonic was just kind of a stretch, or too many similar concepts had been introduced at once. When I stopped I had definitely hit a wall where I just didn't feel like I could keep learning.
Unless this article is leaving out some major points, the whole thing seems flawed. So their machine-learning models learn best if they fail 15% of the time. Fair enough - but trying to discern anything meaningful about how often people should fail based on that seems like quite a stretch.
Anecdotally, I'd say I fail most things on 95% or more of attempts. That's why we rewrite, debug, practice, drill, and google; avoiding failure takes a lot of human effort.
Humans get way more information out of each failure than ML systems. When you or I fail just a few times we can analyse the failures and often discover huge classes of wrong behaviors, and never repeat any of them. We're also good at differentiating which parts of the failure caused it, and can even learn which parts were successful. We might even test dozens of hypotheses at once in a single attempt, even if we're focusing on just one of them. A computer often only gets one bit of learning from a failure or a success: this single behavior in particular either did or did not work.
My hypothesis is that we model the system we're studying and simulate many 'attempts' for every real world attempt. I.e. we grow a low-fidelity, but much faster, model of the system in our brain that we can use to make medium-low confidence predictions about the real system many times for each time we test against the real system.
So when you say you fail 95% of the time, I'm saying each of those failures actually has 200 mini-successes embedded in it that you can still use to train your mental model.
> and often discover huge classes of wrong behaviors, and never repeat any of them
Once burned, twice shy. And often that results in irrational aversion to huge classes of behaviors just because they appeared in the larger context of the failure of an endeavor as a whole, which I'd say is not a good way to learn from failures.
Sometimes people go into a situation confused, fail, and don't know how to interpret why they failed or what parts caused the failure. I think it is worth distinguishing that type of failure from the type you're talking about. Why? Often when you tell someone you don't know how to do something or you think you'll fail at something, they have your type of failure in mind and they encourage you to just try again.
It depends what "failure" even means. If you mis-place a semicolon and have to go back and fix it, have you "failed"? I'd say no. I'd say a project has only failed if it never gets to a working state.
If you failed to succeed in the expected amount of time/effort, it's a failure. Maybe you add some tolerance for going past the estimate before classifying it as a failure, but it's still rooted in the expectation.
For example, if it took you ten years to pass kindergarten, you've failed.
On a binary classification task, it is a priori true that you would likely not learn if you were right 50% or 100% of the time. This is not a function of any particular learning algorithm.
If you got 100% right, you already know everything that was being tested.
If you got 50% right, you can't tell whether you are just guessing or whether there are features you should be picking up on.
So you would expect that the optimal rate would not be close to either extreme. 50.1% is similar to 50% for most intents and purposes, and likewise 99.99% to 100%.
So you might naively expect the optimal success/failure split to be close to 75%/25% in general. This would apply to humans too, because it is a statement about the information you need to solve the problem, not a statement about the algorithm.
This paper finds it to be 85%/15% for a particular algorithm. Perhaps humans learn similarly, perhaps not. However, you might expect the optimal success rate to be somewhere in the 65-85% range for any particular algorithm.
The proportion 15% seems to crop up suspiciously often in optimization contexts... this was noted by Gell-Mann in The Quark and the Jaguar. It's roughly the proportion of false warnings sounded by certain tropical birds to gain uncontested access to food. Gell-Mann speculates that it is close to 1/(2pi)...
> The proportion 15% seems to crop up suspiciously often in optimization contexts
That's approximately the value of the area under one tail of a normal distribution, from one standard deviation above the mean to infinity.
I'm not statistically mature enough to say whether it's just coincidence. For one thing, oodles of natural phenomena in no way follow the normal distribution.
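The two numbers are close but not identical; a quick arithmetic check, nothing deeper claimed:

    import math

    one_tail = 0.5 * math.erfc(1 / math.sqrt(2))  # P(Z > 1) for a standard normal
    print(one_tail)           # 0.15865...
    print(1 / (2 * math.pi))  # 0.15915...
    # both within half a percentage point of the paper's ~15% optimal error rate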
Not too sure about the case in the article and whether it relates exactly, but it made me think about spaced-repetition systems. I aim for an 80-90% success rate, as I've found that to be the optimal range (arrived at after doing this for 10+ years now with varying settings).
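If anyone wants to tune toward a target like that: assuming an exponential forgetting curve, the rule of thumb I remember from Anki's manual is to scale all intervals by the ratio of log-retentions (treat the exact formula as my recollection, not gospel):

    import math

    def interval_modifier(desired_retention: float, measured_retention: float) -> float:
        """Multiplier to apply to intervals so measured retention moves toward the
        target, assuming exponential forgetting."""
        return math.log(desired_retention) / math.log(measured_retention)

    print(interval_modifier(0.85, 0.95))  # ~3.17: intervals were too short, stretch them
    print(interval_modifier(0.85, 0.80))  # ~0.73: failing too much, shrink them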
Interestingly, this correlates well with what is happening in the ad-tech world.
Specifically in performance marketing spend, 15% of the budget is very often allocated to "new initiatives & new partners", with the thought process that it will either surface a previously unidentified improvement, or teach you what to avoid in the future on the other 85% of spend.
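That 85/15 split is essentially a fixed explore/exploit policy. A toy sketch of the allocation (the channel names and numbers are made up):

    EXPLORE_SHARE = 0.15   # budget reserved for new initiatives & new partners

    def allocate(budget: float, proven: list[str], experimental: list[str]) -> dict[str, float]:
        """Split spend: 85% across proven channels, 15% across untested ones."""
        plan = {c: budget * (1 - EXPLORE_SHARE) / len(proven) for c in proven}
        plan.update({c: budget * EXPLORE_SHARE / len(experimental) for c in experimental})
        return plan

    print(allocate(100_000, ["search", "social"], ["podcast", "ctv"]))
    # {'search': 42500.0, 'social': 42500.0, 'podcast': 7500.0, 'ctv': 7500.0}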
Learning in this case is really about recall -- ensuring that information already captured is successfully retrieved.
That's a different sense from learning as discovery, or at least, learning as search. In searching a graph of possible hypotheses, yes, it is a better rule to look for opportunities to halve the search space.
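To make the contrast concrete: in the search setting the most informative test is the one you're only ~50% sure about, because each answer then discards half of what's left. A toy bisection sketch (the bug-hunting framing is just my example):

    def first_bad(candidates: list, is_broken) -> object:
        """Find the first candidate where is_broken flips to True by always testing
        the midpoint, so each test discards half of what's left. Assumes the
        predicate is monotone over the ordering (False...False True...True)."""
        lo, hi = 0, len(candidates) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if is_broken(candidates[mid]):
                hi = mid
            else:
                lo = mid + 1
        return candidates[lo]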
Wasn't that "10,000 hrs of practice to master anything" paper discredited? This one sounds very similar in trying to quantify a very chaotic and qualitative process. The usefulness of such a stat on any one person is probably nil.
> Wasn't that "10,000 hrs of practice to master anything" paper discredited?
No, AFAIK it literally never existed. That was, as I recall, a popular misinterpretation of what was itself an unwarranted generalization made by Malcolm Gladwell based on a paper with much more limited scope and conclusions.
> This one sounds very similar in trying to quantify a very chaotic and qualitative process.
The actual direct conclusion—that this error rate is optimal for a variety of machine-learning processes—does not seem to have the problem you describe. The suggestion in the paper that this extends to “biologically plausible” neural networks that may model animal learning also does not seem problematic in the way you describe. The news article’s claim that this is a finding of a sweet spot for human learning is, while it is a possibility suggested by the paper, simply unwarranted as a conclusion.
It's certainly plausible that a quantifiable sweet spot of this type exists for some kinds of human learning, and that the optimization of effectiveness in a curriculum that can be dynamically scaled to individual learners could effectively be guided by it, but there is not a strong reason, without actually testing in concrete human learning scenarios, to believe that the particular number here is a guide to that.
I have a data point/anecdote! I like to play certain sports. After about a year of intense focus and determination, you can get good at pretty much any sport. What I've noticed though is that on a good day, I'm messing up about 15% of the time. If I mess up significantly more than that, I get discouraged and want to go home and try again the next day. If I'm not messing up enough, I feel like I'm overfitting a particular technique and should probably be messing up more to become more well-rounded.
BERT has a 15% masking rate, which seems correlated; also, ~90% is what works well when you are trying to do label smoothing using entropy minimisation. What's going on!
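For context, the BERT recipe is: select ~15% of token positions as prediction targets; of those, 80% become [MASK], 10% a random token, and 10% stay unchanged. A rough sketch of that masking step (not the actual implementation):

    import random

    MASK, VOCAB = "[MASK]", ["the", "cat", "sat", "on", "mat"]   # toy vocabulary

    def mask_tokens(tokens: list[str], mask_rate: float = 0.15):
        """BERT-style corruption: ~15% of positions become prediction targets;
        of those, 80% -> [MASK], 10% -> random token, 10% left unchanged."""
        out, targets = list(tokens), []
        for i in range(len(tokens)):
            if random.random() < mask_rate:
                targets.append(i)
                r = random.random()
                if r < 0.8:
                    out[i] = MASK
                elif r < 0.9:
                    out[i] = random.choice(VOCAB)
                # else: keep the original token
        return out, targets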
As though getting 100% on my calculus quizzes indicated that I wasn't learning. Or, that I would learn more if I didn't study as hard and got 85% correct.
In that case, I'd be surprised that 85% is optimal if it is testing me on stuff I hadn't learned yet. 85% seems still too easy if it is testing material I haven't had a lesson on yet.
No, 1/7 < 0.15 < 1/6. Whether that's an optimal failure level for human learning is a different question that this research doesn't seem to really answer.
Just because you learn more from failures doesn't mean that you will learn more in aggregate over time with more failures. If you fail too much, your brain will tell you that it's better to quit and spend your attention/time in another way.
Elon Musk recently said it well. He said something along the lines of always assuming you are wrong; the goal is to try to be less wrong all the time. Which is basically what good science is. And that guarantees long-term improvement and success. So you need to fail a little to learn and get better. If you always succeeded, I think you'd never really know why you succeeded, which is important knowledge and, to a certain degree, guarantees not making those mistakes again in the future.
Grades are a bit backwards, because learning actually happens when we 1) make an attempt, 2) see the result, and 3) learn from/reflect on the result. Typically school teaches us to do numbers 1 and 2, with very little emphasis on 3.
This rings so true to me! At school (high school, for you Americans -- I don't mean university), I was an "A" student without really trying. Unfortunately, this meant that I didn't learn to try.
Eventually (at a prestigious university) I found I could no longer "coast", and studying required real work. It wasn't easy to come to grips with that reality, and I wish I'd learned earlier.
A's are in part due to attention to detail - reading the instructions very carefully, interpreting the questions just right, setting up problems as intended by the test writer, and so on.