
> Intel's whitepaper claimed that a typical user would encounter a problem once every 27,000 years, insignificant compared to other sources of error such as DRAM bit flips.

> However, IBM performed their own analysis, suggesting that the problem could hit customers every few days.

I bet these aren’t as far off as they seem. Intel seems to be considering a single user, while I suspect IBM is thinking in terms of support calls.

This is a problem I’ve had at work. When you process 100 million requests a day, the one-in-a-billion problem hits you a few times a month. If it’s something a customer or, worse, a manager notices, they ignore the denominator and suspect you all of incompetence. Four times a month can translate into “all the time” given how humans bias their experiences. If you get two statistical clusters of three in a week, someone will lose their shit.
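To put rough numbers on that, here is a minimal sketch. It assumes failures are independent (so the count per period is roughly Poisson) and uses a 30-day month; both are simplifications I'm adding, and the variable names are just for illustration.

```python
import math

requests_per_day = 100_000_000   # from the comment above
failure_probability = 1e-9       # the "one in a billion" problem

expected_per_day = requests_per_day * failure_probability
expected_per_month = expected_per_day * 30   # assumption: 30-day month


def poisson_at_least(k, mean):
    """P(X >= k) for a Poisson count with the given mean."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Chance that any given week contains a "cluster" of three or more failures,
# assuming independent failures.
p_three_in_week = poisson_at_least(3, expected_per_day * 7)

print(f"expected failures per month: {expected_per_month:.1f}")
print(f"P(3+ failures in one week):  {p_three_in_week:.3f}")
```

Under those assumptions a week with three or more failures is uncommon for any single week, but across a year of weeks you should expect to see one or two, which is about when the "all the time" complaints start.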



No, IBM's estimate is for a single user. IBM figures that a typical spreadsheet user does 5000 divides per second when recalculating and does 15 minutes of recalculating a day. IBM also figures that the numbers people use are 90 times as likely to cause an error as Intel's uniformly-distributed numbers. The result is one user will have an error every 24 days.
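For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. The per-divide rates are back-calculated from the 24-day result; they are what those numbers imply, not values quoted from either whitepaper.

```python
divides_per_second = 5000        # IBM's assumed recalculation rate
recalc_minutes_per_day = 15      # IBM's assumed daily recalculation time
input_bias_factor = 90           # IBM: real-world numbers vs. uniform random
days_between_errors = 24         # IBM's headline result

divides_per_day = divides_per_second * recalc_minutes_per_day * 60

# Implied by the figures above: one error per this many divides
# for "real-world" inputs under IBM's assumptions.
divides_per_error_ibm = divides_per_day * days_between_errors

# Removing the 90x bias gives the implied rate for uniformly
# distributed operands (the kind of input Intel's analysis assumed).
divides_per_error_uniform = divides_per_error_ibm * input_bias_factor

print(f"divides per day:                  {divides_per_day:,}")
print(f"implied divides per error (IBM):  {divides_per_error_ibm:,}")
print(f"implied divides per error (unif): {divides_per_error_uniform:,}")
```

So the whole disagreement comes down to two inputs: how many divides a user really does per day, and how much more likely real-world operands are to hit the bad cases.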


That's also a clearly flawed analysis, because the numbers mostly don't change between re-computations of the spreadsheet cell values!

E.g.: Adding a row doesn't invalidate calculations for previous rows in typical spreadsheet usage. The bug is deterministic, so repeating successful calculations over and over with the same numbers won't ever trigger the bug.
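A toy model of that point, as a sketch only: `BAD_PAIRS` and `is_bad_pair` are hypothetical stand-ins for "this operand pair triggers the bug" and have nothing to do with the actual Pentium failure condition.

```python
# Hypothetical set of operand pairs that trigger the bug (illustration only).
BAD_PAIRS = {(7, 3)}


def is_bad_pair(numerator, denominator):
    """Deterministic: the same operands always give the same answer."""
    return (numerator, denominator) in BAD_PAIRS


def count_bad_results(cells, recalculations):
    """Count wrong results when the same sheet is recalculated repeatedly."""
    bad_cells = sum(1 for n, d in cells if is_bad_pair(n, d))
    return bad_cells * recalculations


clean_sheet = [(1, 2), (3, 4), (5, 6)]
unlucky_sheet = clean_sheet + [(7, 3)]

print(count_bad_results(clean_sheet, 1000))    # never hits the bug
print(count_bad_results(unlucky_sheet, 1000))  # hits it on every recalculation
```

The number of divides executed is the same order of magnitude in both cases, but only the number of distinct operand pairs matters for whether a user ever sees a wrong result.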


Yes, the book "Inside Intel" makes the same argument about spreadsheets (p364). My opinion is that Intel's analysis is mostly objective, while IBM's analysis is kind of a scam.


IBM's result is correct if we interpret "one user experiences the problem every few days" as "one in a million users will experience the problem 5000 times a second, for 15 minutes every day they use the spreadsheet with certain values". It's an average that makes no sense.


Spreadsheets Georg....


Ah.

The other failure mode that occurred to me is that if a spreadsheet is involved, you could keep running the same calculation on a bad input for months or even years when aggregating intermediate values over units of time. A problem that happens every time you run a calculation is very different from one that happens at random. Better in some ways and worse in others.



