What am I missing here? Unless all the devs were really bad at maths (unlikely i...

Strilanc · on Feb 10, 2023

I would speculate the main hurdle was probably believing the players in the first place. Humans are notoriously bad at not-noticing-patterns in properly random data. And statistical bugs like this require more effort and careful attention to detail to reproduce than deterministic bugs.

Another hurdle is likely that game developer culture strongly favors integration testing over unit testing. Games are optimized for fun, not correctness, and you can't unit test fun. This specific roulette selection function would have been straightforward to unit test, and a unit test would have caught the distortion. But now imagine people keep varying how important distance is to the calculation in order to make it "feel right". Updating those unit tests is suddenly a noticeable slowdown on how quickly you can iterate on game feel.
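The game's actual selection code isn't shown in the thread. What follows is a minimal Python sketch of a generic roulette-wheel (fitness-proportionate) selection and the kind of statistical unit test described above. The weights are hypothetical stand-ins for whatever distance-based formula the game used.

```python
import random
from collections import Counter

def roulette_select(weights, rng):
    """Pick an index with probability proportional to its weight."""
    total = sum(weights)
    r = rng.random() * total  # uniform point on the "wheel" in [0, total)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against float rounding at the very end

def test_roulette_matches_weights(trials=100_000, tolerance=0.01):
    # Hypothetical weights, e.g. derived from how close each player is.
    weights = [1.0, 2.0, 3.0]
    rng = random.Random(12345)  # fixed seed so the test is reproducible
    counts = Counter(roulette_select(weights, rng) for _ in range(trials))
    total = sum(weights)
    for i, w in enumerate(weights):
        observed = counts[i] / trials
        expected = w / total
        assert abs(observed - expected) < tolerance, (i, observed, expected)

test_roulette_matches_weights()
```

This test exercises the selection mechanism with fixed weights rather than the game's tuned formula. Tweaking how much distance matters would not require updating it, though it also would not catch a bug in the weight formula itself.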

bombcar · on Feb 11, 2023

Yeah, WoW had so many problems with people misunderstanding probabilities that they added explicit code to track all drop rates and compare them to the intended rates, and actually found a bug or two that way.

But mostly it was to explain to people that a 1 in 100 chance doesn’t mean you’ll get it even after 200 goes.
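The arithmetic behind that explanation, assuming each attempt is independent with a fixed 1% drop chance: the probability of zero drops in n attempts is (1 - p)^n.

```python
def no_drop_probability(p_drop: float, attempts: int) -> float:
    """Probability of zero successes in `attempts` independent tries."""
    return (1 - p_drop) ** attempts

for n in (100, 200, 300):
    print(f"{n} attempts: {no_drop_probability(0.01, n):.1%} chance of no drop")
```

With p = 0.01 this gives about 36.6%, 13.4% and 4.9%. Roughly one player in seven will still be empty-handed after 200 tries even when the drop code is working correctly.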

Normal_gaussian · on Feb 10, 2023

Probably systematic debugging flaws, and a lack of tooling to easily isolate the problem.

Systematic flaws: a mix of groupthink, early flawed assumptions, deference to team leads, an 'I'll look for an hour, and if I can't find it I'll move on' attitude (which leads to not looking), or just plain "reading" the code instead of searching it.

Lack of tooling: many game engines are infamous for giving developers little control over tooling. I haven't used many, but I understand it would take quite an effort to run meaningful parameterised or structured fuzz testing on most systems. This makes it hard to artificially confirm suggestions. That said, there is practically no excuse for them not to just add a bunch of counters to the game; even with only their internal testers, it would very quickly have become clear there was a bias.
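A minimal sketch of such counters, assuming the intent was for every candidate to be equally likely to be targeted. `AttackStats` is a hypothetical debug helper, not anything from the game. Each target's attack count is compared to its expected count using a simple z-score (sum of Bernoulli trials), and large deviations are flagged.

```python
import math
import random
from collections import Counter

class AttackStats:
    """Hypothetical debug counter: how often each target actually gets picked."""

    def __init__(self):
        self.observed = Counter()
        self.expected = Counter()  # running sum of intended probabilities
        self.variance = Counter()  # running sum of p * (1 - p)

    def record(self, candidates, chosen):
        p = 1 / len(candidates)  # assumption: selection is meant to be uniform
        for c in candidates:
            self.expected[c] += p
            self.variance[c] += p * (1 - p)
        self.observed[chosen] += 1

    def suspicious(self, z_threshold=4.0):
        """Return {target: z-score} for targets far from their expected count."""
        flagged = {}
        for c, mean in self.expected.items():
            sd = math.sqrt(self.variance[c])
            if sd == 0:
                continue
            z = (self.observed[c] - mean) / sd
            if abs(z) > z_threshold:
                flagged[c] = round(z, 1)
        return flagged

rng = random.Random(7)
players = ["alice", "bob", "carol"]
fair, biased = AttackStats(), AttackStats()
for _ in range(3000):
    fair.record(players, rng.choice(players))
    # Hypothetical bug: the first candidate gets picked half the time outright.
    pick = players[0] if rng.random() < 0.5 else rng.choice(players)
    biased.record(players, pick)

print(fair.suspicious())    # expected to be empty
print(biased.suspicious())  # should flag alice as over-picked
```

A z-threshold of 4 keeps false alarms rare when checking a handful of targets. With thousands of targets you would want a stricter threshold or a proper multiple-comparison correction.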

Most of my 'should have caught it earlier' bugs are of the 'deference to lead' variety. I looked, didn't see it immediately, handed off with some notes, and then the follow-up debugger(s) took those notes or thoughts as gospel. This is really hard to fight. I write something along the lines of "my hunch is there is a problem in code x because it handles y and is poorly structured/tested. I checked z and found i, j - queries as follows" and then find the debuggers effectively refuse to look anywhere past x. This is particularly true for a group of debuggers, who play Chinese whispers with groupthink and invent reasons it must be x.

mabbo · on Feb 10, 2023

It's game code that was written in the late 90s. Unit tests aren't very likely. And a quick glance at the code probably looked completely sane.

kkoncevicius · on Feb 10, 2023

Yes, I came to post exactly this, found your comment, and so am replying instead. The bug doesn't seem to be obscure. It is there in the right place. Someone checking "why are some players attacked more often?" would probably choose this as the first place to double-check, since it is directly related to selecting the player to attack.

Maybe the most occult part of this is figuring out that the unique IDs assigned to the players play a role.
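The thread doesn't spell out the exact mechanism, so this is a purely hypothetical illustration of one common way player IDs can bias a "random" pick. The flawed version rolls an independent chance for each candidate in ID order and takes the first one that passes, which systematically favours low IDs. The IDs and the 0.5 chance are made up.

```python
import random
from collections import Counter

def pick_first_success(player_ids, rng, chance=0.5):
    """Flawed (hypothetical): roll per player in ID order, return the first pass."""
    while True:
        for pid in sorted(player_ids):
            if rng.random() < chance:
                return pid

def pick_uniform(player_ids, rng):
    """Correct: every player equally likely, regardless of ID."""
    return rng.choice(list(player_ids))

rng = random.Random(2023)
ids = [17, 42, 99]  # hypothetical unique player IDs
trials = 30_000
flawed = Counter(pick_first_success(ids, rng) for _ in range(trials))
fair = Counter(pick_uniform(ids, rng) for _ in range(trials))
print("flawed:", dict(flawed))
print("fair:  ", dict(fair))
```

With `chance=0.5`, the lowest ID wins about 4/7 of the time, the middle one 2/7 and the highest 1/7. In any group, the same character ends up attacked most often, and which one depends only on the IDs.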

thaumasiotes · on Feb 11, 2023

> The bug doesn't seem to be obscure. It is there in the right place.

Well, there is a larger bug -- the entire algorithm, functioning properly, still won't behave the way this letter says that it should. It's not clear what it's designed to do, but it's very obvious that it doesn't do what the description says it does.

Did nobody ever notice that?

hinkley · on Feb 10, 2023

I'm missing something here and it has nothing to do with math.

Why couldn't they reproduce the problem in testing? For what, years?

voakbasda · on Feb 10, 2023

Likely because they tested with characters that were not affected. Sometimes reproducing bugs takes good luck more than skill.

remram · on Feb 11, 2023

All characters would have been affected. If they had picked any 3 characters, and put them in a room with monsters repeatedly, they would have observed that the same character was attacked most times.