Around 15 years ago, when casual little games on Facebook were still a thing (actually when Facebook itself was still a thing), I used to play Yahtzee on the site while watching TV shows or whatever. It was one of the most popular games on there. For me it was something to fidget with (there was no money involved or anything, just a personal high score), so I played it a lot.
I felt more and more that dice values of specifically 1 and 6 were harder to come by than other values, so one day I sat down for a few minutes and logged the value of 100 or so dice throws. Turns out I was right, the distribution was not uniform: 2-5 were fine, but it seemed that it was twice as hard to get a 1 or a 6 compared to any other value.
I even did a chi-squared hypothesis test, because I was crazy. (And because I was studying for my statistics minor in university at the time.)
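For the curious, a goodness-of-fit test of this kind can be sketched in a few lines of Python. The counts below are invented purely for illustration (they are not the logged throws), and 11.070 is the standard 5% critical value of the chi-squared distribution with 5 degrees of freedom.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical tallies for faces 1..6 (made-up numbers, 200 throws in total).
observed = [20, 40, 40, 40, 40, 20]
total = sum(observed)

# Under the null hypothesis of a fair die, every face is expected equally often.
expected = [total / 6] * 6

stat = chi_squared_statistic(observed, expected)

# 6 categories -> 5 degrees of freedom; 5% critical value from a standard table.
CRITICAL_5PCT_DF5 = 11.070
reject_uniform = stat > CRITICAL_5PCT_DF5

print(f"chi-squared = {stat:.2f}, reject uniformity at 5%: {reject_uniform}")
```

A statistic above the critical value means counts this lopsided would be unlikely from a fair die.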
Seeing the result, the problem was pretty clear, without even knowing the source code. Almost certainly, they had a random number generator giving a number in a uniform range, let's say from 0 to 1, and did something like the following to get a dice value from 1 to 6:
value = round(random()*5)+1
Do you spot the issue?
The problem is that round() rounds to the nearest integer. So any value between, say, 1.5 and 2.5 (exclusive at one end) rounds to 2, 2.5-3.5 rounds to 3, and so on. Add 1, and you have the dice value.
But the ranges for dice values 1 and 6 are only 0 to 0.5 and 4.5 to 5 respectively, so they are only half the size.
The fix should be extremely simple:
value = floor(random()*6)+1
The crucial point being to use floor() or ceil() instead of round(), to only round towards one direction.
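A quick simulation makes the difference visible. This is only a sketch in Python (the game's actual language and code are unknown); `roll_buggy` and `roll_fixed` are illustrative names for the two formulas above.

```python
import random
from collections import Counter
from math import floor

def roll_buggy(rng):
    # round() maps [0, 0.5) to 0 and [4.5, 5) to 5: half-width ranges at both ends.
    return round(rng.random() * 5) + 1

def roll_fixed(rng):
    # floor() maps each of the six unit-width ranges [k, k + 1) to k.
    return floor(rng.random() * 6) + 1

def tally(roll, n=120_000, seed=1):
    """Roll n times with a seeded generator and count how often each face appears."""
    rng = random.Random(seed)
    return Counter(roll(rng) for _ in range(n))

buggy = tally(roll_buggy)
fixed = tally(roll_fixed)

for face in range(1, 7):
    print(face, buggy[face], fixed[face])
```

With the buggy formula, faces 1 and 6 should each show up about 10% of the time and the others about 20%; with the fix, every face lands near 16.7%.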
I wrote up exactly this in a short email to the company that made the Yahtzee game. They did not reply and just silently fixed the bug, right after my email. I was disappointed and stopped playing.
> I wrote up exactly this in a short email to the company that made the Yahtzee game. They did not reply and just silently fixed the bug, right after my email. I was disappointed and stopped playing.
I love this. It's one of those things that's entirely intelligible to me but I imagine would be next to impossible to explain to a Martian.
But here's another layer: what if it isn't?
Our science fiction is filled with this fairly human-chauvinistic worldview where even in a hypothetical future where we are rubbing shoulders with many different alien species, many of them technologically superior to ours, some of them still admire or begrudgingly respect us for our ingenuity, spirit, some je ne sais quoi.
That we're illogical and inconsistent and jealous and petty and insecure, but also humble and funny and empathetic and creative.
But... we have no evidence that this would be unique to humans, whether sentient intelligence is common throughout the universe or rare. In fact, it's quite possible that this is a necessary prerequisite, side effect, or consequence of conscious intelligence.
So this hypothetical Martian maybe would just smile, nod and say "Ah yes, the company was embarrassed, or didn't want to risk liability, or didn't want to risk their reputation being compromised. But OP just wanted a thank you. He wouldn't have wanted money, just a 'hey thanks for finding that for us'. And it soured him on the whole experience. Been there, my new human bro, been there."
edit: I wrote this before I saw the rest of the replies, and HOLY, there's a bunch of humans that don't understand why OP stopped playing.
> it's quite possible, that this is a necessary prerequisite, side effect, or consequence of conscious intelligence.
I can’t wait for AI psychologists. AIs hired to make AIs talk about their reasoning because they are stuck in feedback loops. AIs having to be debugged. AIs being too selfish or too generous or so insecure that they keep deflating what they try to say despite it being of extremely high value. AIs suddenly cracking nonsense when we need them the most under high pressure. I’m persuaded all these are required aspects of general organic intelligence.
> the distribution was not uniform: 2-5 were fine, but it seemed that it was twice as hard to get a 1 or a 6 compared to any other value.
As soon as I saw that, I knew exactly what the problem was. I'm glad I was reading on a small mobile screen, so I didn't feel like I was cheating by getting a hint from the rest of your comment.
Yes, voice of painful experience here!
> The crucial point being to use floor() or ceil() instead of round(), to only round towards one direction. [emphasis added]
Not quite. If the random() function in your language returns a value in a half-open interval like Math.random() in JavaScript or random.random() in Python, only use floor(), never ceil().
Those random() functions return a value n where 0 <= n < 1. In other words, n may be 0 but will never be 1. So floor() is always what you want for a random function like that.
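Here is a small Python sketch of why that matters at the edges of the interval. `die_floor` and `die_ceil` are illustrative helpers that take the random value as an argument, so the boundary cases can be fed in directly.

```python
from math import ceil, floor

def die_floor(u):
    """Map u in [0, 1) to a die face 1..6 using floor()."""
    return floor(u * 6) + 1

def die_ceil(u):
    """Map u in [0, 1) to a die face using ceil(); breaks when u == 0.0."""
    return ceil(u * 6)

smallest = 0.0            # random() is allowed to return exactly 0.0
largest = 1.0 - 2 ** -53  # the largest double strictly below 1.0

print(die_floor(smallest), die_floor(largest))  # stays within 1..6
print(die_ceil(smallest), die_ceil(largest))    # the first value is 0, not a valid face
```

The `ceil()` version would work almost all the time, which is exactly what makes the bug so hard to notice.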
A somewhat related tip for anyone who implements a progress message like "nn% done". You should always use floor() for that number as well. Quite often I see a case where someone has used round() instead, or possibly ceil(). And then what happens is your notification message says "100% done" when the process is not 100% done. It makes me a little crazy when I see that. Use floor() and you won't have this problem.
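In Python, for example, integer floor division gets you there without any floating point. `percent_done` is just an illustrative helper name.

```python
def percent_done(done, total):
    """Whole-number percentage, rounded down, so 100 appears only when done == total."""
    return done * 100 // total

done, total = 1999, 2000

print(round(100 * done / total))  # 100, although one item is still pending
print(percent_done(done, total))  # 99
print(percent_done(total, total)) # 100, only now
```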
Yep, you're right, including ceil() was misleading of me, if not downright wrong. I technically did not specify which side the interval is open on. But practically and realistically, in pretty much any scenario it will be a half-open interval that excludes the upper bound, and you have to go out of your way to get it the other way around (at which point you probably know well enough yourself to use ceil() instead of floor()).
> Quite often I see a case where someone has used round() instead, or possibly ceil(). And then what happens is your notification message says "100% done" when the process is not 100% done.
The video game Destiny has this problem with how it displays objective percentages - it will round up, and there's a quest every now and then that shows progress as a percentage and has a high enough denominator to trigger this.
Heh, I remember running into something similar professionally, using the .NET Framework to generate random numbers: when asked to generate random integers, it has a seemingly innocent conversion from integers to floating-point numbers and back that ends up causing significant bias on large ranges. Specifically, `Random.Next(b)` produces an integer between 0 and b − 1 by
1. taking a random integer between 0 and 2^31 − 2,
2. dividing the integer by 2^31 − 1 to get a double-precision floating-point number between 0 and 1,
3. multiplying this floating-point number by b to get a number between 0 and b,
4. taking the integer part of this new number to get an integer between 0 and b − 1.
Do this for b = 2^31 − 1, and 50.34% of outputs are odd numbers, where you'd expect (almost) as many evens as odds.
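The four steps can be mimicked in Python to experiment with the effect. This is a sketch that follows the description above, not the actual .NET source: Python's `random` module stands in for the underlying integer generator, and `next_below` and `odd_fraction` are hypothetical helper names.

```python
import random

INT_MAX = 2 ** 31 - 1

def next_below(b, rng):
    """Follow the four described steps to produce an integer in [0, b - 1]."""
    k = rng.randrange(INT_MAX)  # step 1: integer between 0 and 2^31 - 2
    u = k / INT_MAX             # step 2: double between 0 and 1 (1 excluded)
    scaled = u * b              # step 3: double between 0 and b
    return int(scaled)          # step 4: integer part

def odd_fraction(b, samples=200_000, seed=0):
    """Estimate the share of odd outputs for a given upper bound b."""
    rng = random.Random(seed)
    odd = sum(next_below(b, rng) & 1 for _ in range(samples))
    return odd / samples

print(odd_fraction(INT_MAX))
```

The rounding in steps 2 and 3 is where the int-to-double-to-int round trip can fail to return the original integer, so that is the place to look when comparing the estimate against the expected 50%.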
Another fun thing about their RNG was that it was apparently supposed to follow Knuth's provably useful additive PRNG which, given n − 1 randomly generated numbers, generates a new number by taking the sum of the (n − 24)'th and (n − 55)'th numbers modulo some specific number; back when I looked, Microsoft's implementation for some reason uses 34 instead of 24, probably just a typo, with the unfortunate side effect that Knuth's theoretical guarantees are out of the window.
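The recurrence itself is tiny. Here is a Python sketch of an additive lagged generator with the lags 24 and 55 mentioned above; the modulus of 2^31 and the way the 55 seed values are chosen are assumptions for illustration, not details taken from Knuth or from Microsoft's code.

```python
from collections import deque

def lagged_fibonacci(seed_values, short_lag=24, long_lag=55, modulus=2 ** 31):
    """Yield X_n = (X_{n - short_lag} + X_{n - long_lag}) mod modulus, forever."""
    if len(seed_values) != long_lag:
        raise ValueError(f"need exactly {long_lag} seed values")
    # The deque always holds the most recent long_lag values, oldest first.
    state = deque(seed_values, maxlen=long_lag)
    while True:
        nxt = (state[-short_lag] + state[-long_lag]) % modulus
        state.append(nxt)  # maxlen silently drops the oldest value
        yield nxt

# Hypothetical seed: the integers 1..55 (a real generator would seed more carefully).
gen = lagged_fibonacci(list(range(1, 56)))
first_outputs = [next(gen) for _ in range(5)]
print(first_outputs)
```

Passing `short_lag=34` gives a generator of the mistyped shape described above, which runs just as happily; nothing in the code would flag the change.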
Interestingly that Yahtzee version was rigged in the player's favor since they're more likely to get X-of-a-kind if the same numbers appear more often. Average scores probably dropped a decent bit after that fix.
For the short time that I knew about it and it was not fixed yet, I integrated it into my strategy. 1 and 6 are rare, so if for example I still needed them for something, I was inclined to make good use of 1 and 6 when they came up, instead of rerolling them for something else.
After I (happily!) did their work for them, a simple "thank you", or even an acknowledgment through a closed ticket, would have entirely sufficed to make my day.
Back when Stack Overflow was in beta, the original design was really rough, so I mocked up a suggested new design for the main page via Stylish and submitted it to them. Within a day or two, they adopted something almost identical but never acknowledged me. One could argue it was a pretty basic idea (I only spent around 15 minutes on it myself) and they probably arrived there independently, but I still like to believe that I influenced a major site's design.
Speaking of SO, their policy of removing "thank you" and things like "any help is much appreciated" etc is why I've stopped contributing to the site. Even ChatGPT has better manners, and is possibly on the way to being more human than the people who made that rule, and certainly less of a wanker.
Back when reddit was good, I was a moderator of a pretty large subreddit. One April Fools' day, I decided that, as a joke, I'd edit my subreddit's CSS to show the "reddit is down" image over the home page. I did that, and, a few minutes later, reddit was actually down.
One could argue it was a coincidence, but I still like to believe that I influenced a major site's uptime. Mostly because they told me I crashed the site and asked me not to play any more jokes like that again.
I sent a patch to a bigger and better-known software company, now defunct. I ended up writing some rude comments in the bug submission form, after being forced to enter endless data about me, the company that I worked for and a lot of information about how to reproduce the error. Not proud of it, but I understand the GP: you're helping them and they treat you poorly.
They included the fix in the next version of the tool. I later noticed it was incomplete, but this time I just made the correction locally.
It is quite maddening to have to provide personal information to report a bug. My city stopped allowing anonymous reports of broken traffic lights, so every time I report one (much rarer now that they’ve made it harder) I always give bogus information.
Interesting. I feel like it's something I've only seen very occasionally - I don't remember when I last noticed one. Maybe you just have a much higher density of traffic lights where you live, which would make it more common to see failures; and/or maybe they're a different type or differently maintained.
Hey look Jimbob! The consultant says if we added obnoxious paperwork requirements, there'd be fewer broken traffic lights! I don't know how that's possible, but by-golly it worked!
> Because they lacked decency and intellectual humility to acknowledge his email despite benefiting from it.
Or their internal communication was haphazard and not designed to handle this case. I could totally imagine the bug report getting passed informally through a game of telephone and losing the connection to the external reporter (e.g. at some point the report got paraphrased, so the developer who fixed it doesn't know where the report came from, and the person who received the initial report doesn't know if it had merit or if it was fixed).
Which is why you should immediately answer "thanks, we'll look into it!", then at least they got something even if you forget later. I learned this as the triage guy in one of my first jobs, as I have a very shit memory.
Worst case you thank someone for something useless.
I remember writing a suggestion letter to Maxis about SimCity and they not only replied but also sent me a copy of the Terrain Editor. Very nice of them.
And at that point he'd played the game a lot, so he probably would have quit playing anyway. Were he like me, the fun at that point was the statistics portion anyway, and he'd already found and fixed that, so it was no longer fun either.
I had the impression 1 and 6 weren’t as rare anymore (even a single game is a lot of dice throws, and I usually played several in a row and restarted a lot), so I sampled again. No chi-squared test this time, that was just for my learning, because it was obvious without.
Really didn't expect to see AC at the top of HN. Asheron's Call was the first game I worked on and I remember all the times we'd joke with Wi about it and watch monsters beeline for him. It seemed like one of those "Haha sure, player perception" problems and not something that was actually real. IIRC someone did a very cursory look at the code at one point but it never bubbled up as important enough to assign someone to actually investigate.
Wi came to one of the player gatherings with little printed out cards and would hand them to people and say, "You've been Wi flagged!"
I loved Asheron's Call, played it a lot back in the day. My friends and I were in high school at the time so we had absolutely no idea what we were doing, but that didn't stop us from running around the world goofing off. My hobby was making characters with totally insane Run and Jump skills. Once you leveled up enough you could literally leap from one end of a town to the other like Superman, it was extremely funny. The character was awful in all other regards, but I didn't care. I made another character whose sole purpose was to climb to the top of the highest cliff or building I could find, and jump off it. And there were some HIGH places in AC. I miss being able to play MMO's innocently like that rather than trying to min-max every last bit of efficiency out of everything, as embodied by WoW (another game I loved, but for different reasons).
I also really enjoyed the periodic story events that had really dramatic impacts on the world, like the shadow invasion. It was a great game, especially for its time. So thanks for whatever part you played in its creation.
Thanks! Always nice to hear from people who "grew up" playing it. AC having server-side physics and actually making use of them led to lots of ridiculous and emergent gameplay. I don't know how many hours I just spent idling in towns jumping from rooftop to rooftop or seeing how high I could climb up massive structures. Every time I try to play again though, the old "you can't go home again" hits too hard and I just quietly close it back up and go back to the nostalgia.
I was on the design team, so was directly responsible for a lot of the shadow invasion stuff (if you ever saw the big bad Bael'Zharon running around in the live events, that was me!) and other patches for the first 2 years of its lifespan.
Weirdly, I work on WoW now with my career having come full circle after having not worked on MMOs since the mid 2000s. :)
My marriage and a good chunk of my career trajectory can both be traced directly back to having "grown up" playing AC and writing/coding for Crossroads of Dereth. It's a little scary to think how different my life would be if I hadn't picked up that box--possibly the only copy the EB Games in my small English town would get--and gone "huh, looks cool".
I think growing up during those years of transition, right before the Internet became mainstream and ubiquitous, was a huge boon. Sure, the early MMOs were far from the first international social forum enabled by the Internet, but they were right at the technological frontier at the time. There was something special about inhabiting this massive, 3D virtual space alongside people from across the world, and having that experience be just as novel to everyone else as it was to me.
You couldn't replicate that today, and growing up with the world at your fingertips on a pane of glass as a taken-for-granted fact of life must be a very different experience.
Oh man... Now I'm trying to remember which of the CoD folks were from England. Thanks for all / any of the work you put into that site. It was really the nexus of AC for a number of years. I'm glad it's been lost to the ages because I definitely had a number of real spicy comments on it. I'm happy AC at least indirectly helped your career/marriage in some small way. :)
I was Dotcher on CoD and ingame, but I don’t think we ever spoke. I didn’t post on the forums much at all, and as a teenager on the wrong side of the world the fan gatherings and the like were a tad inaccessible.
I did a bunch of writing, news posting, collecting information for the monthly patch summaries, and then they figured out I could code and I ended up building tools and maintaining various bits of the site. I think I ended up owning the item database code for a while? I remember hearing that one got used at Turbine, because it was superior to what you had internally!
I’m now married to Kelly Heckman (Ophelea), who was site manager on CoD for a while, and my first real job was at a social gaming startup, getting in the door with the help of her network. That set my career on the path it is now, so you can draw a direct line from picking up that game box to where I am now. So yeah, thank you and the rest of the team :).
Another "grew up playing AC" here, and I fondly remember interacting with Bael'Zharon on the Harvestgain server during the live events. Did you also blow up Arwic? :D
I still use AC and to some extent AC2 as an example of how wild, weird, dynamic and interesting MMOs were before the EQ formula won through WoW's success.
I also wonder how many hours and points I sank into jump just to jump across the rooftops, haha. I absolutely loved the server-based physics, even though I had the crappiest dial-up connection at the time. I was 14 when I started playing that game. So many amazing memories, Atlan stones, spell system, cheesing death items, being afraid of losing everything because there's no bank, haha.
Lately, I've been getting more and more into esoteric topics and I keep thinking about this game's lore... There is so much occult knowledge/history baked into the entire thing and I honestly still use the game as a reference point for so many things.
I'm curious, do you know how the game's lore was developed and were there ancient texts that were inspirations for the events?
> I'm curious, do you know how the game's lore was developed and were there ancient texts that were inspirations for the events?
I was very not-involved with the lore/story (to the point where I think I wrote a handful of notes and stuff before it was decided that I should just focus on gameplay, and someone else would cover the lore for me!), but there were 3-4 people primarily responsible for it and they were all very much fantasy literate, so it wouldn't surprise me at all if some of the inspiration were from those sources.
In terms of discovery and wonder, I think Valheim did a good job of it in the general atmosphere and loop, but in terms of magic systems, not really. You have Magicka (or even Path of Exile), where spells have component parts and build into something larger, but there's not much mystery there.
A theme of a bunch of the comments is that the internet / audience makes this sort of thing impossible these days. One of my white whales for game design is figuring out if mystery, especially in multiplayer games, is still possible in a meaningful way.
Literal showerthought while ruminating on this idea: are you aware of any multiplayer systems where individual characters have intentional Wi flags (good and bad) at a very granular level? for example, every spell or skill includes a fixed modifier generated for the player so that two identically built characters have different capabilities. Every aspect determining success or power has one. You are just slightly slower at lockpicking in the rain and your fireball casts a little faster than average, and the decrease in damage for casting while moving is a bit more significant.
You remove the “meta” from the game because every character has a different meta that is just a little too cumbersome to figure out in any way other than “feel” it.
I think a few (persistent) games touched on things like that, but I can't think of any that really went hard on it for long-term characters (instead you mostly see it in roguelikes/lites where if you got a bad roll, it didn't matter since you were making a new character soon anyway). At a very broad level, this is sort of what the taper system did but that was cracked after 6 months or whatever and it didn't really offer any meaningful differentiation beyond brute-forcing permutations.
AC is one of my favorite games of all time (neck and neck with Ultima II (I'm old)). I still play on the emulators from time to time - endlessly searching for more Hoary Mattekars.
I played on Thistledown and ran a little portal bot named Stip Dickens in Ayan Baqur that helped ferry people into the Shard of the Herald event. That was a lot of fun.
The loot system was also one-of-a-kind. I don't see that level of randomness in many other games.
The game felt like the writers and staff were highly literate and well read. I now know so many real life herbs and plants due to their use in spellcasting. And AC taught me words like "Mnemosyne".
And each month we would get several pages of amazing lore. (Side note: there's an ongoing web serial called "The Wandering Inn" that reminds me quite a bit of AC. People dropped into a video-game-like world, insect people, references to a "Zeikhal" (similar to a town name)...)
So, I just want to thank you for your work on AC. It's had a profoundly positive impact on my life.
You're welcome! Haha, mattekar farming will always hold a special place for me. And, of course, TD's herald defense is probably top 3 gaming memories for me. So much lifting done across the community for what's now basically oral history.
But also thank YOU for participating in some of the most formative experiences of my life as well. The players really made it a joy to work on.
I was in from beta until about 2004. Lots of fond memories! Closest I've come to feeling the same way I felt playing AC in recent memory is actually Valheim. Can't quite put a finger on why, but there are very few games that hold such an important place in my heart. I was a high schooler when it came out. Some days I regret not going into game design, especially with the current wave of VR games. Some days I'm tempted to quit my major tech company job and try to create that magic myself in a VR MMO.
But that would be something only a crazy person would do.
Yay! The first time an MMO lands for someone, it definitely makes an outsized impression. The combination of social outlet with a huge amount of free time is a recipe for something great.
It's funny you mention Valheim - one of my groups in it had a core of people I met back in the AC days, plus their friends. We got into some real old-man gaming that probably annoyed their friends with all of the "back in our day..." stories while we were running around the landscape, running from trolls, etc.
Add me as another with fond memories. I was introduced to AC by my coworkers at my first full time job after college. Pretty much all the software devs I worked with at the time played it, including both my manager and the head of the site. Our office had our own little monarchy on Leafcull at first, though we later joined up with a bigger group.
On patch day, the office banter usually centered around the things we were seeing popping up as we'd reload Maggie the Jackcat's community-sourced patch notes throughout the day [0].
Sometimes we'd even download the patch and log in for a few minutes, just to poke around. (Always being careful not to do anything _real_ due to the risk of the infamous patch day server rollbacks.) Then there'd be the waiting for the Decal updates.
I still remember that my main was a tank archer build, which was pretty much untouchable going toe to toe with the mobs in PvE as long as the stamina held out. I went through a _lot_ of stam potions, though!
I remember also finding somewhere about how to extract the terrain data from the game. I scraped the coordinates for various destinations from some place and wrote a little OpenGL viewer that would let me do flyovers with the various locations marked with labelled bullets.
AC was my first MMORPG, fond memories indeed. I remember the sheer adrenaline I got when "going red" and watching PK drama and battles unfold at the subway, dominated by a build called the "OG mage," slidecasting all over the place casting drain health until finishing with a missile spell. I would cast Blooddrinker 6 on monster weapons and giddily laugh as they sliced through new players starting out. I remember staying up the entire night during a school day when housing was released and being one of the first people on my server to claim one. I even remember being a vassal to a very generous asian guy in his mid twenties living in LA named Kyoto and my countless interactions with him and his brother under a specific tree.
Countless memories I could keep going on and on, but what an experience!
Haha "Og mage" ... yeah. The combat of AC still holds up, in its janky way, as fun in the context of an MMO. I spent most of my playtime on Darktide and dear lord, it made me so sweaty.
It's amazing to me how many people are still friends with their patrons/vassals from 25 years ago.
A fellow comp sci major in college introduced me to AC. Bought it on a whim after seeing him play it in his dorm. I still remember the physical box with some special bonus map and swag.
Lost so much sleep power levelling on a rock with those ape like creatures and just spamming magic to discover new combinations. The discovery and wonder does not match up with the graphics I see now when I search for screenshots and videos. Beta and early WoW almost had the same effect but your first is always special.
Min-max:ed dagger warrior for the win! Also, fear the ash gromnie! AC was never really trumped for me (AC2 was... different?). I just wanted a world to roam, and the vassal system is probably the only pyramid scheme that incentivised making social connections. :) My patron did a lot of corpse runs for me.
For the last 15 years or so, coworkers ask when I'm going to make a new allegiance system but "less broken". So many good social behaviors came from its structure that it really deserves another attempt.
And yeah, Ash Gromnies were the bane of so so so many players.
As with many concepts designed to do good in this world I think it's (unfortunately? not sure) human nature to attempt to "solve" a system, whether or not the consequences greatly diverge from, or are even the opposite of, the intended path. (E.g. optimizing for time is not always a good thing, but we often act as if it were - the right amount of "grind" may mean I'm spending more time doing other things than just gaining power, since it takes too long anyway. Also, in real life, cooking is quality time to me, so I'd rather not optimize that away etc.) "The opposite" is the forbidden fruit that drives some people in the first place, so "just the right" amount of freedom seems almost impossible to design. In games, where the only loss is time spent, I assume we take things a lot further than in real life. There may be dire social consequences of course, hence the need to police us unruly players, or make the world explicitly harsh and design the gameplay around this as in EVE Online.
Similar with the min-max:ing of the class-less system. :) IIRC, the aforementioned "dagger warrior" was designed around two things: double attacks at max attack power, and mobs not being able to land a hit. The perfect glass cannon that made it possible to survive AC's harsh lands even when under-leveled - or die instantly. :)
The size of the world was another thing that drew me in - another kind of "grind", if you will. Seeing other players whoosh:ing by - literally running by, in a mount-less world - was pretty hilarious, but the fact that it took time to get around was only a good thing IMO. I want a game world where the only option is to carefully navigate a large, dangerous desert to find the missing ingredient. The current theme-park MMO juggernaut seems to lock most things behind some "boss" and be done with it, which makes the game world pretty void of other players in most places. You just move on to the next quest hub and leave old content behind for ever. This also makes the world, regardless of its actual size, only feel as big as the current "zone" (I don't like "zones").
But in the end, the developer needs the game to be profitable and the players want the most out of their money spent. If my wants belong with a minority group the game probably won't cater to me.
Not really sure what my point is (old man yelling at clouds with contempt for "instant gratification" perhaps), but thank you for AC and the time you spent developing it! :)
I think I mentioned this in another comment and certainly a ton over the years - but a lot of the magic of AC was that it was made by a bunch of people who had never made a game before, much less an MMO, and there were very few ingrained lessons, so we were foolhardy enough to just do things the way it felt we should, player behavior or other consequences be damned. It was built on hopes and dreams and naivete and that made it beautiful and flawed.
But also yeah, once something ships to players, it's now "theirs" and not "ours". We stood in pretty stark contrast to EQ's "you're in our world now" philosophy, again, for better or worse.
The biggest problem was the degeneracy once it was "solved", instead of organizing around social circles. It led to people feeling like cogs in the machine and skewed play patterns and motivations.
I haven't really given it any serious thought but I'd likely start with trying to more strongly codify the good parts (incentivizing smaller circles inside of the larger structure, making systems for patrons/vassals to play together in more meaningful ways, etc) while highlighting the positive actions that players could do / benefit from. I don't want to say AC was TOO opaque but a lot of it definitely suffered from being over designed for a very hardcore market.
It eventually ended up being a straight line where you could add yourself lower and higher in the line with two different characters, and turbo feed the xp upwards (there were loyalty stats to push up and receive more xp). It peaked with a prevalence of bots that would be parked at the bottom feeding the line 24/7.
Loved AC. Bots kind of killed it IMO (something I feel bad about contributing to), but also new MMOs (WoW decimated the competition when it came out). I remember a portal / recall trick with two vendors, where one would sell a specific thing cheaper than the other would buy it. This was before eBay cracked down on virtual goods. Interesting times.
In retrospect, I give a lot of credit to the fact that we were young and dumb and didn't know any better. I've been revisiting a lot of the stories from back then and so many of them end up with us saying, "I dunno, let's see what happens!" and not being dissuaded by "best practices" or even common sense.
Also lots of credit goes to the early internet era when people were a LOT more forgiving of, well, everything.
I had been gone from Turbine for 8 or 10 years by the time they decided to shut AC down, so I can only speculate. I assume it had something to do with WB not wanting to "give away" the IP but instead just lock it away in a vault.
I've seen a similar mistake in a rushed "feature flagging"/phased rollout system.
User IDs were random UUIDs. Let's say we want to release a feature to ~33% of users; we take the first 2 characters of the UUID, giving us 256 possible buckets, then say that everyone in the first 1/3 of that range gets the feature. So, 00XXX...-55XXX... IDs get it, and 56-FF do not. This works fine.
However, if we then release another feature to 10% of users - everyone from 00-1A gets it, 1B-FF do not. That first set now has both features, and 56-FF have none. It turns out you can't draw meaningful conclusions when some users get every new feature and some get none at all.
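The nesting is easy to see if you enumerate the buckets. A Python sketch, using one made-up ID per two-character prefix and the same cut-offs as above (`prefix_bucket` and `in_rollout` are illustrative names):

```python
def prefix_bucket(user_id):
    """Interpret the first two hex characters of the ID as a bucket number 0..255."""
    return int(user_id[:2], 16)

def in_rollout(user_id, last_bucket):
    """A user gets the feature if their bucket is at or below the cut-off."""
    return prefix_bucket(user_id) <= last_bucket

FEATURE_A_LAST = 0x55  # ~33% rollout: buckets 00..55
FEATURE_B_LAST = 0x1A  # ~10% rollout: buckets 00..1A

# One hypothetical ID per bucket; only the first two characters matter here.
users = [f"{bucket:02x}" + "0" * 30 for bucket in range(256)]

has_a = {u for u in users if in_rollout(u, FEATURE_A_LAST)}
has_b = {u for u in users if in_rollout(u, FEATURE_B_LAST)}

print(len(has_a), len(has_b))  # 86 and 27 buckets
print(has_b <= has_a)          # True: everyone with B also has A
print(len(has_b - has_a))      # 0: nobody has B without A
```

Because both rollouts start counting from bucket 00, the smaller rollout is always a subset of the larger one.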
One easy way to avoid this problem is to give each feature its own independent space of “dice rolls” by hashing the user ids with feature-specific constants before interpreting them as dice rolls:
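A minimal Python sketch of that idea, assuming SHA-256 as the hash and the feature's name as the feature-specific constant. Both are illustrative choices, and `bucket` and `in_rollout` are hypothetical names, not anything from the system described:

```python
import hashlib


def bucket(user_id: str, feature_key: str, buckets: int = 100) -> int:
    """Map (feature, user) to a stable pseudo-random bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{feature_key}:{user_id}".encode("utf-8")).digest()
    # The first 8 bytes give a 64-bit integer; the modulo bias is negligible.
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(user_id: str, feature_key: str, percent: int) -> bool:
    """True if the user lands in the first `percent` buckets for this feature."""
    return bucket(user_id, feature_key) < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    has_a = {u for u in users if in_rollout(u, "feature-a", 50)}
    has_b = {u for u in users if in_rollout(u, "feature-b", 50)}
    # With independent rolls, roughly a quarter of users should land in both.
    print(len(has_a), len(has_b), len(has_a & has_b))
```

Because each feature hashes to its own space, two 50% rollouts should overlap on roughly a quarter of users rather than on all of them.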
If you suspect some flag effects interact with each other (e.g. one flag increases button size by 10%, and the other decreases it by 10%) you can go one step further and define feature groups and hash by user-id + group-id and then assign non overlapping ranges to the flags.
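A rough sketch of the group variant, again assuming SHA-256 and 100 buckets. The group name, flag names and ranges are made up for illustration:

```python
import hashlib
from typing import Dict, Optional

# Hypothetical group: each flag owns a disjoint slice of the 100 buckets.
BUTTON_SIZE_GROUP: Dict[str, range] = {
    "bigger-button": range(0, 10),    # 10% of users
    "smaller-button": range(10, 20),  # a different 10% of users
}


def group_bucket(user_id: str, group_id: str, buckets: int = 100) -> int:
    """Hash user-id + group-id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{group_id}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def active_flag(user_id: str, group_id: str, ranges: Dict[str, range]) -> Optional[str]:
    """Return the single flag in the group that this user gets, if any."""
    slot = group_bucket(user_id, group_id)
    for flag, owned in ranges.items():
        if slot in owned:
            return flag
    return None
```

Since every user has exactly one bucket per group, no user can receive two flags from the same group.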
Or you can cut and redistribute the non-performing changes, increase the distribution of the well-performing ones, and let evolution take care of disentangling correlated behavior.
But if you need that kind of analysis for usability A/B testing, you must be doing something very wrong at a previous step.
Consistent hashing (or more generally weighted rendezvous hashing) provides pseudorandom allocations with minimal reallocations when parameters change. This is exactly the properties desired for rolling out feature flags gradually over an input space like user IDs. It is a special case of a consistent hash function, where the possible assignments are just two (weighted) buckets, corresponding to the feature flag's value of true/false.
A better way is to assign some internal salt to the feature at creation time and use that. That way you are not dependent on something external that a user (the creator of the feature flag) could change.
I bear the scars of this design mistake from when I worked for a company that provided feature flagging.
It was not my initial mistake, but I drew the short straw trying to work around it.
Is there a more general term for tuple hashing? IE, math and theory around composition of hashes composed of concatenated (or otherwise combined) typed values?
> That first set now has both features, and 56-FF have none. It turns out you can't draw meaningful conclusions when some users get every new feature and some get none at all.
00-1A: have feature flags A and B
1B-55: have feature flag A only
56-FF: have no feature flag
So the actual gotcha here is that there is no cohort for "feature flag B only", right?
This setup can actually be desirable if feature B depends on feature A.
> So the actual gotcha here is that there is no cohort for "feature flag B only", right?
And that, as you add more tests, user 00 will always get the test treatment for every test. If you're running a lot of tests which introduce experimental features or changes to workflows, user 00 is probably going to find the site a lot more chaotic and hard to understand than user FF, and that will skew your test results.
> So the actual gotcha here is that there is no cohort for "feature flag B only", right?
Yep, exactly - by "that first set" I meant the 00-1A group, could have been clearer. Whatever the smallest rollout bucket is, that group is guaranteed to have every single feature.
This was quite a while ago, but I think the actual case we noticed this with was several features released to 50% of the userbase - so every single user either had all or none at once (unintentionally)
Sandra Powers (who finally figured the problem out) and her husband have been writing and running their own MMO for quite a while now. Check out http://projectgorgon.com/ if you’re interested in something handcrafted, quirky, and high quality.
I had the pleasure of working with Sandra at SOE for a while on EverQuest II, and later I contracted with her and Eric to help at my startup Ohai. They are both wonderful people and it's great to see this game is still ongoing.
My favourite part of this story is that Sandra's handle on the fansites at the time was "srand". A highly appropriate coincidence, given the nature of the bug!
One of my favorite things in gaming is when lore develops from bugs.
When they added the ability for Kerbals to be killed in Kerbal Space Program, they tripped over a bug where the first Kerbal in the game's engine, Jebediah (the one who dated back to the original introduction of astronauts at all, where only one existed), could not be killed. Because of some of the game logic having gone unmodified from the earlier versions, some operations would cause him to be loaded into the pilot seat and those operations didn't check if he was deceased. As a result, you could lose him on a mission only for him to spontaneously appear at the controls of another mission.
The community responded with fan-art of "Jebediah Kerman, thrillmaster."
The Kraken is a great example indeed. An eldritch horror lurking in the outer reaches of the solar system, that is not fond of Kerbals who are too ambitious, trying to get too far, too fast. A cosmic being beyond understanding. You can't see it and will never know where it is, if the concept of being in a place even applies to it. All you can know is that, as you travel further and further out, one day you may notice the laws of nature start to change, and this means the Kraken is about to have you for dinner. And sometimes, if it spots you doing something you shouldn't be, it will reach out even all the way to your home world and consume you there.
When it strikes, it eats you swiftly, and eats you whole. It disassembles your crafts, spaghettifies your crews. No weapon or speech will save you. The Kraken transcends reality - even time travel, reloading from a saved game state, does not always stop the attack.
--
Of course, the Kraken is just a manifestation of unstable physics calculations and a floating point-based coordinate system: as you travel further out, the spacing between two consecutive coordinates gets bigger, until it overwhelms the physics engine and your craft disintegrates. And if you start doing crazy stunts, particularly involving high impulses or very fast rotations, the physics will break even if you're close to coordinate origin, due to rounding errors.
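The growing gap between consecutive coordinates is easy to see in any IEEE 754 float. A small Python sketch follows. Python floats are 64-bit, and nothing here says what precision KSP actually uses, so treat the magnitudes as illustrative only:

```python
import math


def spacing(value: float) -> float:
    """Gap between `value` and the next representable float (Python 3.9+)."""
    return math.ulp(value)


if __name__ == "__main__":
    for magnitude in (1.0, 1e3, 1e6, 1e9, 1e12):
        print(f"{magnitude:>8.0e}  spacing {spacing(magnitude):.3e}")
    # Far enough from the origin, a small displacement is lost entirely.
    print(1e16 + 1.0 == 1e16)
```

With 32-bit floats the same loss of resolution sets in at much smaller distances.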
The trick the devs used to ameliorate Krakening was pretty clever: they "smeared" acceleration between the reference frame and the ship components so that the position delta frame-to-frame was lower and therefore more of the interactions occurred in the higher-density floating point space of lower forces and smaller displacements.
I haven't played KSP yet, but that description makes me think a lot of the Dragons in the 1955 short story "The Game of Rat and Dragon" by Cordwainer Smith [0].
> Somewhere in this outer space, a gruesome death awaited, death and horror of a kind which Man had never encountered until he reached out for inter-stellar space itself. Apparently the light of the suns kept the Dragons away.
My theory for why people think they are being "shadowbanned" or otherwise targeted by a social media company is often because of bugs like this. They are weird and almost like gaslighting in the way you experience them and people who don't understand how these bugs can exist assume sinister motives.
I remember Facebook around ~2010 would often seemingly delete posts I published, or hide them from some of my friends. Or from me. I could easily confirm that by viewing my profile from another browser that wasn't logged in, or having a friend sitting next to me open my profile.
Of course, it wasn't any kind of UI bug or automated moderation. The experience gave me a visceral understanding of what eventual consistency means - a term I also first learned around that time, during an internship at an Erlang company, and connected the dots.
I was one of the first people in my social circle to spot the issue, but as it became apparent over a year or two before eventually getting fixed (or at least made less obvious), I ended up giving a very high level intro to distributed databases to quite a few non-tech people, in order to alleviate their concerns about Facebook gremlins.
This and other experiences using and building software systems make me agree with you. Especially for large web platforms, that weird thing you're experiencing could be some nasty form of shadow ban, but if you just noticed it after doing something, then chances are it's just a transient issue with queues or database consistency.
Shadowbanning is a very real phenomenon; my current Reddit account was shadowbanned twice because I had created and posted memes which got unusually high upvotes for a new account, and the algorithm thought I was a bot reposting images to farm karma for future spamming. And from time to time I see HN users whose posts are flagged by default. I'm less sure if/how shadowbanning occurs on Twitter.
The problem is that while shadowbanning is a real thing, there are lots of bugs that also appear to be shadowbans and might just be a momentary glitch or an unintended interaction between systems. In Twitter's case, there are multiple different penalties that can apply to an account and people refer to them all interchangeably as a "shadowban". And some people claim to be shadowbanned when as far as anyone can tell there's no function in Twitter's system to do it, there's just something weird happening in the infrastructure or algorithms (or it's all imaginary, who can say).
Or it is actually real and it is those who are imagining it to be only imagined who are mistaken. Almost everything we do is composed of substantial imagination, the whole system runs on it.
What I find interesting is the substantial curiosity and effort people will put towards solving questions in video games, but when asked to solve problems in the game of life that we are embedded in, people often seem to have opposite instincts.
It reminds me of the "charm tables" of some Monster Hunter games. Charms are equipment with randomly selected skills. When you create a character, you are assigned a random "table" and all charms you may get are selected from that table. There are 17 of them, 12 of them are normal with some slightly better suited to some play styles but it is a really minor thing, the remaining 5 are called "cursed tables", they are much smaller and you will never be able to get the best charms. It is not game breaking, but it is annoying if you want a highly optimized build.
The reason it happens is that the random number generator has only a small state (I think 16 bits), so a limited number of possible rolls, and some seeds have a really short period, which limits it even more. Also, the seed is stored in your save file. It means that if you have a bad seed, you will always have a seriously broken RNG and the only way to change that is to create a new character. I think in later games, the same terrible RNG is used, but a new seed is picked each time you start the game, so you won't stay in the same "table". And while "charm tables" are the most obvious consequence, it probably has other, less noticeable effects on gameplay.
The weird part is that while it is obviously a bug as it negatively affects the experience of some random players, it is rarely referred to as such. It even persisted between versions. Some people even wrote tools to find out early on which table you are on, and techniques to get the table you want, along with a variety of RNG manipulation exploits.
Some of my fondest gaming memories are from this game. There was nothing ever quite like it, and the allegiance system created a special type of community bond that I haven’t seen repeated.
At one point I had 10 “vassals” sworn to me, and in our allegiance that came with the expectation that you assist and mentor those under you. Stakes were higher when our allegiance committed to being red dot PKs on a white server with a few other stronger groups in play.
The server emulation scene has come a long way since retail shut down, but even the most populous servers are a pale shade of this game in its heyday.
MMOs are just so expensive to develop now that innovation is almost dead. You either make a WoW-like or you die (and many times you die anyway). They really don’t make MMOs like they used to.
Social media has taken a significant bite out of what I would consider the more important parts of the older MMOs, especially when it comes to apps like Discord. Not enough people want to interact in games anymore outside of specific gameplay scenarios.
That's because AC started development before even Ultima Online launched. It was started by a bunch of college students that loved MUDs and had no idea what they were actually doing.
It was such a special thing before it was hit with a lot of road grading over the 15+ years of monthly patching that smoothed a lot of the rough edges that made it really unique.
> MMOs are just so expensive to develop now that innovation is almost dead.
I look at this differently. I think the extreme cost of an MMO should force innovation. Take the constraints for what they are and run with them. Accepting that you can't do it the "traditional" way is the first step to figuring out a better way.
There is nothing that says a high-quality WoW killer absolutely must cost 10 figures to produce.
It does though. MMOs are very complicated to build, and there is a lot of tech that has to be built by specialized people. The content is another issue. You won't build anything close to WoW below 100M.
Basic probability - randomly walking a graph will tend to have you in regions of higher vertex density. Same reason why randomly picking a road intersection will tend to put you in cities - cities just have most of the intersections.
(the walk isn't exactly random but it still works - the pathing heuristic is real-coordinate agnostic).
So it's not really a flag, it's that the algorithm that determines whom the monster attacks had a bug. The bug made it so that when there was a group of players to choose from, the mob would always pick players whose (hashed) identifier was higher on the list. Since the hash of your ID doesn't change, those players would always get picked on first. Like if your name is Aaron A. Aardvark, you're always coming first on any alphabetized list.
But it wouldn't always pick them, it would pick them with a higher probability that also depended on the players' relative distances to the monster.
So it's a stochastic bug where any given occurrence can be explained as "you just have confirmation bias, you don't notice all the times Zygy Zebra gets attacked" or "you must have been closer than Zygy even though you thought you were a bit further away". Especially if the game has a first-person perspective which makes it harder to estimate whether you or another player is closer.
The bugs in this game led to some of the greatest gameplay of all time.
I think the best bug was the movement possibilities during spell casting from breaking animations. It created one of the most complex and amazing PK (PvP) dynamics of any MMO to exist.
The complexity of being able to move only so much to still get your cast off, and being able to slightly fast-cast or hold long delay-casts to "outplay" your opponents created so much depth to duels it was incredible.
Even after all these years I can still remember the Arc cast I would do, it was the keyboard combo:
Hold Left -> Hold Z -> X -> tap/hold up to control the radius
Then Hold Right -> Hold C -> X -> tap up to reverse the arc to return to where you initiated the cast so the spell could go off.
Man I miss this game, I hope someone creates a wonderfully buggy remake someday!
Bugs really made this game one of a kind, it's sad it would be so hard to replicate. The way he uses strafing and delay casting on corners to fight odds is amazing.
I still remember my hands shaking when I was in PK fights like this in my youth. The consequences of death made fighting so much more intense in this type of MMO.
Why does weighted randomness seem so difficult for games? Are there any libraries that simplify this? I couldn't find any for JS.
So I created my own "algorithms" for things like randomly choosing which type of plants spawn in the world. It weighs them by the local frequency of each land-type. Eg if there's more swamp nearby, it's more likely to spawn cattails.
It's awful. Not intuitive. Weights have to be passed in ordered smallest to largest. Posting in hopes someone will correct my entire approach or point out an industry-standard way of doing these things.
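function weightedRandom(weight, outcomes) {
  // JavaScript has no built-in sum(), so total the weights with reduce().
  const total = weight.reduce((a, b) => a + b, 0);
  const roll = Math.random() * total; // value in the range [0, total)
  let seen = 0;
  for (let i = 0; i < weight.length; i++) {
    seen += weight[i];
    if (roll < seen) return outcomes[i];
  }
  // Floating-point rounding can leave roll >= seen; fall back to the last outcome.
  return outcomes[outcomes.length - 1];
}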
Statistics is actually pretty hard to get right, and it is the nature of the problem space that errors aren't immediately apparent unless one actually runs analytical regression on the algorithm to confirm it has the right "shape," which (a) can be time-consuming given the complexity of the algorithm and (b) doesn't tend to be part of unit tests because unit test doctrine is pathologically opposed to nondeterminism.
(This last part is not an unsolvable problem, and in fact random algorithms should be unit tested. But it requires the right kind of unit testing).
On occasion I have written a seeded large-n statistical test that passes and then submitted it with the seed fixed. I think this is the soundest approach, although it has a "your random number is 7" feel to it.
In your example, the weights need to add up to 100 to work correctly, right? The issue with the code in the article is that the weights did not add up to 1.
What is the general benefit to a quantum random number generator versus a regular one? Is it that it's more truly random, or has a better distribution, or?
> Anyone who played Asheron's Call probably heard of the AI bug associated with monster Aggro where a person name "Wi" was forever cursed with getting the aggro no matter what.
Hmm, when they told me they were assigning people to a list I kind of knew what the answer was going to be.
What is curious is that it took them a long time to find. I’d think that -as long as you believe there is a bug- this should be fairly straightforward to spot.
I could be remembering incorrectly, but back when this was occurring no one knew that issue was specifically limited to "mob spawns attacked certain people". The more honest description is the line about "From the beginning of AC, some players have complained about unbelievably bad luck."
People blamed this behavior for everything from loot drops, to combat outcomes, and to aggro mechanics.
And also remember that AC was before MMOs became massively popular, and outside of specific events and locations, there wasn't always more than a handful of players in a certain area for a given server where aggro mechanics like this would matter.
> I’d think that -as long as you believe there is a bug- this should be fairly straightforward to spot.
This is the power of continuous integration.
There's a certain amount of optimism needed to keep going as a software developer, and then there's the crippling amount of optimism that a lot of people have which makes for difficult team dynamics. CI says it doesn't matter if it works on your machine, it's red on a neutral box so fix your problems or it's not going into the release. It's much harder to ignore Jenkins than to ignore David.
People learn through trial and error that the Wally Filter works on bugs, so denial is their first and best defense. Prove to me there's a bug. I won't spend any time on it until you do.
I was waiting for them to say something like "then we take the random variable and multiply it by 3 to correct for this" and then explain some other, more subtle, bug. Looks like we all make stupid mistakes like this sometimes. I made a bug where I tried to get the month number for the next month, so you had to wrap at 12, but I just did "(current_month + 1) % 12" and left it at that, thinking for some reason that modulo works differently than it does in reality. Some stuff broke recently, due to that, and it was quite embarrassing.
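For the record, a small Python sketch of that wrap-around mistake and one way to fix it, assuming months are numbered 1 through 12 (the function names are made up for illustration):

```python
def next_month_buggy(current_month: int) -> int:
    """The expression from the comment: November (11) wraps to 0, not 12."""
    return (current_month + 1) % 12


def next_month(current_month: int) -> int:
    """Take the modulo first, then shift back into the 1..12 range."""
    return current_month % 12 + 1


if __name__ == "__main__":
    for month in (10, 11, 12):
        print(month, next_month_buggy(month), next_month(month))
```

The buggy version is right for ten months of the year, which is presumably why it survives for so long.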
What am I missing here? Unless all the devs were really bad at maths (unlikely if they're game devs) then this seems like a really easy bug to find all things considered?
I thought maybe it was going to be some weird DB glitch or something far upstream from the algorithm selecting which player to attack, but it was literally the logic of the very algorithm you would first look at if you were aware of such an issue.
This was a fun read though. Finding and fixing bugs like this is some of the most satisfying work we do imo, and no one outside of tech understands =)
I would speculate the main hurdle was probably believing the players in the first place. Humans are notoriously bad at not-noticing-patterns in properly random data. And statistical bugs like this require more effort and careful attention to detail to reproduce than deterministic bugs.
Another hurdle is likely that game developer culture strongly favors integration testing over unit testing. Games are optimized for fun, not correctness, and you can't unit test fun. This specific roulette selection function would have been straightforward to unit test, and a unit test would have caught the distortion. But now imagine people keep varying how important distance is to the calculation in order to make it "feel right". Updating those unit tests is suddenly a noticeable slowdown on how quickly you can iterate on game feel.
Yeah, WoW had so many problems with people not understanding probabilities that they added explicit code to track all drop rates and compare them to intended - and actually found a bug or two that way.
But mostly it was to explain to people that a 1 in 100 chance doesn’t mean you’ll get it even after 200 goes.
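To put a number on it, here is a short Python sketch that assumes every attempt is independent with the same drop rate (`chance_of_no_drop` is just an illustrative name):

```python
def chance_of_no_drop(drop_rate: float, attempts: int) -> float:
    """Probability of never seeing the drop in `attempts` independent tries."""
    return (1.0 - drop_rate) ** attempts


if __name__ == "__main__":
    for attempts in (100, 200, 500):
        missed = chance_of_no_drop(0.01, attempts)
        print(f"{attempts} goes: {missed:.1%} of players still have nothing")
```

At a 1 in 100 rate, roughly 13% of players are still empty-handed after 200 goes.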
Systematic debugging flaws probably; and lack of tooling to easily isolate.
Systematic flaws: a cross between groupthink, early flawed assumptions, deference to team leads, an 'I'll just look for 1hr, and if I can't find it, move on' attitude (which leads to not looking), or just plain simple "reading" instead of searching.
Lack of tooling: many game engines are infamous for lack of control over tooling. I haven't used many, but I understand it would be quite an effort to run meaningful parameterised or structured fuzz testing on most systems. This makes it hard to artificially confirm suggestions. That said, there is practically no excuse for them not to just add a bunch of counters to the game - even on their internal testers it would very quickly become clear there was a bias.
Most of my 'should have caught it earlier bugs' are of the 'deference to lead' variety. I looked, didn't see immediately, handed off with some notes, and then the follow up debugger(s) took notes or thoughts as gospel. This is really hard to fight - I write something along the lines of "my hunch is there is a problem in code x because it handles y and is poorly structured/tested. I checked z and found i, j - queries as follows" and then find the debuggers effectively refuse to look anywhere past x. This is particularly true for a group of debuggers, who play chinese whispers with groupthink and invent reasons it must be x.
Yes I came to post exactly this, and found your comment, so will reply instead. The bug doesn't seem to be obscure. It is there in the right place. Someone thinking about checking "why some players are attacked more often?" would probably choose this as the first place to double check, since it is directly related to selecting the player for attack.
Maybe the most occult part of this is figuring out that the unique IDs assigned to the players play a role.
> The bug doesn't seem to be obscure. It is there in the right place.
Well, there is a larger bug -- the entire algorithm, functioning properly, still won't behave the way this letter says that it should. It's not clear what it's designed to do, but it's very obvious that it doesn't do what the description says it does.
All characters would have been affected. If they had picked any 3 characters, and put them in a room with monsters repeatedly, they would have observed that the same character was attacked most times.
I don't understand the math. They are very clear about what they want their algorithm to achieve:
> The problem comes up when we are assigning portions of the range to various players. If we wanted distance from the creature to be proportional to your chance to be selected--that is, if the closer you are the less chance you have of being attacked--then we would assign this range by taking your distance from the creature over the total distance--the distances of everybody under consideration added together. But we really want the inverse of this ratio
That couldn't be more explicit. In the example model, where distance is danger, player D is twice as far away as player A, and has twice the chance of being attacked.
To invert that, in the game, where proximity is danger, when player A is twice as close as player D, he should have twice the chance of being attacked.
The game's algorithm does not attempt to do this. In the worked example, player A is 50% more likely to be attacked than player D is.
The correct algorithm is not difficult to write or to execute:
1. Assign all players equal odds of being attacked.
2. Weight the odds by the ratio of (distance_to_furthest_targetable_player / distance_to_me).
To make the example easier to follow, assign player D [distance: 10] 60 units of probability space. Then player A [distance: 5] should receive 60*(10/5) = 120 units, player B [distance: 2] should receive 300 units, and player C [distance: 3] should receive 200 units. Generate a real number (or, heck, an integer) in the range [0, 680) and you have your selection. Or, if you prefer, normalize all the odds and then generate something in the range [0, 1). But how did they pick the crazy algorithm they're actually using?
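The two steps and the worked numbers can be sketched in Python as follows. This is only the commenter's proposal written out, not the game's code, and the function names are hypothetical. Distances are assumed to be strictly positive:

```python
import random
from typing import Dict


def attack_weights(distances: Dict[str, float]) -> Dict[str, float]:
    """Weight each player by furthest_distance / own_distance."""
    furthest = max(distances.values())
    return {player: furthest / d for player, d in distances.items()}


def pick_target(distances: Dict[str, float], rng: random.Random) -> str:
    """Roll in [0, total weight) and return whoever owns that sub-range."""
    weights = attack_weights(distances)
    roll = rng.random() * sum(weights.values())
    seen = 0.0
    for player, weight in weights.items():
        seen += weight
        if roll < seen:
            return player
    return player  # floating-point rounding fallback: last player


if __name__ == "__main__":
    distances = {"A": 5, "B": 2, "C": 3, "D": 10}
    # Scaling by 60 reproduces the 120 / 300 / 200 / 60 units from the example.
    print({p: round(w * 60) for p, w in attack_weights(distances).items()})
    print(pick_target(distances, random.Random()))
```

Player A, at half of D's distance, ends up with exactly twice D's share of the range.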
I love hearing about bugs like this one. It's a nice reminder of the problem solving that made me fall in love with computer programming, which sometimes gets obscured in the day-to-day processes of working on software.
It seems quite uncommon to write a statistical unit test for this sort of algorithm; I suppose the fact that most languages lack a statistical programming toolchain makes it less appealing.
I wonder if there’s a cheap Frankenstein approach to create a test-only interface into your code with a CLI / FFI and wrap it with some Python to easily test the result distributions.
Then you can use tools like scipy or newer probabilistic programming frameworks like pyro. Not sure what the FFI story is in R, maybe something similar could be done there.
I dream of the day when I can ask an LLM to write a modern version of AC in Unreal 7... Exploring Dereth with reality level detail would be incredible.
I enjoy reading such stories, where the software fault (or glitch) is something very simple and easily overlooked. Mind you, this story is from 2002 :-)
tl;dr: players get attacked based on monster targeting RNG that's supposed to take an interval [0,num_players], assign players to subintervals that are shorter or longer based on a bunch of factors, and then roll a random number somewhere in that full interval. Whoever's subinterval the number ends in, they get targeted.
Instead, the code assigned subintervals and then rolled a number between 0 and 1, instead of 0 and num_players. If your player happened to sort to the top of the list for subinterval assignment, you'd be it. You'd always be it.
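A toy Python simulation of the failure mode as summarized here. The subinterval lengths are invented and this is not the game's code. It only shows what happens when the roll covers [0, 1) while the subintervals cover a longer range:

```python
import random
from collections import Counter
from typing import List, Tuple


def pick(lengths: List[Tuple[str, float]], roll: float) -> str:
    """Walk the subintervals in list order and return the one containing roll."""
    seen = 0.0
    for player, length in lengths:
        seen += length
        if roll < seen:
            return player
    return lengths[-1][0]


def simulate(lengths, roll_max: float, trials: int, seed: int = 0) -> Counter:
    """Count who gets picked when rolls are drawn from [0, roll_max)."""
    rng = random.Random(seed)
    return Counter(pick(lengths, rng.random() * roll_max) for _ in range(trials))


if __name__ == "__main__":
    # Hypothetical lengths summing to 3.0 for three players.
    lengths = [("first", 1.2), ("second", 0.9), ("third", 0.9)]
    print("roll in [0, 1):", simulate(lengths, 1.0, 10_000))
    print("roll in [0, 3):", simulate(lengths, 3.0, 10_000))
```

With the short roll, whoever sorts first absorbs every attack; with the full range the picks follow the subinterval lengths.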
Someone had to think to look at this, then confirm the maths, before it was found, years after the symptoms got reported. A unit test would have caught this, but didn't. Writing tests is annoying, especially in a codebase that keeps changing, but it's so important that this should count as one of those "this is what happens when you don't" lessons. Money was lost here.
I've always wondered the best way to write tests for "This event should happen x% of the time."
Obviously we could re-run the test 100 times and see if it happened close to x%, but not only is that inefficient, how close is "close"? You'll get a bell curve (or similar), and most of the time you'll be close to x but sometimes you'll be legitimately far away from x and your test will fail.
You could start from a known seed, but then are you really testing the percentages, or just checking that the RNG gives you the same output for the same seed, which you already know?
In similar situations I've run into, what I've often done is:
1. Start with a known seed.
2. Run a single test run like what you say, and verify this run by hand.
3. Freeze the test in this state, that is, assert you get that exact result every time on the given seed.
What this creates is not what I would strictly speaking call a "unit test", but it does sort of pin the algorithm to your examined and verified output. In this case, a human would quite likely have caught this problem on a decent test set. Obviously, there are other pathologies that would slip right by a human; the human being careful only raises the bar for such pathologies, it doesn't completely eliminate them.
But at least freezing it solves the problem where a change you did not realize would be a change slips by unnoticed and this function suddenly has a completely different outcome.
This has worked for me, in the sense it has caught a couple of bugs that would have had non-trivial customer implications. But I've never worked on an MMORPG or anything else where randomness was intrinsic to my problem; it has always been incidental, like, is my password generation algorithm correct and does this sample of my data look like what I expect, not the core of my system.
I'm not a fan of the set-seed solution. In the past, when I've tested PRNG implementations (Erlang used to not have the ability to have multiple, independently seeded RNGs), my approach was to decide on an acceptable rate of false negatives, and design my test around that. I figured I'd run the test suite no more than 10,000 times, and I wanted a 1/1,000,000 chance of ever seeing a false negative.
I can't remember the exact math I used at the time (I had to crack open a stats textbook), but ultimately it boiled down to generating a bunch of (seed, number of values previously pulled based on that seed, value) tuples, running a linear regression against them, and defining a maximum acceptable R^2 value based on my 10^-30 acceptable probability of a false fail.
When the RNG is not the thing being tested, mocking the RNG to do a sampled sweep through the RNG's output range is typically the correct move.
> I've always wondered the best way to write tests for "This event should happen x% of the time."
I use https://github.com/pkhuong/csm/blob/master/csm.py to set a known false alarm rate (e.g., one in a billion) and test that some event happens with probability in a range [lo, hi], with lo < expected < hi (e.g., expected +/- 0.01). The statistical test will run more iterations as needed. If that's too slow, you can either widen the range (most effective), or increase the expected rate of false alarms (not as effective, because the number of iterations scales logarithmically wrt false alarms).
You would probably not test "happens x% of the time", but rather that for a given set of inputs, passing in 0.1, 0.2, 0.3, etc, let you have the expected outcomes across the distribution. So you're testing that you can achieve each outcome across the spectrum with a pre-selected "random" number.
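One way to make that possible is to have the selection function take the roll as an argument instead of calling the RNG itself. A minimal Python sketch, with a made-up `pick` function and weights:

```python
from typing import List, Sequence


def pick(weights: Sequence[float], outcomes: List[str], roll: float) -> str:
    """Select an outcome for a roll in [0, 1) supplied by the caller."""
    threshold = roll * sum(weights)
    seen = 0.0
    for weight, outcome in zip(weights, outcomes):
        seen += weight
        if threshold < seen:
            return outcome
    return outcomes[-1]


if __name__ == "__main__":
    weights, outcomes = [1, 1, 2], ["a", "b", "c"]
    # Sweep pre-selected "random" numbers across the range, boundaries included.
    for roll in (0.0, 0.1, 0.25, 0.3, 0.5, 0.9):
        print(roll, pick(weights, outcomes, roll))
```

The test then asserts the outcome at each chosen roll, including the exact boundaries between outcomes, with no randomness involved.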
Unit tests are notoriously bad at testing n>2. I just ran into a problem with sorting recently that was caused by buggy comparators, but it didn't bubble up to the tests until the runtime switched to a different sort implementation. Most of the tests were doing n=2 so it was not visible under the old runtime version but was under the new one. This had been broken in production for months if not years. If the tests had used n=5, I'm pretty sure the old tests would have failed as well.
100? Let's make that 100,000 instead, and then you check the resulting distribution. It's a unit test, not an integration test: this will run very quickly. Even on 1999 hardware.
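As a sketch of what such a check could look like in plain Python: roll a die 100,000 times with a fixed seed and compare the counts with a chi-squared statistic. The two rollers and the seed are illustrative. For six faces (5 degrees of freedom) the 0.1% critical value is about 20.5, so a fair roller should almost always stay under it:

```python
import math
import random
from collections import Counter


def roll_die(rng: random.Random) -> int:
    """Uniform die: floor of a value in [0, 6), shifted to 1..6."""
    return math.floor(rng.random() * 6) + 1


def roll_die_rounded(rng: random.Random) -> int:
    """Deliberately biased: round() leaves faces 1 and 6 half-width ranges."""
    return round(rng.random() * 5) + 1


def chi_squared(roller, trials: int = 100_000, faces: int = 6, seed: int = 12345) -> float:
    """Chi-squared statistic of observed face counts against a uniform die."""
    rng = random.Random(seed)
    counts = Counter(roller(rng) for _ in range(trials))
    expected = trials / faces
    return sum((counts[face] - expected) ** 2 / expected for face in range(1, faces + 1))


if __name__ == "__main__":
    print("floor:", chi_squared(roll_die))
    print("round:", chi_squared(roll_die_rounded))
```

The rounding-based roller should produce a statistic in the thousands, so the gap between "fine" and "broken" is wide enough that the exact threshold hardly matters.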
JUnit (Java), one of the first widely used unit testing libraries, was released in 1997.
Asheron's Call was released in 1999 after four years of development. It's quite possible that this bug was introduced before the concept of unit testing even existed or was widely known.
My startup integrates with numerous third-party data sources, and they'll send quotes in all shapes and colors, usually providing both a total and a complicated breakdown which we need to parse and reshape/classify into line items in our system. And since it's very easy to introduce a bug that skips over a fee accidentally, part of our code review checklist is to ensure that we always explicitly check that our code summing over our final line items sums to the actual ground-truth total; we have a runtime assertion that logs an error to Sentry and, depending on business requirements, shows a (friendly) error rather than incorrect pricing. It's saved us many a time from silly bugs where we're non-exhaustively handling parts of the breakdown data structure.
To generalize, it's vital to have something like Sentry from day one - a low-cost abstraction that lets you monitor broken assumptions asynchronously. Though of course, these kinds of tools didn't exist in 2002!
> Writing tests is annoying, especially in a codebase that keeps changing
Only if you don't have a functional core.
One of the psychology tricks I've learned about unit testing is just how badly the sunk cost fallacy affects some people (all the time, and most people some of the time). I've witnessed people pairing for two days to fix a couple of nasty integration tests. That's 4 person-days of work lost on a couple of tests. How many release cycles would it take for those tests to pay for themselves versus manual testing?
You want the tests at the bottom of your pyramid to be so simple that people think of them as disposable. They should not feel guilty deleting one test and writing a replacement if the requirements change. Elaborate mocks or poorly organized suites can take that away. Which is hard to explain to people who don't see the problem with their tests.
You also want the sides of that pyramid to be pretty shallow, especially if you keep changing your design.
The RNG was supposed to generate a random number in [0, sum of weights] not [0, num_players].
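To make the difference concrete, here is a small sketch of weighted selection with both draw ranges. The weights are hypothetical, and clamping an out-of-range draw to the last index is my assumption; this is not the game's actual code:

```python
from itertools import accumulate


def pick(weights: list[float], r: float) -> int:
    """Return the index whose cumulative-weight interval contains r."""
    for index, upper in enumerate(accumulate(weights)):
        if r < upper:
            return index
    return len(weights) - 1  # assumption: clamp draws at or beyond the total


def shares(weights: list[float], draw_range: float, steps: int = 6000) -> list[float]:
    """Sweep r evenly over [0, draw_range) and report each index's share of picks."""
    counts = [0] * len(weights)
    for i in range(steps):
        r = (i + 0.5) / steps * draw_range
        counts[pick(weights, r)] += 1
    return [c / steps for c in counts]


weights = [1.0, 1.0, 4.0]                 # hypothetical targets
correct = shares(weights, sum(weights))   # draw in [0, sum of weights)
buggy = shares(weights, len(weights))     # draw in [0, number of targets)
```

With these weights the correct range gives the third target two thirds of the picks, while the buggy range gives every target one third, so whoever comes first in the list is picked twice as often as intended.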
It's also possible to write a test for this where the behavior in the test accurately tests the wrong behavior. The error is pretty subtle. There was even a “bug” in your summary of the bug ;)
Unit tests for things with probability are hard. I've written one recently, and among other reasons, it convinced me to write the selection differently so it was both more 'fair' and more easily tested.
It's slightly more complex than that: if you were farther away or otherwise in a better position than average, your portion of the interval would be <1, and so even if you sorted first you still weren't guaranteed to be hit.
Is it so important? I think this made the community and game world more interesting. It's emergent behavior. As a game developer, I hope I can make things complicated enough to have emergent behavior, even if that's just from my bugs.
You know what would have made it even better though? Not having a bug that made your character cursed through no fault of your own without any recourse.
I felt more and more that dice values of specifically 1 and 6 were harder to come by than other values, so one day I sat down for a few minutes and logged the value of 100 or so dice throws. Turns out I was right, the distribution was not uniform: 2-5 were fine, but it seemed that it was twice as hard to get a 1 or a 6 compared to any other value.
I even did a chi-squared hypothesis test, because I was crazy. (And because I was studying for my statistics minor in university at the time).
Seeing the result, the problem was pretty clear, without even knowing the source code. Almost certainly, they had a random number generator giving a number in a uniform range, let's say from 0 to 1, and did something like the following to get a dice value from 1 to 6:
value = round(random()*5)+1
Do you spot the issue?
The problem is that round() rounds to the nearest integer. So any value between, say, 1.5 and 2.5 (exclusive at one end) rounds to 2, 2.5-3.5 rounds to 3, and so on. Add 1, and you have the dice value.
But the ranges for dice values 1 and 6 are only 0 to 0.5 and 4.5 to 5 respectively, so they are only half the size.
The fix should be extremely simple:
value = floor(random()*6)+1
The crucial point being to use floor() or ceil() instead of round(), to only round towards one direction.
I wrote up exactly this in a short email to the company that made the Yahtzee game. They did not reply and just silently fixed the bug, right after my email. I was disappointed and stopped playing.
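The halved ranges for 1 and 6 are easy to see by sweeping evenly over [0, 1) instead of rolling at random. This is only a sketch of the two formulas discussed here; the helper names and grid size are mine, and the grid is chosen so no value lands exactly on a .5 tie (where Python's round() would round half to even):

```python
import math
from collections import Counter


def roll_round(u: float) -> int:
    """The suspected buggy mapping: round to nearest, then shift to 1..6."""
    return round(u * 5) + 1


def roll_floor(u: float) -> int:
    """The fixed mapping: six equal-width buckets."""
    return math.floor(u * 6) + 1


def face_counts(roll, steps: int = 60_000) -> list[int]:
    """Count faces 1..6 over an even sweep of [0, 1)."""
    tally = Counter(roll((i + 0.5) / steps) for i in range(steps))
    return [tally[face] for face in range(1, 7)]


print(face_counts(roll_round))  # [6000, 12000, 12000, 12000, 12000, 6000]
print(face_counts(roll_floor))  # [10000, 10000, 10000, 10000, 10000, 10000]
```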