I find it strange that they don't use some true random source in generating the tickets. I'm also surprised the article finds this obvious.
"Of course, it would be really nice if the computer could just spit out random digits. But that’s not possible, since the lottery corporation needs to control the number of winning tickets"
Surely the lottery could quantify uncertainties and set up a system where the probability of them losing money would be arbitrarily small. Interesting interview, btw.
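To put a rough number on that, here is a minimal sketch. It assumes a single prize tier, independent tickets that each win with the same probability, and a fully sold print run; none of those details come from the article. Under those assumptions, Hoeffding's inequality bounds the chance that prizes paid exceed ticket revenue. The function name and parameters are made up for illustration.

```python
import math


def loss_probability_bound(n_tickets, win_prob, ticket_price, prize):
    """Hoeffding upper bound on P(total prizes paid > total ticket revenue).

    Assumes (hypothetically) one prize tier and independent tickets.
    """
    # Fraction of winning tickets at which payouts exactly equal revenue.
    break_even_rate = ticket_price / prize
    margin = break_even_rate - win_prob
    if margin <= 0:
        # The game loses money on average, so the bound says nothing useful.
        return 1.0
    # P(observed win rate - win_prob >= margin) <= exp(-2 * n * margin^2)
    return math.exp(-2 * n_tickets * margin ** 2)


print(loss_probability_bound(1_000, 0.2, 1.0, 3.0))
```

With a $1 ticket, a $3 prize and a 20% win rate, the bound for 1,000 tickets is already below 1e-15, and it shrinks exponentially as the run grows.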
It sounds like ticket design is the problem; an algorithm is needed that:
- produces a set of bingo cards or tic-tac-toe boards such that only one board per card is a winner
- produces cards that contain a number of almost winners
Simply randomly assigning numbers wouldn't work, and it seems the method they've developed for building "fun to play" cards contains weaknesses against statistical analysis.
Hmm... now I'll be stuck mulling over a good algorithm for this all day...
The solution is straightforward: generate truly random boards, evaluate them, and then use the set that fits your payout profile. The problem is that these boards will lack the enticing hooks that keep folks coming back. The complete solution is to also discard boards that are not enticing enough. The result will have fewer artificial patterns like the ones used in the article to determine winners.
That procedure is vulnerable to the same issues mentioned in the article.
Let's say you design a game that has outcomes Lose, Near-Miss, Win, and Invalid (tickets that must be suppressed, e.g. multiple wins). Then, imagine a rare, salient pattern, like 3 singletons in a baited-hook row. That might be an extremely rare occurrence in the overall lot of random boards, but it might still occur disproportionately in, or in the vast majority of, Win boards and Invalid boards.
If you then choose the Lose, Near-Miss, and Win cards randomly, in the desired proportion, from your truly randomly generated set, then the pattern will be statistically correlated -- potentially strongly -- with the Win cards. That's what the article describes.
A single confusing sentence in the article seems to have gotten a lot of people (including me, at first) thinking this had something to do with the PRNG; rather, apalmblad's claim that this is a game design issue seems right.
Only if the "game design" purposely used a limited pool of numbers for the visible and hidden boards. If there was no guarantee that a number would appear 1.9 times, then seeing singleton numbers wouldn't be a predictor.
My guess is that the restricted number pool was used as a means of easily mapping a number (the number on the back of the card) onto a playing board. Anyone interested in the why and how of this should look at The Wizard of Odds[1]. If the number on the card was used to seed a PRNG which then produced a lot more data, there could be a more sophisticated board generator that doesn't need to take such compromising shortcuts.
So in a sense, the culprit is game design, in that rules were created to allow for certain percentages of Losses, Near Wins, and Wins. But there is no reason to mix output control into game design. My suggestion is to make the game rules with no regard to controlling output. Instead, evaluate the generated boards and keep/drop them to control the output.
"If there was no guarantee that a number would appear 1.9 times, then seeing singleton numbers wouldn't be a predictor."
Let's say there's no such demand, and the boards are generated truly randomly and neutrally. If visible-quality X is disproportionately correlated with invisible-quality "Win" then the game is already flawed; this may emerge naturally from the Win conditions and from the game design decision of what is shown in the baited-hook. If the output is controlled post-generation to increase the proportion of Win cards and Near-Win cards vs. Lose and Invalid cards, then the statistical correlation may be greatly increased.
Simplified example: a scratcher with two numbers from 0 to 4, one visible as bait and one hidden, that pays if the sum is 5 or 6. If everything is fair and truly random, the odds of winning are 0% with 0 showing, 20% with 1 showing, and 40% with 2, 3, or 4 showing. That's already a bad game, but now the game designers want to eliminate cards with sums of 7 or 8 because they confuse people (not minding that this changes the overall odds of winning), so they block those Invalid cards from shipping without blocking anything else. That gives:
  0 1 2 3 4
0 - - - - -
1 - - - - W
2 - - - W W
3 - - W W I
4 - W W I I
(Rows and columns are the two numbers on the card; W = win, I = invalid, - = lose.)
The new odds of winning, given the visible number, are: 0: 0%, 1: 20%, 2: 40%, 3: 50%, 4: 67%. The point is that a 4-showing card is now more than 3x as good as a 1-showing card, when it used to be only 2x as good, and it might now have a positive expected return.
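The numbers in this toy example can be checked by brute force. A small sketch follows; the function names are mine, and the rules are just the ones stated above.

```python
from fractions import Fraction


def classify(visible, hidden):
    """Outcome of one toy card: 'W' win, 'I' invalid, '-' lose."""
    total = visible + hidden
    if total in (5, 6):
        return "W"
    if total in (7, 8):
        return "I"
    return "-"


def win_odds(drop_invalid):
    """Chance of a win for each visible number, over all hidden numbers 0-4."""
    odds = {}
    for visible in range(5):
        outcomes = [classify(visible, hidden) for hidden in range(5)]
        if drop_invalid:
            # Invalid cards are never shipped, so they leave the denominator.
            outcomes = [o for o in outcomes if o != "I"]
        odds[visible] = Fraction(outcomes.count("W"), len(outcomes))
    return odds


print(win_odds(drop_invalid=False))  # fair, unfiltered game
print(win_odds(drop_invalid=True))   # after suppressing Invalid cards
```

Dropping the Invalid cards only changes the rows for 3 and 4 showing, which is exactly where the visible number becomes a stronger predictor.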
OK, database geek hat on here. It's not too hard to pregenerate a set of every possible ticket combination, just lots of cross joining of numbers tables and a bit of coding of the play rules. Add in the payout rules and you can produce a total set of all possible cards along with the payout for each given card.
From this set, we can select exactly the number and balance of paying and non-paying cards we want, chosen pseudorandomly from the set of possible cards. Output that set and sort that pseudorandomly, then send to the printers. You've got a (pseudo) random selection of cards with an absolutely controlled payout pattern.
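As a rough illustration of that enumerate-then-select idea, here is a sketch in Python rather than SQL. It uses the two-number toy scratcher from earlier in the thread (numbers 0-4, pays on a sum of 5 or 6) as a stand-in game; the function name and the quotas are hypothetical.

```python
import random
from itertools import product


def build_print_run(n_winners, n_losers, rng):
    """Pick an exact number of winning and losing cards, then shuffle them."""
    # The "cross join": every possible (visible, hidden) pair.
    all_cards = list(product(range(5), repeat=2))
    winners = [c for c in all_cards if sum(c) in (5, 6)]
    losers = [c for c in all_cards if sum(c) < 5]
    # Sums of 7 or 8 land in neither list, so they are never printed.
    # Sample with replacement, since a print run is larger than the card space.
    run = rng.choices(winners, k=n_winners) + rng.choices(losers, k=n_losers)
    rng.shuffle(run)
    return run


run = build_print_run(n_winners=20, n_losers=80, rng=random.Random(0))
print(sum(1 for card in run if sum(card) in (5, 6)), "winners in", len(run))
```

The payout count is exact by construction. Note that this does nothing about a visible number that correlates with winning; that is a property of the game rules, not of the selection step.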
The downside, and what seems to be the critical problem for the suppliers, is that I suspect you'd have fewer 'near miss' cards this way to entice players. But it's absolutely possible to ensure that you have a defined payout pattern while also giving random ticket distribution.
I don't think you've done the math yet on how many possible ticket combinations there are. I have more reason than most people to spend time thinking about bingo math. Get out a piece of paper and run some calculations: enumerating the whole solution space is not feasible.
Anyhow, the real solution is easier: checking a bingo card or lottery ticket for any feature you want is, essentially, O(1). Generating a lottery ticket is O(1). Generate billions, sort into piles, choose millions from piles in proper proportions, randomize order, done.
This still won't help you if you print sufficient information on the ticket to reverse engineer whether it won or not without actually purchasing it, of course.
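The generate, sort into piles, pick and shuffle procedure might look something like the sketch below. The game is a made-up stand-in (two numbers 0-4, win on a sum of 5 or 6, and a hypothetical "near miss" defined as a sum of exactly 4), and the names and quotas are assumptions for illustration only.

```python
import random
from collections import defaultdict


def classify(card):
    """Cheap per-ticket check; the categories here are hypothetical."""
    total = sum(card)
    if total in (5, 6):
        return "win"
    if total > 6:
        return "invalid"
    if total == 4:
        return "near_miss"
    return "lose"


def generate_run(quotas, rng, batch=10_000):
    """Generate random cards into piles until every quota can be filled."""
    piles = defaultdict(list)
    while any(len(piles[kind]) < n for kind, n in quotas.items()):
        for _ in range(batch):
            card = (rng.randrange(5), rng.randrange(5))
            piles[classify(card)].append(card)
    run = []
    for kind, n in quotas.items():
        # Take exactly the required number from each pile.
        run.extend(rng.sample(piles[kind], n))
    rng.shuffle(run)  # randomize the print order
    return run


quotas = {"win": 10, "near_miss": 30, "lose": 60}
run = generate_run(quotas, random.Random(1))
print(len(run), "tickets,", sum(1 for c in run if classify(c) == "win"), "winners")
```

The "invalid" pile is simply never drawn from. As noted above, this only controls the payout mix; it does not stop a ticket from leaking its outcome through what is printed on it.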
I'm a Brit (and not a lottery player) so not fully familiar with US state scratchcard lottery game mechanics :-) The longest odds straight lottery over here is 75m to 1 which is definitely enumerable, but if you say these cards get into the billions and would crash my server then I'll believe you! Agreed that you can still generate a controlled prize distribution with random tickets generated individually rather than pregenerating all combinations - it's a bit more fiddly but perfectly doable.
I agree that none of this helps if your card design is as poor as the one in the original article, but to address the complaint of the poster before me, it's perfectly possible to randomly generate tickets while controlling the number of winning tickets - you just have to have a screening function which prevents more than your controlled number of winners making it through.
I've heard of this technique being used for games of chance in Vegas, where they need to tightly control the outcomes both so they don't lose money and to satisfy the reporting requirements of the regulatory commissions.
They probably have to specify exactly what percentage of income is given out in winnings. Not to protect the lottery company, but to protect lottery buyers.