The comments on this fast range reduction are interesting; I recommend reading them (they're much longer than the post itself). Some highlights:
Versions of this exist as early as 1997, in CMU's Common Lisp implementation. Their GitLab was not loading quickly on my computer, but the commit is supposed to be from December 1997, by Douglas Crosher (https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-...).
The "Further reading" section (and comments) include implementations of the trick in the wild. Stockfish, the chess engine, used this optimization to reduce memory for storing precomputations of moves in their transposition tables (https://github.com/official-stockfish/Stockfish/commit/2198c...). The commit switches from using bitwise AND (&) and power-of-2 table sizes to using the modulo hash trick, which allows for full use of RAM (even if it's not an exact power of 2). It's also used in Hashtron's modular hash: https://github.com/neurlang/classifier/blob/master/hash/hash..., a straightforward application of the optimization to avoid using modulo.
A write-up on GitHub with more information: https://github.com/sipa/writeups/tree/main/uniform-range-ext.... It walks through a couple of intuitive explanations of why this technique keeps the output distribution as uniform as possible, in contrast to the formal proof in the post.
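To see the "maximum uniformity" claim concretely, here is a small experiment (my own sketch, not taken from the write-up): reduce every 16-bit input into n buckets with the same multiply-shift and count how often each bucket is hit. The counts only ever take the two values floor(2^16/n) and ceil(2^16/n), i.e. the split is as even as it can possibly be.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Map every 16-bit value x into [0, n) via (x * n) >> 16 and count
    // how many inputs land in each bucket.
    const uint32_t n = 1000;              // arbitrary non-power-of-two size
    std::vector<uint32_t> counts(n, 0);
    for (uint32_t x = 0; x < (1u << 16); ++x)
        counts[(x * n) >> 16]++;

    auto [mn, mx] = std::minmax_element(counts.begin(), counts.end());
    // Every bucket receives either floor(65536/1000) = 65 or
    // ceil(65536/1000) = 66 inputs.
    printf("min = %u, max = %u\n", *mn, *mx);
    return 0;
}
```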