> where BAD achieves an average score of 24.174 points in the two-player setting, surpassing the best previously published results for learning agents by around 9 points and approaching the best known performance of 24.9 points for (cheating) open-hand gameplay

I didn't realize Hanabi is already that close to being solved.

My gut reaction is that the game is a lot simpler than it appears. I guess your simpler "matrix" game points to that: you already had an intuition for reducing Hanabi. Indeed, looking at the code you shared for the matrix game, it seems that Hanabi's problem is that, like Chess and Go, it resembles something that can be expressed literally in TensorFlow more than it resembles more sophisticated games.