You're missing that not everything has to be a breakthrough. Sometimes research experiments are just cool and fun. I didn't get the impression that they were overselling the research.
Sure they are: they say "recreate the game engine" when what it actually does is "optimally match a database of pre-cooked if-then rules with simple linear functions". For one Mario level. Taking two weeks of runtime to learn it.
The set of facts is worthless for anything of complexity.
It does not really generate the rules itself. (They are directly derived from the facts.)
What they did is only a small improvement over a typical expert system or CNN for a very limited case.
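To make that concrete, here's a rough sketch of my reading of the approach (hypothetical names, not the paper's actual code): a "rule" is just a condition over observed facts plus a simple linear update of a sprite attribute, and "learning" is searching for the rule set whose predictions best match the next frame.

    from dataclasses import dataclass

    @dataclass
    class Fact:
        sprite: str      # which sprite the fact was derived from
        attribute: str   # e.g. "x", "y", "animation"
        value: float

    @dataclass
    class Rule:
        conditions: list   # Facts that must hold in the current frame
        attribute: str     # attribute this rule updates
        slope: float       # linear update: new = slope * old + offset
        offset: float

        def applies(self, frame_facts):
            return all(c in frame_facts for c in self.conditions)

        def predict(self, old_value):
            return self.slope * old_value + self.offset

The rules are never invented from scratch - the conditions come straight out of the observed facts, and the search just picks which combinations and coefficients minimize prediction error.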
Choice quote: "Notably each fact can be linked back to the characteristics of a sprite that it arose from."
Wrong. When you pick up a fire flower your sprite changes, but how does it know you can suddenly shoot fireballs? Etc. And for more complex games, a lot of the relevant behaviour requires exploration well beyond what's visible on screen. An action might change acceleration (suddenly nonlinear ice physics with momentum), or direction handling, or let you start flying, or many other things. What if something moves in a circle? What if an action only has some probability of producing a result?
The approach will fail to model the game as early as Mario level 1-4 (the one with the rotating fire bars), or it will produce an insane representation of the engine. Note that it cannot even model the dampened triangle-wave motion of the fireballs in the paper's own example - it assumes they move along a sparse line.
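To illustrate the circular-motion point with made-up numbers: a fire bar's true position rotates around a pivot, while a fixed linear step per frame can only trace a straight line, so a single such rule drifts off immediately and the search has to shatter the motion into piles of piecewise rules.

    import math

    # Toy fire bar with made-up parameters: the true motion is circular.
    def fireball_pos(t, cx=100.0, cy=80.0, r=24.0, omega=0.1):
        return (cx + r * math.cos(omega * t), cy + r * math.sin(omega * t))

    # The best single linear rule "x' = x + dx, y' = y + dy" must pick one
    # constant displacement; circular motion has none, so error accumulates.
    dx = fireball_pos(1)[0] - fireball_pos(0)[0]
    dy = fireball_pos(1)[1] - fireball_pos(0)[1]

    pred = fireball_pos(0)
    for _ in range(60):
        pred = (pred[0] + dx, pred[1] + dy)

    print("linear-rule drift after 60 frames:",
          round(math.dist(pred, fireball_pos(60)), 1))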
The paper also presents no way to reduce this huge number of "if-then rules" to something actually useful.
Since this doesn't even attempt to explore the state space, it also requires a huge database.
Calling this "recreating the game engine" is akin to saying that because we have an algorithm that can solve checkers, it will also solve poker, Go, and a whodunit - and can play Jeopardy too.
I even suspect it's not useful as a preprocessor for something that can actually play the game, since it will break down on later cases.
I was pretty impressed by the result until reaching "a relatively simple search algorithm that searches through possible sets of rules".
CNNs have done such impressive things that "outperforms convolutional neural nets" sounds like an achievement, but CNNs have never been the pinnacle of accuracy - their key advantage is flexibility. Feature learning costs some reliability, but gives a huge advantage in saving human time and effort.
This appears to be exactly the opposite approach: an AI system that gains its accuracy by working from heavily pre-defined rulesets. Feature engineering is fine in a stable, well-understood domain, but it reduces the impressiveness of the 'AI' result. More worryingly, it cripples the flexibility of the agent in an open domain like "video games".
Hand-authoring the set of functions needed to derive the model means embedding a huge portion of the game engine in the engine-learning framework - what's left to learn is basically just parameter values. Mario without powerups is a game entirely defined by 2D movement, collisions, animation, and a tracking camera - the same feature list that had to be hand-defined for the learning framework.
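Here's a hypothetical sketch (not the authors' actual code) of what I mean: once the framework ships hand-authored functions for movement, collision, animation and camera, the "engine" left to learn collapses to a handful of numbers.

    def step(mario, params):
        # One frame of the hand-authored engine skeleton.
        vx = params["walk_speed"] if mario["walking"] else 0.0
        vy = mario["vy"] + params["gravity"]          # hand-authored physics form
        if mario["on_ground"] and mario["jump_pressed"]:
            vy = -params["jump_impulse"]
        x, y = mario["x"] + vx, mario["y"] + vy       # hand-authored movement
        on_ground = y >= params["ground_y"]           # hand-authored collision
        if on_ground:
            y, vy = params["ground_y"], 0.0
        return {**mario, "x": x, "y": y, "vy": vy, "on_ground": on_ground}

    # All the "learning" has to produce is this dict of constants:
    params = {"walk_speed": 1.5, "gravity": 0.4,
              "jump_impulse": 7.0, "ground_y": 192.0}

Everything structural - what a frame is, what moves, what collides - was decided by a human before the learner ever saw a pixel.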
I don't mean to attack the authors. This is still an interesting result, and they do acknowledge this in P2 of 'Limitations'. (Albeit with some lofty claims about eventually understanding real video - are they planning to encode physics as their ruleset?) But the article really oversells the capacity of a system that was spoon-fed the essentials of what it had to learn.
People are generally willing to forgo a cost/benefit analysis of a machine learning solution. There is an abiding faith that the costs will keep coming down, although I am not so sure about that anymore.