TL;DR: This post is great, and it raises the question of whether we could automate phylogenetic inference for game components.
I think I remember seeing a similar post about Chinese bootleg games for non-PC systems. The commonality was comparing pixel data to trace what the bootlegs had taken parts from. That raises a question: can we automate inferring the copying and modification of parts between older games?
In biology, there's a similar technique: phylogenetic analysis compares sequences to reconstruct evolutionary relationships between samples. From what I understand, something similar should be possible for the pixel data of game assets.
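Here's a rough sketch of what that could look like. It assumes every asset has already been extracted as a 1-bit bitmap of identical size (rows of "#" and "." characters), uses Hamming distance (the number of differing pixels) as the "sequence" distance, and clusters with UPGMA, the simplest distance-based tree method. The glyph data and the names hamming and upgma are made up for illustration. Note that UPGMA assumes a constant rate of change, i.e. a strict molecular clock.

```python
from itertools import combinations


def hamming(a, b):
    """Count differing pixels between two equally sized 1-bit bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def upgma(bitmaps):
    """Build a tree (nested tuples of names) from named bitmaps using UPGMA."""
    names = list(bitmaps)
    leaf = {(x, y): hamming(bitmaps[x], bitmaps[y]) for x in names for y in names}
    # Each cluster is (tree, list of leaf names it contains).
    clusters = [(n, [n]) for n in names]

    def avg(c1, c2):
        # Average linkage: mean distance over all cross-cluster leaf pairs.
        return sum(leaf[x, y] for x in c1[1] for y in c2[1]) / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Merge the closest pair of clusters (first one wins on ties).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical 5x5 glyphs: two originals and a one-pixel "bootleg" edit of each.
GLYPHS = {
    "orig_A":    ["..#..", ".#.#.", "#...#", "#####", "#...#"],
    "bootleg_A": ["..#..", ".#.#.", "#...#", "#####", "##..#"],
    "orig_B":    ["####.", "#...#", "####.", "#...#", "####."],
    "bootleg_B": ["####.", "#...#", "####.", "#...#", "#####"],
}

print(upgma(GLYPHS))
```

Each bootleg ends up paired with the glyph it was copied from. Real assets would first need alignment, palette normalization and cropping before a per-pixel comparison like this means anything.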
However, there are limits: you have to stay above a certain pixel size for the results to mean anything. The lower the resolution, the more all pixel fonts look alike, because there are fewer pixels in which a copy and an independent design can differ. The same seems to be true for sprites representing in-game elements.
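A toy illustration of the resolution problem, under the same assumption of 1-bit bitmaps of equal size: shrinking two glyphs by a factor of 2 (a hypothetical downsample helper that sets an output pixel if any pixel in its block is set) makes a one-pixel edit vanish and pulls two unrelated glyphs much closer together. Real low-resolution fonts are hand-drawn rather than downsampled, so this only demonstrates the general effect.

```python
def downsample(bitmap, factor):
    """Shrink a 1-bit bitmap; an output pixel is set if any pixel in its block is set."""
    h, w = len(bitmap), len(bitmap[0])
    return [
        "".join(
            "#" if any(bitmap[y][x] == "#"
                       for y in range(by, min(by + factor, h))
                       for x in range(bx, min(bx + factor, w))) else "."
            for bx in range(0, w, factor)
        )
        for by in range(0, h, factor)
    ]


def hamming(a, b):
    """Count differing pixels; assumes both bitmaps have the same dimensions."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Hypothetical 5x5 glyphs.
A         = ["..#..", ".#.#.", "#...#", "#####", "#...#"]
A_BOOTLEG = ["..#..", ".#.#.", "#...#", "#####", "##..#"]
B         = ["####.", "#...#", "####.", "#...#", "####."]

print(hamming(A, A_BOOTLEG), hamming(downsample(A, 2), downsample(A_BOOTLEG, 2)))
print(hamming(A, B), hamming(downsample(A, 2), downsample(B, 2)))
```

At 5x5 the copy differs by 1 pixel and the unrelated glyph by 18; at 3x3 the copy becomes indistinguishable from its source and the unrelated glyph is only 3 pixels away.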
Something I find interesting about computer fonts and the like is how little they change over time, sometimes down to the bit. In phylogenetics, the passage of time is measured by the molecular clock: the steady accumulation of mutations that are constantly happening.
In a game I'm reversing right now, there's an arial8.tga that struck me as completely timeless, or out of time. The game is from 2000, but this exact file could have appeared in any game over something like a 40-year span.
Perhaps even longer. Arial dates back to 1982 and was designed as a metric-compatible lookalike of Helvetica, which dates back to 1957 and is itself based on 1898's Akzidenz-Grotesk; but of course, back then filenames didn't have extensions yet.