I'm not the OP (who replied already), and I don't think old benchmarks are useless. But I'm worried that teams trying to beat a dataset from an old competition for long enough will inevitably overfit to it, reducing the accuracy of their published results. That's even more likely when the competition's test set is publicly available and there's nothing really keeping it from "creeping" into the training set at some point, perhaps between different system versions.
What would really be useful is a sort of ongoing challenge where the training set stays up for at least a decade and the test set is never revealed (but can still be used to evaluate systems). Perhaps the data could even be renewed every few years, as long as new examples can be collected in a manner similar enough to the older data.
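For what it's worth, the "never revealed" part basically means server-side scoring: participants submit predictions, the labels stay on the evaluation server, and only an aggregate score comes back. Here's a minimal sketch of that idea; the file names, JSON submission format, and plain accuracy metric are all hypothetical, not taken from any real challenge.

```python
# Minimal sketch of server-side scoring against a hidden test set.
# File names, submission format, and the accuracy metric are assumptions.
import json


def load_hidden_labels(path="hidden_test_labels.json"):
    """Labels live only on the evaluation server; participants never see them."""
    with open(path) as f:
        return json.load(f)  # e.g. {"example_001": "cat", ...}


def score_submission(submission_path, hidden_labels):
    """Score a submitted predictions file against the hidden labels."""
    with open(submission_path) as f:
        predictions = json.load(f)  # same keys as hidden_labels
    correct = sum(
        1 for key, gold in hidden_labels.items() if predictions.get(key) == gold
    )
    return correct / len(hidden_labels)


if __name__ == "__main__":
    labels = load_hidden_labels()
    # Participants only ever receive this single scalar back, which limits
    # how much information about the test set can leak per submission.
    print(f"accuracy: {score_submission('submission.json', labels):.4f}")
```

Even this doesn't fully stop leakage (enough scored submissions can still probe the test set), which is another argument for rotating in fresh examples every few years.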