That's only valid if the known bots are a random sample of all bots.
They can't test against a large set of public bots, because they're only detecting political bots, not everything automated. So they'd have to train and test on "accounts which are definitely known to be bots, but are trying to hide it". That presumably means the least convincing bots, or bots belonging to a previously exposed network.
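A toy simulation makes the sampling problem concrete. Everything in it is made up for illustration: the "stealth" score, the detector, and the exposure model are hypothetical assumptions, not how any real study works. The only thing that matters is that the same trait (being clumsy) makes a bot both easier to detect and more likely to end up in the "known bots" set.

```python
import random

def simulate(n_bots=10_000, seed=0):
    """Compare detector recall on 'known' bots vs. on all bots.

    Hypothetical model: each bot has a stealth score in [0, 1].
    Both the detector and the process that gets a bot publicly
    exposed are more likely to catch low-stealth bots.
    """
    rng = random.Random(seed)
    stealth = [rng.random() for _ in range(n_bots)]

    # Hypothetical detector: catches a bot with probability 1 - stealth.
    detected = [rng.random() < 1 - s for s in stealth]

    # Hypothetical exposure: a bot becomes "known" with probability
    # (1 - stealth) ** 3, so the known set skews toward clumsy bots.
    known = [rng.random() < (1 - s) ** 3 for s in stealth]

    def recall(mask):
        # Fraction of bots selected by `mask` that the detector caught.
        hits = [d for d, m in zip(detected, mask) if m]
        return sum(hits) / len(hits)

    return recall(known), recall([True] * n_bots)

known_recall, all_recall = simulate()
print(f"recall on known bots: {known_recall:.2f}")
print(f"recall on all bots:   {all_recall:.2f}")
```

Under these made-up assumptions the math works out to roughly 0.8 recall on the known set versus 0.5 on the full population, even though it's the exact same detector. Evaluating only on exposed bots would overstate how well it works.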