It's not super clear whether the training task can be scaled in a manner similar to protein folding. It's a bit trickier to optimise ML workflows across computation nodes because you need more real time aggregation and decision making (for the algorithms).