Yes, it’s complicated, for sure. But if we amortize the hardware over 3 years and triple the power costs, it’s ~500k/yr for hardware and ~100k/yr for power.
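To make the back-of-envelope math explicit, here's a rough sketch. The inputs are assumptions back-derived from the figures above, not measured numbers: ~1.5M of hardware (500k/yr × 3 yr) and ~33k/yr of raw power, so that tripling it lands near 100k/yr. The helper name `annual_costs` is just for illustration.

```python
# Hypothetical inputs back-derived from the figures in the comment:
# ~1.5M of hardware (500k/yr * 3 yr) and ~33k/yr of raw power,
# so that tripling it lands near 100k/yr. Illustrative only, not measured.
HARDWARE_TOTAL = 1_500_000
AMORTIZATION_YEARS = 3
RAW_POWER_PER_YEAR = 33_333
POWER_MULTIPLIER = 3  # the deliberate tripling of power costs from the comment


def annual_costs(hardware_total, years, raw_power, multiplier):
    """Return (hardware per year, padded power per year) using straight-line amortization."""
    hardware_per_year = hardware_total / years
    power_per_year = raw_power * multiplier
    return hardware_per_year, power_per_year


hw, pw = annual_costs(HARDWARE_TOTAL, AMORTIZATION_YEARS, RAW_POWER_PER_YEAR, POWER_MULTIPLIER)
print(f"hardware: ~{hw:,.0f}/yr, power: ~{pw:,.0f}/yr, power share: {pw / (hw + pw):.0%}")
```

Even with the tripling, power comes out at roughly a sixth of the annual total under these assumptions.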
In terms of TPUs or other custom accelerators, sure, they exist. However, most companies definitely aren’t building their own hardware.
ETA: I’m not saying power is irrelevant; it clearly matters. But calling it the dominant financial constraint is wrong, at least below Google/Amazon/Apple scale. Never mind the cost of the people running these training runs!