I am surprised that this claim keeps getting made, given the observed prices.
Even if one thinks that the losses of the big model providers come from selling below operating costs (rather than below operating costs plus training costs plus the cost of growth), even big open-weights models that need beefy machines look like they eventually* amortise the hardware cost so low that electricity is what matters; so when (and *only* when) the quality is good enough, inference is cheaper than the food needed to have a human work for peanuts — and I mean literally peanuts, not metaphorical peanuts, as in the calories and protein content of enough bags of peanuts to not die.
* This would not happen if computers were still following the improvement trends of the 90s, because then we'd be replacing them every few years; a £10k machine that you replace every 3 years costs you £9.13/day even if it does nothing.
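A minimal sketch of that amortisation arithmetic in Python (straight-line over the lifetime, no resale value assumed):

    # Straight-line amortisation of a £10k machine over 3 years, no resale value.
    machine_cost_gbp = 10_000
    lifetime_days = 3 * 365
    print(f"£{machine_cost_gbp / lifetime_days:.2f}/day")  # -> £9.13/day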
https://www.tesco.com/groceries/en-GB/products/300283810 -> £0.59 per bag * (2,500 kcal per day / 645 kcal per bag) = £2.29/day; then combine your pick of model, home-server hardware, electricity costs etc. with your estimate of how many useful tokens a human produces in the 8,760 hours of a calendar year, given your assumptions about hours per working week and days of holiday or sick leave.
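The same back-of-envelope in Python (price and kcal-per-bag from the listing above; 2,500 kcal/day is the usual adult guideline):

    # Cheapest-survival cost: bags of peanuts needed to hit 2,500 kcal/day.
    price_per_bag_gbp = 0.59
    kcal_per_bag = 645
    kcal_per_day = 2_500
    bags_per_day = kcal_per_day / kcal_per_bag              # ~3.9 bags
    print(f"£{price_per_bag_gbp * bags_per_day:.2f}/day")   # -> £2.29/day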
I know that even just order-of 100k useful tokens per day is implausible for any human, because that would be like writing a novel a day, every day; and this article (https://aichatonline.org/blog-lets-run-openai-gptoss-officia...) claims a Mac Studio can output 65.9 tokens/second = 65.9 * 3600 * 24 = 5,693,760 tokens/day, or ~2e9 tokens/year; compare that to a deliberate over-estimate of human output (100k/day * 5 days a week * 47 weeks a year = 2.35e7/year).
The top-end Mac Studio has a maximum power draw of 270 W (https://support.apple.com/en-us/102027). 270 W for *at least (2e9 / 2.35e7) ≈ 85 times* the quantity of output that a human can do with 100 W (this only matters when the quality is sufficient, and as we all know AI often isn't that good yet) is a bit over 31 times the raw energy efficiency, and electricity is much cheaper than calories — cheaper food than peanuts could get the cost of the human down to perhaps £1/day, but even £1/day is equivalent to electricity costing £1/(24 hours * 100 W) = £0.4167/kWh.
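Putting the throughput and power figures together (token rates and wattages from the links above; the 100 W human and the £1/day food floor are the assumptions already stated, not measured facts):

    # Quantity: Mac Studio token output vs a deliberately generous human estimate.
    mac_tokens_per_year = 65.9 * 3600 * 24 * 365        # ~2.08e9
    human_tokens_per_year = 100_000 * 5 * 47            # 2.35e7
    quantity_ratio = mac_tokens_per_year / human_tokens_per_year
    print(f"quantity ratio ~{quantity_ratio:.0f}x")     # ~88x; "at least 85x" after rounding down to 2e9

    # Energy efficiency: 270 W machine vs 100 W human.
    efficiency_ratio = quantity_ratio * 100 / 270
    print(f"energy efficiency ~{efficiency_ratio:.0f}x")  # ~33x here; ~31x with the rounded 85x figure

    # Break-even electricity price if a human could somehow be fed for £1/day.
    human_kwh_per_day = 0.100 * 24                      # 2.4 kWh of food energy per day
    print(f"£{1.0 / human_kwh_per_day:.4f}/kWh")        # -> £0.4167/kWh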
Running a local model is not an apples-to-apples comparison. Yes, if you run a small model 24/7, don't care about output latency, and utilization is completely static with no bursts, then it can look cheap. But most people want output now, not in 10 hours. And they want it from the best models. And they want large context windows. And when you combine that with serving millions of users, it gets complicated and expensive.
Yes, but usage is not uniform even when you have millions of users. Aggregation smooths the usage curve, but the absolute peaks and troughs become more extreme the more users you have. At 3am, usage in the US drops to effectively zero. Maybe you can use that compute for Asian customers, but then you compete with local compute that has far better latency.
Then you have seasonal peaks/troughs, such as the school year vs summer.
When you want 4 9s of uptime and good latency, you either have to overprovision hardware and eat idling costs, or rent compute and pay overhead. Both cost a lot.