wkat4242 8 months ago \| on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

Yeah, but what's in a 4090 is also comparable to a whole rack of servers from a decade ago. The tech will get smaller.