
"1/4 the speed of a Pi" applies to the original (slow) 2012 Pi which is unable to run LLM as fast as you think. However the 2020 Pi 400 (equivalent to Pi 4), which can run the LLM workload, is about 100 times faster than the Cray 1:

"Raspberry Pi ARM CPUs - The comment above was for the 2012 Pi 1. In 2020, the Pi 400 average Livermore Loops, Linpack and Whetstone MFLOPS reached 78.8, 49.5 and 95.5 times faster than the Cray 1." http://www.roylongbottom.org.uk/Cray%201%20Supercomputer%20P...

A Pi 4 can infer at ~0.8 tokens/sec with some of the more optimized configs (as per https://www.dfrobot.com/blog-13498.html). At ~100x slower, the Cray would have needed ~2 minutes per token, i.e. ~2.5 hours to generate one long sentence (~75 tokens)... if hypothetically it had enough RAM (it didn't).
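
The timing arithmetic, spelled out (the ~100x ratio and the 75-token sentence are both rough assumptions):

    pi4_tok_per_s = 0.8                      # Pi 4 with an optimized setup (dfrobot post above)
    cray_tok_per_s = pi4_tok_per_s / 100     # assuming the Cray 1 is ~100x slower
    s_per_token = 1 / cray_tok_per_s         # 125 s, i.e. ~2 minutes per token
    sentence_tokens = 75                     # hypothetical long sentence
    print(sentence_tokens * s_per_token / 3600)  # ~2.6 hours per sentence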

In 1978, RAM cost about $25k per megabyte (https://jcmit.net/memoryprice.htm). Assuming you needed 4GB (4096 MB) for inference, the RAM alone would have cost ~$100M in 1978 dollars, or ~$470M in today's dollars.

For comparison, the Cray cost $7M in 1978, which is about $32M in today's dollars. So on top of buying the Cray, you would have had to spend ~14 times that amount building a custom 4GB RAM extension, somehow hooked up to the Cray, to finally be able to generate one sentence every 2.5 hours...
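
The cost arithmetic, using only the figures above (the inflation multiplier is implied by the $7M -> $32M Cray numbers):

    ram_1978 = 4096 * 25_000        # 4 GB at $25k/MB -> $102.4M in 1978 dollars
    inflation = 32 / 7              # ~4.6x, implied by $7M -> $32M
    ram_today = ram_1978 * inflation
    cray_today = 32e6
    print(ram_today / 1e6)          # ~$468M, i.e. the ~$470M above
    print(ram_today / cray_today)   # ~14.6x the cost of the Cray itself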

But even if the RAM had somehow been available in 1978, training the model would still have been impossible, as training requires vastly more compute than inference.
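
To put "vastly more" in numbers, using the standard ~6*N*D training / ~2*N-per-token inference rule of thumb (the model size and token count here are hypothetical, purely to show the scale):

    N = 7e9     # hypothetical 7B-parameter model
    D = 1e12    # hypothetical 1T training tokens
    train_flops = 6 * N * D           # rule of thumb: ~6*N*D FLOPs to train
    infer_flops_per_tok = 2 * N       # rule of thumb: ~2*N FLOPs per generated token
    cray_peak = 160e6                 # Cray 1 peak: ~160 MFLOPS
    years = train_flops / cray_peak / (3600 * 24 * 365)
    print(f"{years:.1e}")             # ~8.3e6 years, even at sustained peak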


