
Am I the only one shocked by the fact that a 1978 computer, even if a supercomputer (but still using the technology of the time), was 1/4 the speed of a Raspberry Pi? The Pi, if you look at the big picture of computing, is a very fast computer. For comparison: you can run a 1-billion-parameter LLM on a Raspberry Pi at decent speed. This means that the Cray could run it, even if slowly. That's incredible.
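(Rough back-of-envelope in Python, if anyone wants to play with the numbers. The ~2-FLOPs-per-parameter-per-token rule and the MFLOPS figures are ballpark assumptions, and real inference is usually memory-bandwidth-bound, so treat these as optimistic upper bounds:)

    # Very rough estimate: tokens/sec ~= FLOP/s / (2 * parameters).
    # Both the 2-FLOPs-per-parameter-per-token rule and the MFLOPS
    # figures below are ballpark assumptions.
    PARAMS = 1_000_000_000           # a 1B-parameter model
    FLOPS_PER_TOKEN = 2 * PARAMS

    machines_mflops = {
        "Cray-1 (1978)": 160,        # often-cited peak MFLOPS
        "Raspberry Pi 4": 13_500,    # assumed NEON ballpark
    }

    for name, mflops in machines_mflops.items():
        tokens_per_sec = (mflops * 1e6) / FLOPS_PER_TOKEN
        print(f"{name}: ~{tokens_per_sec:.2f} tokens/sec (peak, best case)")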


Find it similarly amazing, yes.

> This means that the Cray could run it

Gladly, we had better things to do than that :D

But seriously, while it might have managed speed-wise, it definitely lacked the memory, no? And if one had tried to train it, they wouldn't be finished today. But it would make a fun backwards sci-fi story: imagine a time traveller who brought the '80s an LLM from today. What would the world say and do with that slow oracle?


I can't find it now, but I wrote a bit of fanfic where Ken Thompson sent an LLM back in time (referencing his "Love, Ken" UNIX tapes that he would send out) to save humanity. He was always a bit ahead of his time.


Would you please do me (and others) a favor:

What value does an LLM hold intrinsically?

Let's say someone "brought an LLM from today".

Does that mean just a multi-gig file? What is INSIDE the LLM that would be of value? How does one speak to an LLM WRT '80s tech, and what could one glean from it....

ELI5 an LLM:

BARD: https://i.imgur.com/ahRVECz.png

OpenAI: https://i.imgur.com/Rbk5BD6.png

Bing: https://i.imgur.com/zVJ1tu6.png

--

So, how would one explain to '80s folks what an LLM even is, when we can't even ELI5 it in 2024?


I don’t think they’d have any trouble with the math; it’s just a bunch of regressions and matvecs, right?

I think the process of collecting and storing all the data would be more mind blowing to them—of course they were at the beginning of Moore’s law, so they could see the trajectory if they looked for it, but it is one thing to stand on the coast with waves lapping at your ankles and imagine how the ocean gets deeper as you keep going and another to get chucked out of a helicopter in the middle of the Pacific.


In a way that someone in the 80s could understand?

An LLM is a very highly compressed store of knowledge combined with an advanced parser that understands questions in plain English. A consequence of the compression is that sometimes the answers lose some accuracy, which is a deliberate trade-off to make it work at all.
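If you wanted to show rather than tell, a toy next-word predictor is something an '80s programmer could follow immediately; an LLM does the same job (predict the next word) scaled up by a factor of billions. A minimal sketch, nothing like a real transformer:

    # Toy "language model": count which word follows which, then predict
    # by picking the most common successor. A real LLM replaces this
    # lookup table with billions of learned parameters, but the job is
    # the same: predict the next word given the words so far.
    from collections import Counter, defaultdict

    corpus = "the cray was fast for its time and the pi is fast today".split()

    successors = defaultdict(Counter)
    for word, nxt in zip(corpus, corpus[1:]):
        successors[word][nxt] += 1

    def predict_next(word):
        if word not in successors:
            return None
        return successors[word].most_common(1)[0][0]

    print(predict_next("the"))   # -> 'cray' (its most common recorded successor)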


Neural networks were known about in the '80s; they were theorised about in the 1800s, ffs, and the first computer-based NNs were built in the 1950s.


Now paint me a picture of a cat. Good LLM.


LLMs don't paint.


My story plot would sure include the LLM (a coefficients file, as today) plus code to run it, so '80s humans could run it on the Cray, ask it questions, and get answers (after some time :D).

The LLM could itself explain what it is... (if there weren't more important questions to ask; contention would ensue).


An LLM is a lossy compression of the internet. We could provide it in a form that is directly executable on '80s computers, though GPT-4 tries to convince me that it is practically impossible, and that the reduced model would be much weaker (somebody doesn't want to be sent to the '80s ;)


Yes, that means just a multi-gig file.

The hard part of LLMs (and current AI in general) is training, which is orders of magnitude harder than inference.

If somehow we had had a way to travel to the future in the '70s, train the models, and then come back, we would be in Star Trek right now.
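To put very rough numbers on "orders of magnitude harder", using the common ~6·N·D approximation for training FLOPs and ~2·N FLOPs per generated token for inference (both rules of thumb; the 20-tokens-per-parameter dataset size is an assumption):

    # Rule-of-thumb compute costs:
    #   training  ~ 6 * parameters * training_tokens   FLOPs total
    #   inference ~ 2 * parameters                      FLOPs per token
    N = 1_000_000_000          # 1B-parameter model
    D = 20 * N                 # assumed ~20 training tokens per parameter

    training_flops = 6 * N * D
    flops_per_token = 2 * N

    print(f"training:  {training_flops:.1e} FLOPs")
    print(f"inference: {flops_per_token:.1e} FLOPs per token")
    print(f"training ~= generating {training_flops / flops_per_token:.0e} tokens")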


"1/4 the speed of a Pi" applies to the original (slow) 2012 Pi which is unable to run LLM as fast as you think. However the 2020 Pi 400 (equivalent to Pi 4), which can run the LLM workload, is about 100 times faster than the Cray 1:

"Raspberry Pi ARM CPUs - The comment above was for the 2012 Pi 1. In 2020, the Pi 400 average Livermore Loops, Linpack and Whetstone MFLOPS reached 78.8, 49.5 and 95.5 times faster than the Cray 1." http://www.roylongbottom.org.uk/Cray%201%20Supercomputer%20P...

A Pi 4 can infer ~0.8 tokens/sec with some of the more optimized configs (as per https://www.dfrobot.com/blog-13498.html). So the Cray would have needed ~2 minutes per token, i.e. ~2.5 hours to generate one sentence... if hypothetically it had enough RAM (it didn't).

In 1978 RAM cost about $25k per megabyte (https://jcmit.net/memoryprice.htm). Assuming you needed 4GB for inference, RAM would have cost $100M in 1978 dollars, or $470M in today's dollars.

For comparison, the Cray cost $7M in 1978, which is $32M in today's dollars. So once you had bought a Cray, you would have had to spend 14 times that amount on building a custom 4 GB RAM extension, somehow hooked up to the Cray, to finally be able to generate one sentence every 2.5 hours...
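For anyone who wants to check the arithmetic (the ~70-token sentence length is my assumption; the other figures are the ones cited above):

    # Reproduce the back-of-envelope above. Sentence length is an
    # assumption; the other numbers come from the linked sources.
    pi4_tokens_per_sec = 0.8
    pi_vs_cray_speedup = 100               # Pi 400 ~100x a Cray-1

    cray_sec_per_token = pi_vs_cray_speedup / pi4_tokens_per_sec   # 125 s
    sentence_tokens = 70
    print(f"~{cray_sec_per_token / 60:.0f} min/token, "
          f"~{cray_sec_per_token * sentence_tokens / 3600:.1f} h per sentence")

    ram_cost_per_mb_1978 = 25_000          # USD/MB, from jcmit.net
    ram_needed_mb = 4 * 1024               # 4 GB assumed for inference
    ram_cost_1978 = ram_cost_per_mb_1978 * ram_needed_mb
    cray_cost_1978 = 7_000_000
    print(f"RAM: ${ram_cost_1978 / 1e6:.0f}M in 1978 dollars, "
          f"~{ram_cost_1978 / cray_cost_1978:.1f}x the ${cray_cost_1978 / 1e6:.0f}M Cray")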

But in 1978, even if RAM was available to do LLM inference, it would have been impossible to train the model, as vastly more compute power is needed than for inference.


That was on a 700 MHz Raspberry Pi 1. On an 1800 MHz Raspberry Pi 400 with NEON SIMD, the difference was another order of magnitude.

> Comparison - The three 700 MHz Pi 1 main measurements (Loops, Linpack and Whetstone) were 55, 42 and 94 MFLOPS, with the four gains over Cray 1 being 8.8 times for MHz and 4.6, 1.6, 15.7 times for MFLOPS.

> The 2020 1800 MHz Pi 400 provided 819, 1147 and 498 MFLOPS, with MHz speed gains of 23 times and 69, 42 and 83 times for MFLOPS. With more advanced SIMD options, the 64 bit compilation produced Cray 1 MFLOPS gains of 78.8, 49.5 and 95.5 times.
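A quick sanity check on those ratios: from the Pi 1 figures and gains you can back out the implied Cray-1 MFLOPS, then recompute the Pi 400 gains against that baseline (they roughly line up, give or take rounding):

    # Derive the implied Cray-1 MFLOPS from the quoted Pi 1 results and
    # gains, then recompute the Pi 400 gains against that baseline.
    benchmarks   = ["Loops", "Linpack", "Whetstone"]
    pi1_mflops   = [55, 42, 94]        # 700 MHz Pi 1
    pi1_gains    = [4.6, 1.6, 15.7]    # quoted gains over the Cray 1
    pi400_mflops = [819, 1147, 498]    # 1800 MHz Pi 400

    for name, p1, g1, p400 in zip(benchmarks, pi1_mflops, pi1_gains, pi400_mflops):
        cray = p1 / g1                 # implied Cray-1 MFLOPS
        print(f"{name}: Cray-1 ~{cray:.1f} MFLOPS, Pi 400 gain ~{p400 / cray:.0f}x")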


Remarkable, yes. Shocking, no. Exponential growth was something experienced in the computer industry for decades, and people were quite used to it.


Would we have been able to train the LLM in the first place, though? I'm guessing that would have been completely infeasible?


This guy Time Travels. (check his hands, he likely has extra fingers)

But... let's look at the availability of DATA in the '80s...

Frankly, this is how hacking/phreaking was invented.

Dumpster-diving for line-printer discards to understand what their systems did.

(This is an actual story; people were bin-dipping in (AT&T?) dumpsters and finding exploits (social or electronic) in the discarded line-printer outputs....)

Can someone validate that comment?

--


This spirit appears to be alive and well.

Brian Roemmele says they've been dumpster diving for decades, salvaging huge collections of microfilm/microfiche that have been thrown out by libraries, research institutions, etc.

Now that LLMs are here, they're taking that collection and training an LLM against it (instead of the internet): https://twitter.com/BrianRoemmele/status/1746945969533665422


The supercomputers of the time were designed very heavily around floating-point operations (IIRC), so while the FPU performance might be comparable, I'm not sure a Cray could be used as a "1/4-speed Pi" for general computing things like running Linux.


I used a Cray around 1998 (from the Pittsburgh Supercomputing Center, IIRC) and it was super fast on very particular tasks. Specifically, there was some type of processing pipeline that, once you had it set up, would produce a stream of calculations very quickly.

I wonder if the Raspberry Pi is faster on all tasks, or is there some type of computation where the old Cray is still competitive?


I suspect the Cray is "competitive" for some value of "doesn't absolutely stink" for things that are designed for it.

But you can emulate a Cray on an FPGA: https://www.chrisfenton.com/homebrew-cray-1a/. So I suspect that while it could still do "real work", you can also beat the pants off it if you set up your code the way it's designed to run on modern GPUs.


The shocking thing is that every contemporary PC and handheld device would have placed on the TOP500 list in the '90s, yet they're still burdened with slow software when doing basic operations.


Yeah, the fastest computer I will ever own was a 200 MHz Pentium with 32 MB of RAM running Windows 95.


Given that it also functioned as sort of an uncomfortable couch—not really.

Besides, it's way further behind in basically every respect but compute.


The other factor is RAM, which is more problematic. The Cray-1 had up to 4 Meg WORDS RAM, or 2 Meg as we would measure today (I think).


Cray was 64 bit, IIRC, so 4 megawords would be 32 megabytes.




