Hacker News
GDDR7 Memory Supercharges AI Inference (semiengineering.com)
68 points by PaulHoule 3 months ago | 34 comments



What an annoying article to read. "The AI workload of AI in a digital AI world that the AI world AI when it AIs. Also the bandwidth is higher. AaaaaaaaaIiiiiiiiiiii".

90% of the article is just finding new ways to integrate "AI" into a purely fluff sentence.


Ok, I should be fair, it's 4 paragraphs of fluff, 6 paragraphs of specs, then a fluff conclusion. It's almost like 2 different unrelated articles smashed into 1.

Still makes for an annoying read.


Sounds like the sorta thing AI would write


AI only appears 7 times in the article's 11 paragraphs, though. I'm sure it's fluffed out; my eyes glazed over and I lost interest, but still.


> 90% of the article is just finding new ways to integrate "AI" into a purely fluff sentence.

I mean, to be fair, that’s half the industry right now. Hard to blame them all that much.


>that’s half the industry right now

Isn't that a bad thing?


Yes, but… It is how it is. Give it a few years, and it’ll be some new fad. There is always a buzzword du jour. This has been more or less the case for the industry since the 1950s.


PAM3 is 3 levels per unit interval (~1.58 bits), not 3 bits per cycle as the article reports. Although I suppose if you count a cycle as both edges of the clock, it's 3.17 bits.
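
A quick back-of-the-envelope check on those numbers (a sketch in Python, assuming one symbol per clock edge):

    import math

    bits_per_symbol = math.log2(3)   # PAM3: 3 voltage levels per unit interval
    print(bits_per_symbol)           # ~1.585 bits per symbol
    print(2 * bits_per_symbol)       # ~3.17 bits per full clock cycle (both edges)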


Is I/O starvation the bottleneck with GPUs?

I didn't think it was.


Memory bandwidth is the bottleneck for LLM inference. That's my understanding at least.


That's correct, but compute matters to some degree as well. The larger the model, the more of a bottleneck memory becomes.

There are some older HBM cards with very high bandwidth, like the Radeon Pro VII, which has ~1TB/s of bandwidth like the RTX 3090 and 4090 but is notably slower at inference for smaller models since it has less compute in comparison. At least I think that was the consensus of some benchmarks people ran.
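
A rough sketch of why bandwidth dominates at batch size 1: every generated token has to stream all the model weights through the chip, so a crude upper bound on tokens/s is just bandwidth divided by model size. The figures below are illustrative assumptions (a 7B fp16 model on a ~1TB/s card), ignoring KV cache and overheads:

    # Crude upper bound: tokens/s <= memory bandwidth / weight bytes read per token
    params = 7e9                 # assumed 7B-parameter model
    bytes_per_param = 2          # fp16 weights
    model_bytes = params * bytes_per_param

    bandwidth = 1e12             # ~1 TB/s (Radeon Pro VII / RTX 3090 class)
    print(bandwidth / model_bytes)   # ~71 tokens/s, best case at batch size 1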


Isn't that only the case when inference isn't batched?


Even in a local setting, batched inference is useful to be able to run more "complex" workflows (with multiple, parallel LLM calls for a single interaction).

There is very little reason to optimize for just single stream inference at the expense of your batch inference performance.


With a typical transformer on a GPU, the batch size that saturates the compute is at least in the hundreds. Below that (including the typical batch size of 1 for local inference) you're memory bound.
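
A minimal sketch of where "hundreds" comes from, using assumed H100-class numbers: in a weight matrix multiply, each fp16 weight (2 bytes) contributes 2 FLOPs per batch item, so arithmetic intensity is roughly B FLOPs per byte at batch size B, and compute only saturates once B reaches the chip's FLOPs-to-bandwidth ratio:

    # Weight matmul at batch size B (fp16):
    #   FLOPs = 2 * params * B   (multiply + add per weight per batch item)
    #   bytes = 2 * params       (each fp16 weight read once)
    #   => arithmetic intensity = B FLOPs/byte
    peak_flops = 989e12    # assumed dense fp16 throughput (H100-class)
    bandwidth = 3.35e12    # assumed HBM bandwidth in bytes/s
    print(peak_flops / bandwidth)   # ~295: below this batch size you're memory bound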


GPUs have much more memory bandwidth than CPUs. Meanwhile, the ALU:bandwidth ratio of both GPUs and CPUs has been growing exponentially since the 90s at least, so the FLOPs per byte required to not be starved on memory is really large at this point. We're at the point where optimization is 90% about SRAM utilization, and you worry about the math maybe as the last step.


For inference, it often is. Though for most consumer parts the bigger concern is not having enough VRAM rather than the VRAM not being fast enough. Copying from system RAM to VRAM is far slower.
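
To put "far slower" in rough numbers (illustrative figures, not exact specs):

    vram_bw = 1.0e12     # ~1 TB/s, GDDR6X on a high-end card (approximate)
    pcie_bw = 32e9       # ~32 GB/s, PCIe 4.0 x16 in one direction (approximate)
    print(vram_bw / pcie_bw)   # ~31x: spilling to system RAM hurts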


Not even a mention of Blockchain


Can’t attract VC money with it anymore


That's good IMO. So much money wasted over the past decade.


[flagged]


I wasn't even talking about the people "investing" in crypto. Just the VC / business side of things.

Just a massive waste on people who had just about no plan going in other than "disrupt the status quo" and "decentralized".


If the 5090 comes with 32GB of this RAM, that should be a substantial boost over the 4090! Hope that isn't reflected in the price.


Nvidia: You're getting 2GB of VRAM and you're gonna act like you like it!


> Hope that isn't reflected in the price.

lmao

AMD has even officially announced at this point that they will not compete on high-end consumer and workstation GPUs for years to come. Intel can't either (Gaudi is not general-purpose, so it has too limited an appeal for that market).


"With this new encoding scheme, GDDR7 can transmit “3 bits of information” per cycle, resulting in a 50% increase in data transmission compared to GDDR6 at the same clock speed."

Sounds pretty awesome. I would think that it's going to be much harder to achieve the same clock speeds.
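
The 50% figure falls out of comparing bits carried per pair of unit intervals (a sketch, assuming GDDR6's plain NRZ signaling):

    gddr6_bits = 1 * 2   # NRZ: 1 bit per unit interval, over 2 intervals
    gddr7_bits = 3       # GDDR7's PAM3 encoding: 3 bits per 2 unit intervals
    print((gddr7_bits - gddr6_bits) / gddr6_bits)   # 0.5 -> the 50% increase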


If it could really do that, then it wouldn't be DDR, right?


DDR just says symbols are centered on (both) clock edges; it doesn't say what the symbols are.


So it's almost twice the performance? That's great, but AI could easily use 10 times that.

Anyone heard anything about memristors being in a real large scale memory/compute product?


Trying to figure out how this compares to HBM3/e


What does "48 Gigatransfers per second (GT/s)" mean?


It reflects the data rate. Since DDR memory transfers data on both the rising and falling edges of the clock signal, DDR RAM on a 3000MHz clock signal is said to make 6000 megatransfers per second, in normal usage. 48 GT/s would imply a 24GHz clock if it were normal DDR, which seems absurd.

Edit: It seems GDDR6 is in reality "quad data rate" memory, and GDDR7 packs even more bits in per clock using PAM3 signaling, so if I'm reading this right, maybe they're saying the chips can run at up to an 8GHz base clock? 8GHz * 6 bits per cycle * 32-bit bus / 8 bits per byte = 192GB/s.

Edit again: It seems I undercounted the number of bits/pin per cycle of base clock for GDDR7; it's more like 12 (so a 4GHz max base clock) or even 24 (so 2GHz), which seems a lot more reasonable.
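
The same arithmetic in code form, treating the 48 GT/s as a per-pin figure (my assumption) on a single 32-bit GDDR7 device:

    data_rate_per_pin = 48e9   # 48 GT/s, 1 bit per transfer per pin
    bus_width = 32             # bits, one GDDR7 device
    print(data_rate_per_pin * bus_width / 8 / 1e9)   # 192.0 GB/s per device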


Gigatransfers per second (GT/s) measures the rate of transfer operations rather than the clock frequency. Each "transfer" represents one unit of data moved across the data bus per pin; with DDR-style signaling there are multiple transfers per clock cycle, so GT/s can be much higher than the clock rate.


Well, yeah

Any bets on when it gets renamed AIDDR? Only partly joking


More like NeuralRAM? We have precedents. Back in the 90s, Sun and Mitsubishi came up with 3DRAM, which replaced the read-modify-write cycle in Z-buffering and alpha blending with a single (conditional) write, moving the arithmetic into the memory chips.


DDR with Copilot!



