
Not sure if you guys know: Groq is already doing this with their ASIC chips. So they've already passed the FPGA phase and are in the ASIC phase.

The problem is that their costs seem to be 1x or 2x what they are charging.



Probably more than 2x...

"Semi analysis did some cost estimates, and I did some but you’re likely paying somewhere in the 12 million dollar range for the equipment to serve a single query using llama-70b. Compare that to a couple of gpus, and it’s easy to see why they are struggling to sell hardware, they can’t scale down.

Since they didn’t use hbm, you need to stich enough cards together to get the memory to hold your model. It takes a lot of 256mb cards to get to 64gb, and there isn’t a good way to try the tech out since a single rack really can’t serve an LLM."

https://news.ycombinator.com/item?id=39966620
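A rough back-of-envelope in Python, taking the quoted figures at face value (256 MB of SRAM per card, a 64 GB model footprint, ~$12M for the system); the per-card figure is just what those numbers imply, not a real price:

    # Back-of-envelope using the figures from the quoted comment.
    sram_per_card_mb = 256          # per-card SRAM (quoted figure)
    model_size_gb = 64              # model memory footprint (quoted figure)
    system_cost_usd = 12_000_000    # quoted equipment estimate

    # Cards needed just to hold the weights in on-chip SRAM.
    cards = (model_size_gb * 1024) / sram_per_card_mb
    print(f"cards to hold the model: {cards:.0f}")        # -> 256

    # Implied all-in cost per card, derived from the quoted $12M figure.
    print(f"implied cost per card: ${system_cost_usd / cards:,.0f}")  # -> ~$46,875

The point the quote is making falls out of the arithmetic: with no HBM, the minimum deployable unit is hundreds of cards, so there is no cheap way to scale down for a small trial.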


Groq is unpredictable: while it might be fast for some requests, it's super slow or fails entirely on others.

The fastest commercial model is Google's Gemini Flash (predictable speed).


The way I see it, one day we'll be buying small LLM cartridges.



