sarthaksoni's comments | Hacker News

I’m guilty of this. I’m trying to be more mindful when using LLM-generated code. It’s mostly a personal issue: I tend to procrastinate and hope the code “just works.”

We need to stay vigilant, otherwise we'll pay the cost later fixing LLM-introduced bugs.


First time I've ever heard someone admit this; I've only ever heard people accuse their coworkers of it. This is honestly a very sad thing to hear a professional dev say.


Sorry, I will try to be better.


Reading this made me realize how easy it is to set up GPT-OSS 20B in comparison. I had it running on my Mac in five minutes, thanks to Llama.
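For reference, once the server is running, the rest is just pointing a client at it. A minimal sketch, assuming an OpenAI-compatible endpoint on localhost (llama-server defaults to port 8080, Ollama to 11434) and that the server reports the model as "gpt-oss-20b":

  # Minimal sketch: chat with a locally served GPT-OSS 20B through the
  # OpenAI-compatible API that llama-server and Ollama both expose.
  # base_url, port and model name are assumptions; adjust for your setup.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
  resp = client.chat.completions.create(
      model="gpt-oss-20b",
      messages=[{"role": "user", "content": "Say hi in one sentence."}],
  )
  print(resp.choices[0].message.content)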


It's also easy to do 120b on CPU if you have the resources. I had 120b running on my home LLM CPU inference box in no more time than it took to download the GGUFs, git pull and rebuild llama-server. I had it running at 40t/s with zero effort and 50t/s with a bit of tweaking. It's just too bad that even the 120b isn't really worth running compared to the other models that are out there.

It really is amazing what ggerganov and the llama.cpp team have done to democratize LLMs for individuals that can't afford a massive GPU farm worth more than the average annual salary.
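If anyone wants to sanity-check the t/s numbers above, here's a rough way to measure against a running llama-server. Just a sketch: counting streamed chunks only approximates tokens, and the URL and model name are assumptions.

  # Rough throughput measurement against a local OpenAI-compatible server.
  import time
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
  start, chunks = time.time(), 0
  stream = client.chat.completions.create(
      model="gpt-oss-120b",  # assumption: whatever name your server reports
      messages=[{"role": "user", "content": "Write a paragraph about CPUs."}],
      stream=True,
  )
  for chunk in stream:
      if chunk.choices and chunk.choices[0].delta.content:
          chunks += 1
  elapsed = time.time() - start
  print(f"~{chunks / elapsed:.1f} chunks/s (roughly tokens/s)")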


What hardware do you have? 50 t/s is really impressive for CPU.


2xEPYC Genoa w/768GB of DDR5-4800 and an A5000 24GB card. I built it in January 2024 for about $6k and have thoroughly enjoyed running every new model as it gets released. Some of the best money I’ve ever spent.


Which specific EPYC models? And if it's not too much to ask, which motherboard and power supply? I'm really interested in building something similar.


Looking at https://news.ycombinator.com/submitted?id=DrPhish it's probably this machine https://rentry.co/miqumaxx

  * Gigabyte MZ73-LM1 with two AMD EPYC GENOA 9334 QS 64c/128t
  * 24 sticks of M321R4GA3BB6-CQK 32GB DDR5-4800 RDIMM PC5-38400R
  * 24GB A5000
Note that RAM prices have almost doubled since Jan 2024


I've seen some mentions of pure-CPU setups being successful for large models using old EPYC/Xeon workstations off eBay with 40+ cores. Interesting approach!


Wow nice!! That's a really good deal for that much hardware.

How many tokens/s do you get for DeepSeek-R1?


Thanks, it was a bit of a gamble at the time (lots of dodgy ebay parts), but it paid off.

R1 starts at about 10t/s on an empty context but quickly falls off. I'd say the majority of my tokens are generating around 6t/s.

Some of the other big MoE models can be quite a bit faster.

I'm mostly using QwenCoder 480b at Q8 these days for 9t/s average. I've found I get better real-world results out of it than K2, R1 or GLM4.5.


That's an r/localllama user right there


I'm getting 20 tokens/sec on the 120B model with a 5060Ti 16GB and a regular desktop Ryzen 7800x3d with 64GB of DDR5-6000.


Wow, that's not bad. It's strange: for me it is much, much slower on a Radeon Pro VII (also 16GB, with a memory bandwidth of 1TB/s!) and a Ryzen 5 5600, also with 64GB. It's basically unworkably slow. Also, ollama ps shows 100% CPU; the GPU is not being used at all :( It's also counterproductive because the model is just too large for 64GB.

I wonder what makes it work so well on yours! My CPU isn't much slower and my GPU probably faster.


AMD basically decided they wanted to focus on HPC and data center customers rather than consumers, so GPGPU driver support for consumer cards has been non-existent or terrible [1].

[1]: https://github.com/ROCm/ROCm/discussions/3893


The Radeon Pro VII is not a consumer card though, and it works well with ROCm. It even has datacenter-grade HBM2 memory that most Nvidia cards don't have. Official support has since been dropped, but ROCm still works fine. It's nearly as fast in Ollama as my 4090 (which I don't use for AI regularly, but I play with it sometimes).


I imagine the gguf is quantised stuff?


No, I’m running the unquantized 120b


Why is it hard to set up LLMs? You can just ask an LLM to do it for you, no? If this relatively simple task is already too much for LLMs, then what good are they?


In the case of the GPT-OSS models, the worst (most time-consuming) part of supporting them is the new format they've been trained with, "OpenAI harmony". In my own clients I couldn't just swap in the model and call it a day; I'm still working on getting them to work correctly with tool calling...
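For the curious, the part that changes is how the conversation gets rendered before it hits the model. My rough understanding of the shape, illustrative only; the exact special tokens and fields should be checked against the published harmony spec:

  # Illustrative only: approximate shape of a harmony-rendered conversation,
  # with roles plus channels (analysis/final) instead of a plain chat template.
  # Token names are my best understanding and may be slightly off.
  def render_harmony(user_msg: str) -> str:
      return (
          "<|start|>system<|message|>You are a helpful assistant.<|end|>"
          f"<|start|>user<|message|>{user_msg}<|end|>"
          "<|start|>assistant"  # the model then continues with
                                # <|channel|>analysis ... <|channel|>final ...
      )

  print(render_harmony("Hello"))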


I was playing with it yesterday and every single session gave me factually incorrect information.

Speed and ease of use is one thing, but it shouldn't be at the cost of accuracy.


If you are trying to get facts out of an LLM, you are using it wrong. If you want a fact, it should use a tool (e.g. web search, RAG, etc.) to fetch a document that contains the fact (Wikipedia page, documentation, etc.), then parse that document for the fact and return it to you.
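Concretely, the loop looks something like this. A sketch using the OpenAI-style tools API that most local servers mimic; search_web() is a hypothetical stand-in for whatever search or RAG backend you use, and the model name and endpoint are assumptions.

  # Sketch of the "facts come from tools" pattern: the model asks for a search,
  # the client runs it and feeds the result back for the final answer.
  import json
  from openai import OpenAI

  def search_web(query: str) -> str:
      # hypothetical stand-in for a real search / RAG backend
      return "AMD released the first Ryzen CPUs in March 2017."

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
  tools = [{
      "type": "function",
      "function": {
          "name": "search_web",
          "description": "Search the web and return a short snippet.",
          "parameters": {
              "type": "object",
              "properties": {"query": {"type": "string"}},
              "required": ["query"],
          },
      },
  }]

  messages = [{"role": "user", "content": "When was the first Ryzen CPU released?"}]
  resp = client.chat.completions.create(model="gpt-oss-20b", messages=messages, tools=tools)
  msg = resp.choices[0].message

  if msg.tool_calls:  # model wants a fact: go fetch it instead of guessing
      call = msg.tool_calls[0]
      result = search_web(json.loads(call.function.arguments)["query"])
      messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
      final = client.chat.completions.create(model="gpt-oss-20b", messages=messages, tools=tools)
      print(final.choices[0].message.content)
  else:
      print(msg.content)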


These tools are literally being marketed as AI, yet they present false information as fact. 'Using it wrong' can't be an argument here. I would rather the tool were honest about confidence levels and offered mechanisms to research further, then fed that fact back into the 'AI' for the next step.


120B is pretty easy to run too, if you have enough memory.
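Back-of-the-envelope for "enough memory", as a rough sketch: the released gpt-oss-120b weights are mostly MXFP4 (about 4.25 bits per MoE weight), and real usage adds KV cache and runtime overhead on top, so treat the number below as a floor.

  # Rough memory floor for the 120b weights (a sketch; overhead not included).
  params = 117e9            # ~117B total parameters
  bits_per_weight = 4.25    # MXFP4: 4-bit values plus a shared scale per block
  weight_gb = params * bits_per_weight / 8 / 1e9
  print(f"~{weight_gb:.0f} GB just for the weights")   # roughly 60+ GB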


Really cool website! What inspired you to create it?


This personal site makes me genuinely jealous, in the best way. Really awesome side projects and a great intro.

Time to raise my own bar.


I’ve used Firefox for years and really wanted to stick with it, but too many sites keep breaking. I originally ditched Chrome because it chewed through my RAM, but on the new M4 MacBook I’ve got headroom, so I’ve reluctantly gone back to Chrome. Painful switch, but I don’t have much choice right now.


I have the same experience.

It's somewhat of a taboo around here, and every time I have mentioned this there have been a bunch of responses certifying that Firefox works perfectly for them.


I genuinely can't think of any sites I come across that are broken, at least visibly enough for me to notice. I think that speaks more to the variety in browsing habits than anything else. I'm sure they exist and I don't think it's a taboo. People who don't share that impression probably just don't visit any of those broken sites, e.g. me.


YouTube and Netflix are the most common sites I've had issues with.


Which sites don't work on Firefox?


Some forms just break in Firefox for me. I’ve been applying to a lot of tech companies, and roughly 10% of their application forms fail in Firefox but work fine in Chrome. I can’t figure out why it’s inconsistent. Even some CAPTCHA and payment pop‑ups won’t load.


Most browser games (I play an absolute shit ton) run WAY better on Chrome than Firefox, on both macOS and Android.

Not sure about Windows where I play full real games.


Home Depot. They never test against Firefox, so most of their pages are a dumpster fire.


Great read! As a software engineer sitting here in India, it feels like a privilege to peek inside how OpenAI works. Thanks for sharing!

