Hacker News

I’m doing an experiment in this in real time: I’ve got a bunch of top-flight junior folks, all former Jane and Google and Galois and shit, but all like 24.

I’ve also been logging every interaction with an LLM, plus the exit status of the build at every mtime of every source file, along with all the metadata: I can easily plot when I leaned on the thing and when I came out ahead, and I can tag diffs that broke CI. I’m measuring it.
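For what it’s worth, the correlation described above can be sketched in a few lines. The JSONL schema, the field names, and the one-hour window here are all my own assumptions, not the commenter’s actual tooling:

```python
import json
import time

def log_event(path, kind, **fields):
    """Append one event (an LLM interaction or a build result) to a JSONL log."""
    with open(path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "kind": kind, **fields}) + "\n")

def builds_after_llm_help(path, window=3600):
    """Return build events that landed within `window` seconds of an LLM
    interaction touching the same file -- candidates for tagging the
    diffs that broke CI."""
    with open(path) as f:
        events = [json.loads(line) for line in f]
    llm_calls = [e for e in events if e["kind"] == "llm"]
    return [
        b for b in events
        if b["kind"] == "build"
        and any(c["file"] == b["file"] and 0 <= b["ts"] - c["ts"] <= window
                for c in llm_calls)
    ]
```

With a log like this, plotting "leaned on the thing" against "came out ahead" is a join on file path and timestamp.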

My conclusion is that I value LLMs for coding in exactly the same way that the kids do: you have to break Google in order for me to give a fuck about Sonnet.

LLMs seem like magic unless you remember when search worked.




> LLMs seem like magic unless you remember when search worked.

Yikes. I didn’t even think about this, but it’s true.

I’m looking for the kinds of answers that Google used to surface from Stack Overflow.


The best way to get useful answers was (and for me still is) to ask Google for "How do I blah site:stackoverflow.com". Without the site filter, Google's results suck or are just a mess, and Stack Overflow's own search is crap.


Google used to be better, but so was Stack Overflow. Now a lot of the answers are outdated. And even more importantly, they got rid of any questions where the answer was even a little bit subjective. Unfortunately for users, that's almost all of the most useful answers.


Kagi…

Fully switched over more than a year ago and never looked back.


I had a Kagi account for a year, but it's just Bing with some admittedly nice features on top.

I don’t get the results because there’s just not a lot of people talking about what I’m interested in.


Kagi with the "Programming" lens turned on


Nowadays, I just read manuals, docs, and books. I mostly use search as a quick online TOC or for those specific errors I'm in no mood to debug.


I don't understand, are you using LLMs purely for information retrieval, like a database (or search index)? I mean, sure, that's one use case, but for me the true power of LLMs comes from actually processing and transforming information, not just retrieving it.


I have my dots wired up so that I basically fire off a completion request any time I select anything in emacs.

I just spend any amount of tokens to build a database of how 4o behaves, correlated with everything emacs knows, which is everything. I'm putting down tens of megabytes a day on exactly what it did at what point.
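A sketch of what one such per-selection record might look like; the schema, field names, and hashing choice are my assumptions, not the commenter's actual setup:

```python
import hashlib
import os
import time

def selection_record(file_path, selection, model="gpt-4o", response=""):
    """Build one log record for a completion fired on an editor selection.
    Hypothetical schema: the editor side would serialize this to JSON and
    append it to a log, so model behavior can later be correlated with
    the file's mtime, i.e. what the editor knew at that moment."""
    return {
        "ts": time.time(),
        "file": file_path,
        "mtime": os.path.getmtime(file_path),
        "model": model,
        # Hash instead of storing raw text, in case the selection is sensitive.
        "selection_sha256": hashlib.sha256(selection.encode()).hexdigest(),
        "selection_len": len(selection),
        "response_len": len(response),
    }
```

At "tens of megabytes a day," records like this are cheap to emit and trivially joinable against build outcomes later.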


I'm actively data-mining OpenAI: they get a bunch of code that they have anyway because they have GitHub, and I get arbitrary scope to plot their quantization or whatever, with examples.

Flip it on 'em. You're the one being logged, asshole.

https://youtu.be/un3NkWnHl9Q?si=VOnH2krJkJLRA2BQ


To be clear I’m a huge fan of the Cursor team: those folks are clearly great at their jobs and winning at life.

They didn't get ahead by selling you the same thing they use; if they did, Continue would be at parity.


What domain/type of software do you and they work on? Cursor has been quite effective for me and many others say the same.

As long as one prompts it properly with sufficient context, reviews the generated code, and asks it to revise as needed, the productivity boost is significant in my experience.


Well, the context is the problem. LLMs will really become useful if they 1) understand the WHOLE codebase AND all its context, 2) then also understand the changes to it over time (local history and git history), and 3) finally also use context from Slack, with all of that updating basically in real time.

That will be scary. Until then, it's basically just a better autocomplete for any competent developer.


What you describe would be needed for a fully autonomous system. But for a copilot sort of situation, the LLM doesn't need to understand and know of _everything_. When I implement a feature into a codebase, my mental model doesn't include everything that has ever been done to that codebase, but a somewhat narrow window, just wide enough to solve the issue at hand (unless it's some massive codebase wide refactor or component integration, but even then it's usually broken down into smaller chunks with clear interfaces and abstractions).


I use copilot daily and because it lacks context it's mostly useless except for generating boilerplate and sometimes converting small things from A to B. Oh, also copying functions from stackoverflow and naming them right.

That's about it. But I spend maybe 5% of my time per day on those.


I dislike Copilot's context management, personally, and much prefer populating the context of say Claude deliberately and manually (using Zed, see https://zed.dev/blog/zed-ai). This fits my workflow much much better.


Imagine you are coding in your IDE and it suggests a feature to you because someone mentioned it yesterday on the #app-eng channel. That needs deeper context, though: about the order of events, and about how authoritative a given character is.


I get value out of LLMs on stock Python or NextJS or whatever where that person was in fact a lossy channel from SO to my diff queue.

If there’s no computation then there’s no computer science. It may be the case that Excel with attitude was a bubble in hiring.

But Sonnet and 4o both suck at why CUDA isn’t detected on this SkyPilot resource.


> But Sonnet and 4o both suck at why CUDA isn’t detected on this SkyPilot resource.

I don't understand this sentence, should "both suck at why" be "both suck and why" or perhaps I'm just misunderstanding in general?


SkyPilot is an excellent piece of software attempting an impossible job: running your NVIDIA job on an actively adversarial compute fabric whose owners charge the nastiest monopoly markup since the Dutch East India Company (look it up: the only people to run famine margins anywhere near NVIDIA's are slave traders).

To come out of the cloud “credits” game with your shirt on, you need stone-cold pros.

The kind of people on the Cursor team. Not the adoring fans who actually use their shit.




