Since agents are good only at greenfield projects, the logical conclusion is that existing codebases have to be prepared such that new features are (opinionated) greenfield projects - let all the wiring dangle out of the wall so the intern just has to plug in the appliance. All the rest has to be done by humans, or the intern will rip open the wall to hang a picture.
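To make the metaphor concrete, here's a minimal sketch in TypeScript of what "letting the wiring dangle out of the wall" could look like: the host keeps all the plumbing behind one narrow interface, so a new feature really is a greenfield module. Every name here (FeatureModule, FeatureContext, wordCount) is hypothetical, not taken from any project in the thread.

    // The "wall socket": the host owns routing, storage, and logging,
    // and hands the feature everything it is allowed to touch.
    interface FeatureRequest { path: string; body: unknown }
    interface FeatureResponse { status: number; body: unknown }
    interface FeatureContext {
      log(msg: string): void;
      store: {
        get(key: string): Promise<unknown>;
        put(key: string, value: unknown): Promise<void>;
      };
    }
    interface FeatureModule {
      readonly mountPath: string;
      handle(req: FeatureRequest, ctx: FeatureContext): Promise<FeatureResponse>;
    }

    // The "appliance": the only part the intern (or agent) writes.
    const wordCount: FeatureModule = {
      mountPath: "/word-count",
      async handle(req, ctx) {
        const { text = "" } = (req.body ?? {}) as { text?: string };
        ctx.log(`word-count called on ${req.path}`);
        return {
          status: 200,
          body: { words: text.split(/\s+/).filter(Boolean).length },
        };
      },
    };

The point of the narrow interface is that the module can't "rip open the wall": it never sees the router, the database driver, or the file system, only the socket the host chose to expose.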

Hogwash. If you can't figure out how to do something with project Y from npm, try checking it out from GitHub with WebStorm and asking Junie how to do it -- often you get a good answer right away. If not, you can ask follow-up questions that help you understand the code base. Don't understand some data structure that's a maze of Map<String, Object> instances? It will scan how it's used and give you draft documentation.

Sure, you can't point it at a Jira ticket and get a PR, but you certainly can use it as a pair programmer. I wouldn't say it's much faster than working alone, but I end up writing more tests, and arguing with it over error handling means I do a better job in the end.


> Sure, you can't point it at a Jira ticket and get a PR

You absolutely can. This is exactly what SWE-Bench[0] measures, and I've been amazed at how quickly AIs have been climbing those ladders. I've personally been using Warp[1] a lot recently, and in quite a lot of low-to-medium-difficulty cases it can one-shot a decent PR. For most of my work I still find that I need to pair with it to get sufficiently good results (which is why I still prefer it to something cloud-based like Codex[2], though that's quite good too), and I expect the situation to flip over the coming couple of years.

[0] https://www.swebench.com/

[1] https://www.warp.dev/

[2] https://openai.com/index/introducing-codex/


How does Warp compare to others you have tried?

I haven't used it long enough for this to be a strong opinion, but so far I'd say it is indeed a bit better than Claude Code, as per the results on Terminal Bench[0]. On a side note, I quite like that I can type shell commands and chat messages interchangeably into the same input and it just knows whether to run them or respond to them; accidentally forgetting the leading exclamation mark has been a recurring mistake for me in Claude Code.

[0] https://www.tbench.ai/


What you describe is not using agents at all, which is what my comment was aimed at; read the first sentence again.

Junie is marketed as an “agent”, and it definitely works harder than the JetBrains AI Assistant.

They’re not. They’re good at many things and bad at many things. The more I use them the more I’m confused about which is which.

They are called slot machines for a reason.

I think agents have a curve: they're kinda bad at bootstrapping a project, very good in a small-to-medium-sized existing project, and then it goes slowly downhill from there as size increases.

Something about a brand-new project often makes LLMs drop to "example-grade" code, the kind you'd never put in production. (An example: Claude implemented per-task file logging in my prototype project by pushing to an array of log lines, serializing the whole array to JSON, and rewriting the entire file for every logged event.)
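For what it's worth, here's that anti-pattern reconstructed next to the boring fix, in Node/TypeScript (a guess at the shape of the code; the original wasn't shown):

    import { writeFileSync, appendFileSync } from "node:fs";

    // "Example grade": every event re-serializes the whole history and
    // rewrites the entire file, so logging is O(n) per event.
    const lines: string[] = [];
    function logNaive(file: string, msg: string): void {
      lines.push(msg);
      writeFileSync(file, JSON.stringify(lines, null, 2));
    }

    // Production grade: append one newline-delimited JSON record per
    // event -- O(1) per call, and nothing already on disk gets rewritten.
    function logAppend(file: string, msg: string): void {
      appendFileSync(file, JSON.stringify({ ts: Date.now(), msg }) + "\n");
    }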
