I think integration into existing IDEs is the wrong form for agentic coding. The...

skydhash · 2025-06-23T13:15:15 1750684515

When I see proposals for this kind of workflow, the one question I have is how you will manage your personal context. When I’m reviewing code by a coworker, I’m not seeking to fully understand the code or checking that it’s correct. I’m mostly trying to get a high-level understanding and checking for glaring mistakes (code style, best practices, …). I can get through a lot of PRs in a day that way.

For more important stuff, like if it falls under my supervision, I will test the branch and carefully check the implementation. And I do this for each PR update. That takes a lot longer.

So I’m wondering, how do you context switch between many agents running and proposing diffs? Especially if you need to vet the changes. And how do you manage module dependencies where an update by one task can subtly influence the implementation by another?

LeafItAlone · 2025-06-23T13:29:26 1750685366

>So I’m wondering, how do you context switch between many agents running and proposing diffs? Especially if you need to vet the changes.

I’m wondering this too. But from what I have seen, I think most people doing this are not really reading and vetting the output. Just faster, parallelized, vibe coding.

Not saying that’s what parent is doing, but it’s common.

stingraycharles · 2025-06-23T14:46:15 1750689975

Yeah. I would like multiple agents because each can be primed with a different system prompt and “clean” context. This has been proven to work, eg with Aider’s “architect” vs “editor” models / agents working together.

As for people doing parallel work who want stuff to “happen faster”, I am convinced most of them don’t really read (nor probably understand) the code it produces.

scuol · 2025-06-23T15:07:32 1750691252

It's basically like having N of the most prolific LoC-producing colleagues who don't have a great mental model of how the language works, so you have to carefully parse all of their PRs.

Honestly, I've seen too many fairly glaring mistakes in all models I've tried that signal that they can't even get the easy stuff right consistently. In the language I use most (C++), if they can't do that, how can I trust them to get all the very subtle things right? (e.g. very often they produce code that holds some form of dangling references, and when I say "hey don't do that", they go back to something very inefficient like copying things all over the place).

I am very grateful they can churn out a comprehensive test suite in gtest though and write other scripts to test / do a release and such. The relief in tedium there is welcome for sure!

jbentley1 · 2025-06-23T13:34:22 1750685662

I tried to make it easy to remember what you are doing. You can see the prompts you ran, and I used the Monaco editor from VSCode to view and edit the diffs.

I think there are opportunities to give special handling to the markdown docs and diagrams Claude likes to make along the way to help review.

EGreg · 2025-06-23T14:00:30 1750687230

Why don’t you automate this checking with AI? You can then cover hundreds of PRs a day.

Voloskaya · 2025-06-23T14:26:00 1750688760

> You can then cover hundreds of PRs a day.

I would argue you haven't covered any.

Why not just skip the reviews then? If you can trust the models to have the necessary intelligence and context to properly review, they should be able to properly code in the first place. Obviously not where models are at today.

EGreg · 2025-06-23T15:18:06 1750691886

Not necessarily. It's like the Generative Adversarial Network (GAN). You don't just trust the generator, but it's a back-and-forth between the Generator and Discriminator.

Voloskaya · 2025-06-23T17:36:39 1750700199

The discriminator is trained on a different objective than the generator: it's specifically trained to be good at discriminating, so it is complementary.

Here we are talking about the same model doing the review (even if you use a different model provider, it's still trained on essentially the same data, with the same objective and very similar performance).

We have had agentic systems where one agent checks the work of another for 2+ years. This isn't a paradigm pushed by AI coding model providers because it doesn't really work that well; review is still needed.

derwiki · 2025-06-23T14:05:53 1750687553

Turtles all the way down. We seem to be marching towards a future like that, but are we there today? Some of the AI-generated PRs I’ve seen teammates put out “work” (because sometimes two wrongs make a right) but convince me we still need a human in the loop.

But that was two weeks ago; maybe it’s different today

jbentley1 · 2025-06-23T15:37:17 1750693037

The other replies are correct that right now you need some level of human review, but it would be interesting to have a second AI review with a clean context. Maybe a security checklist, or a prompt telling it to check that the tests are covering the functionality appropriately.

Etheryte · 2025-06-23T12:41:55 1750682515

There's no reason you couldn't do the same thing as an IDE plugin.

jbentley1 · 2025-06-23T13:32:39 1750685559

Yes there is. IDEs just aren't designed for it. The main screen in an IDE shows a single branch at a time; I want to be managing a swarm of agents on multiple branches/worktrees.

Etheryte · 2025-06-24T10:27:04 1750760824

You don't need IDE support for this, it's all Git under the hood. Your extension can hold virtual branches in memory in the background, feed the file contents to the LLM through that layer and back, and the only problem you need to deal with after the fact is how to resolve conflicts, but the LLM would also be a good candidate to handle that. The more I think about it, the more Git makes this a straightforward implementation compared to say SVN, since branches cost nearly nothing. All of this is not to say that it's a trivial piece of work, but it is very much doable without building a new IDE from scratch.

radicalbyte · 2025-06-23T13:39:43 1750685983

That needs isolation, which in practice means multiple machines.

derwiki · 2025-06-23T14:07:27 1750687647

Why machines? Multiple clones of the same repo is one low tech way to achieve that.

brulard · 2025-06-23T15:58:15 1750694295

If we're talking about, for example, a full-stack JS/TS app, wouldn't you need a separate build/dev server running, a database, and likely more?

naasking · 2025-06-23T14:13:06 1750687986

I don't see why you necessarily need multiple machines, just multiple checkouts, one for each agent. Depends on what shared resources are involved, eg. databases, etc.
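A minimal sketch of the "one checkout per agent" idea using `git worktree add`. The `agent/<task>` branch naming and the sibling-directory layout are assumptions made for illustration, not how any particular tool does it.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, task: str) -> list[str]:
    """Build the `git worktree add` call that gives one agent its own checkout."""
    repo = repo.resolve()                            # absolute, so `-C` cannot shift the target path
    branch = f"agent/{task}"                         # hypothetical branch naming scheme
    checkout = repo.parent / f"{repo.name}-{task}"   # hypothetical layout: sibling directory per agent
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout)]


def create_worktrees(repo: Path, tasks: list[str]) -> None:
    """Create one worktree per task; raises CalledProcessError if git fails."""
    for task in tasks:
        subprocess.run(worktree_add_command(repo, task), check=True)
```

Calling `create_worktrees(Path("myapp"), ["fix-login", "add-search"])` would leave `myapp-fix-login` and `myapp-add-search` next to the main checkout, each on its own branch. This only isolates files; shared resources such as databases or dev-server ports still need separate handling.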

int_19h · 2025-06-24T06:10:32 1750745432

Why not multiple IDE windows then?

mindwok · 2025-06-24T01:46:57 1750729617

I personally disagree. I use Cursor every day on commercial projects, and while I find background agents cool and useful in some contexts they are more often than not simply a distraction.

My preferred way to vibe code is to lock in on a single goal and iterate towards it. When I'm waiting for stuff to finish, I'm exploring docs or info to figure out how to get closer. Reviewing the existing codebase or changes is also super useful for me to grasp where I'm up to and what to do next. This idea of managing swarms of agents for different tasks does not gel with me, too much context switching and multitasking.

SkyPuncher · 2025-06-23T13:41:20 1750686080

Your tool is cool, but it solves a different issue.

Right now, background agents have two major problems:

1. There is some friction to getting the isolated environment working correctly. Difficulty depends on the specifics of each project, ranging from "select this universal container" to "it's going to be hell getting all of your dependencies working". Working in your IDE pretty much solves that - it's likely a place where everything is already set up.

2. People need to learn how agents build code. Watching an agent work in your IDE while being able to interject/correct them is extremely helpful to long term success with background agents.

Jonovono · 2025-06-23T14:45:28 1750689928

Looks cool! What was your reason for not using the Claude Code TS SDK? Looks like you install the package, but are manually spawning claude commands instead?

Side note: You should look into electron-trpc. It greatly simplifies IPC handling.

brulard · 2025-06-23T12:54:07 1750683247

This is nice, I was thinking about needing multiple working trees for different sessions of claude code.

Regarding your webpage - I wish you would vibe away the annoying header coming down every time I scroll just a tiny little bit up.

jbentley1 · 2025-06-23T13:05:28 1750683928

Noted! Thanks

OtherShrezzing · 2025-06-23T13:20:08 1750684808

For Anthropic, they’ve got to put their product where their customers are. If they’re all in a CLI or IDE, then the correct place to put agentic coding features is into the CLI or IDE.

data-ottawa · 2025-06-23T14:09:48 1750687788

I was just reading the Claude Code docs recommending that approach this morning.

Having a nice way to manage the work trees sounds great, but the rate limiting still sounds like an issue with this approach.

https://docs.anthropic.com/en/docs/claude-code/common-workfl...

mikojan · 2025-06-23T14:21:32 1750688492

Rate limiting has not been a problem for me. I need time to review the proposals and the actual source code, and to meddle with it in between.

One must also always be aware that an LLM WILL ALWAYS DO what you ask it for. Often you ask for the wrong thing. And you need to rethink.

Maybe I am inefficient, though: I really only use at most two additional work trees at the same time.

brulard · 2025-06-23T16:02:16 1750694536

> ... LLM WILL ALWAYS DO what you ask it for.

What? That's not my experience at all. Especially not "always"

mikojan · 2025-06-24T07:58:56 1750751936

Yes, yes they do. If you ask it to refactor something and integrate it somewhere else, it will do exactly that, even if in the course of it you would find that doing so dramatically increases complexity rather than reducing it.

I cannot count how many times that or something like that has happened to me.

brulard · 2025-06-24T19:37:45 1750793865

Most of the time, maybe. Absolutely not always. I'll tell it to "implement feature a, ignore typescript errors". And it happened multiple times for me that it did the exact opposite, fix TS errors, and feature is barely mentioned in the response. Or more recently (with deep research) "Give me list of {some_product_name}, make absolutely sure to make a CSV and output a CSV. Columns are a,b,c,..". Does it give me the data? No, I get a wall of text with absolutely no data. Ok, you may argue this is some agent, etc. but user may not see a difference.

Don't get me wrong, I'm a big fan and constant user of all these things, but I would say they frequently have problems following prompts.

Paradigma11 · 2025-06-24T08:14:54 1750752894

Or reduce complexity: https://xkcd.com/221/

jbentley1 · 2025-06-23T15:39:20 1750693160

If I hit the rate limit in 2 hours and got value out of each prompt I ran, that's better than doing the same amount of work in 6 hours and not hitting the limit.

Personally, I'm running 2 accounts and switching between them for maximum productivity. Just as a function of what my time is worth, it is a no-brainer.

throwaway314155 · 2025-06-23T15:55:00 1750694100

> The best way to work is managing several Git worktrees with agents running so you aren't stuck waiting 20+ minutes for Claude Code to finish.

Sounds like you're limiting yourself to users who are comfortable paying for a $100-200 monthly subscription or even thousands per month at API prices.

C.C. is expensive, but I was hoping we weren't going to build tooling that exacerbated this issue simply because for some of us money is less of an issue than for most of us.

jbentley1 · 2025-06-23T16:42:36 1750696956

If you are paying a senior engineer 200k, getting them a CC max plan is equivalent to 1.2% of their salary. I would say that it increases productivity by a lot more than that.
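The 1.2% figure checks out if you assume the $200/month plan mentioned elsewhere in the thread (that price is an assumption here, not stated in this comment):

```python
salary_per_year = 200_000  # senior engineer salary from the comment, USD
plan_per_month = 200       # assumed plan price, USD per month

plan_per_year = plan_per_month * 12
share_of_salary = plan_per_year / salary_per_year

print(f"{plan_per_year} USD/year = {share_of_salary:.1%} of salary")
# prints "2400 USD/year = 1.2% of salary"
```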

So yes it might feel expensive in terms of a personal monthly budget, but the value for money is insane.

throwaway314155 · 2025-07-01T19:43:06 1751398986

I guess I'd just ask that you re-read my comment. I get it. Drop in the bucket for companies. Not the same situation for students and a plethora of other common financial situations. I agree with your take, just with the caveat that more affordable still makes the situation better (for companies too).

jbentley1 · 2025-06-23T16:49:24 1750697364

While I seem to have a little attention from this comment, if anyone can test this Linux installer for Crystal and tell me if it works on their machine I would appreciate it:

https://github.com/stravu/crystal/actions/runs/15791009893/a...

ninthaccountshn · 2025-06-23T20:01:35 1750708895

Basics are working on arch with the AppImage, anything specific?

jbentley1 · 2025-06-23T21:54:43 1750715683

If you can call Claude Code that means everything else should be working, as most functionality is built around the terminal and that is how it is calling Claude Code.

Thanks for your help, now I'll be able to include Linux support in my next release

smrtinsert · 2025-06-23T15:57:55 1750694275

What tasks require parallel workflows like this? Running one claude prompt gives me more than enough to chew on for several hours if done correctly.

4b11b4 · 2025-06-23T14:34:09 1750689249

Seems like Amp would plug into this better? At least regarding the ability to share prompts, etc.

artursapek · 2025-06-23T14:19:01 1750688341

When I try to run two CCs at once I quickly get 429 rate limited, even on the $200 plan

andy_ppp · 2025-06-23T14:22:44 1750688564

Maybe the UI should allow you to still ask questions but in a queue to prevent this. It could have informative text like “waiting on 3 previous questions” and a progress bar of some kind.
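A rough sketch of that queue idea, assuming a hypothetical `PromptQueue` wrapper that caps how many prompts run at once and exposes a status string for the UI. The names are made up for illustration, and it knows nothing about the provider's real rate limits.

```python
import asyncio
from typing import Awaitable, Callable


class PromptQueue:
    """Run at most `max_running` prompts at once; the rest wait their turn."""

    def __init__(self, max_running: int = 1) -> None:
        self._slots = asyncio.Semaphore(max_running)
        self.waiting = 0  # prompts submitted but not yet started

    def status(self) -> str:
        """Text the UI could show next to a queued prompt."""
        return f"waiting on {self.waiting} previous questions"

    async def submit(self, run_prompt: Callable[[], Awaitable[str]]) -> str:
        """Queue a prompt; it starts as soon as a slot is free."""
        self.waiting += 1
        async with self._slots:
            self.waiting -= 1
            return await run_prompt()


async def demo() -> list[str]:
    queue = PromptQueue(max_running=2)  # created inside the running event loop

    async def fake_prompt() -> str:     # stand-in for a real agent call
        await asyncio.sleep(0.01)
        return "done"

    return await asyncio.gather(*(queue.submit(fake_prompt) for _ in range(5)))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Capping concurrency locally does not guarantee you stay under the limit, so a real version would probably also back off and retry when it sees a 429.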

jbentley1 · 2025-06-23T15:39:49 1750693189

Weird, I have not had this issue and I commonly run 5+ at once

lbeurerkellner · 2025-06-23T15:09:17 1750691357

This looks really cool, thanks for sharing.