I thought of something similar recently, but with a different approach - rather than settrace, it would use a subclass of bdb.Bdb (the standard library base debugger, on top of which Pdb is built) so the LLM can run a real debugging session. It would place breakpoints (or start a postmortem session after an uncaught exception) to drop into a repl that allows going up/down the frame stack at a given execution point, listing local state for frames, running code in the repl to test hypotheses or understand the cause of an exception, looking at the methods available on the objects in scope, etc. This is similar to what you'd get by running the `%debug` magic in IPython after an uncaught exception in a cell (try it out).
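Roughly, the skeleton I have in mind looks like this - just a sketch, with a made-up buggy function and plain printing standing in for the LLM-driven repl loop:

```python
import bdb

def buggy():
    total = sum([1, 2, 3])
    return total / (total - 6)   # ZeroDivisionError on purpose

class LLMDebugger(bdb.Bdb):
    """Sketch of a bdb.Bdb subclass: stop at breakpoints/exceptions and dump
    frame state, where the real thing would run an LLM-driven repl loop."""

    def user_line(self, frame):
        # Called whenever execution pauses on a line; only act on actual breakpoints.
        if self.break_here(frame):
            self.report(frame)

    def user_exception(self, frame, exc_info):
        # Called when an exception is raised under the debugger
        # (may fire once per frame the exception unwinds through).
        self.report(frame, exc_info)

    def report(self, frame, exc_info=None):
        # Walk the frame stack and list locals for each frame -- the raw
        # material the LLM repl would query and reason about interactively.
        stack, _ = self.get_stack(frame, None)
        for f, lineno in stack:
            code = f.f_code
            print(f"{code.co_filename}:{lineno} in {code.co_name}")
            print("   locals:", {k: repr(v) for k, v in f.f_locals.items()})
        if exc_info:
            print("   exception:", exc_info[1])

dbg = LLMDebugger()
dbg.set_break(__file__, buggy.__code__.co_firstlineno + 2)   # break on the division
try:
    dbg.run("buggy()")
except ZeroDivisionError:
    pass   # already reported by user_exception above
```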

The quick LLM input/repl output loop is more suitable for local models though, where you can control the hidden state cache, get lower latency, and enforce a grammar so the model doesn't go off the rails and only emits commands the debugger interface actually implements - which afaik you can't do with services like OpenAI's. This is something I'd like to see more of - having low-level control of a model gives you qualitatively different ways of using it, which I haven't seen people explore that much.
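For example, with llama-cpp-python (if I remember its grammar support right - the command set here is made up), you can constrain generation to a tiny debugger command language:

```python
from llama_cpp import Llama, LlamaGrammar

# A made-up GBNF grammar for the debugger repl: the model can only move in
# the stack, dump locals, evaluate an expression, or quit.
DEBUGGER_GRAMMAR = r"""
root    ::= command "\n"
command ::= "up" | "down" | "locals" | "eval " expr | "quit"
expr    ::= [^\n]+
"""

llm = Llama(model_path="model.gguf")                 # hypothetical local model
grammar = LlamaGrammar.from_string(DEBUGGER_GRAMMAR)

prompt = "Frame 2/5, locals: {'total': 6}. Next debugger command:"
out = llm(prompt, grammar=grammar, max_tokens=16)
print(out["choices"][0]["text"])                     # always parses as a command
```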



So interestingly enough, we first tried letting GPT interact with pdb through a set of directed prompts, but we found that it kept hallucinating commands, not responding with the correct syntax, and really struggling with line numbers. That's why we pivoted to gathering all the relevant data GPT could need upfront and letting it synthesize that data into a single root cause.
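Roughly - this is a simplified sketch, not our exact code - the idea is to walk the traceback and flatten per-frame source context and locals into one prompt:

```python
import inspect
import traceback

def gather_context(exc: BaseException) -> str:
    """Collect the traceback plus, per frame, source context and locals,
    and flatten it all into one prompt for the model to synthesize a root cause."""
    parts = ["".join(traceback.format_exception(type(exc), exc, exc.__traceback__))]
    for fi in inspect.getinnerframes(exc.__traceback__, context=5):
        parts.append(f"--- {fi.filename}:{fi.lineno} in {fi.function} ---")
        parts.append("".join(fi.code_context or []))
        parts.append("locals: " + repr({k: repr(v)[:200] for k, v in fi.frame.f_locals.items()}))
    parts.append("Given the traceback, source and locals above, state the single most likely root cause.")
    return "\n".join(parts)

try:
    {}["missing"]                      # stand-in for the user's failing code
except KeyError as e:
    prompt = gather_context(e)
    print(prompt)                      # this is what gets sent to the model
```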

I think we're going to explore the local model approach though - you raise some really great points about having more granular control over the state of the model.


Interesting! Did you try the function calling API? I feel you on the line number troubles, it's hard to get anything consistent there. Using diffs with GPT-4 isn't much better in my experience; I didn't test it extensively, but from what I did it rarely produced syntactically valid diffs that could just be sent to `patch`. One approach I started playing with was using tree-sitter to add markers to the code and letting the LLM specify marker ranges for deletion/insertion/replacement, but alas, I got distracted before fully going through with it.
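For the record, the gist of the marker idea, sketched with the stdlib `ast` module instead of tree-sitter (tree-sitter just gives you the same node spans for more languages):

```python
import ast

source = open("target.py").read()                      # hypothetical file
tree = ast.parse(source)
lines = source.splitlines()

# Tag every top-level statement with a stable marker; the LLM refers to
# marker IDs instead of raw line numbers, which it tends to get wrong.
markers = {f"M{i}": (node.lineno, node.end_lineno) for i, node in enumerate(tree.body)}

def apply_edit(start_id: str, end_id: str, new_code: str) -> str:
    """Replace the source span covered by markers start_id..end_id with new_code."""
    start = markers[start_id][0]
    end = markers[end_id][1]
    return "\n".join(lines[:start - 1] + new_code.splitlines() + lines[end:])

# The model would answer something like:
# {"op": "replace", "from": "M2", "to": "M2", "code": "def fixed():\n    ..."}
```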

In any case, I'll keep an eye on the project, good luck! Let me know if you ever need an extra set of hands, I find this stuff pretty interesting to think about :)


I actually coded something very close to this and it worked surprisingly well: https://github.com/janpf/debuggAIr


Ooh, interesting - starred and going to dig into this later today!


I've done a manual version of this with ChatGPT.

I had ipdb open and told it to request any variables it wanted me to look at, suggest what to do next, and say what it expected to see. It was quite good, but it took a lot of persuading - just having an LLM that was more tuned to this would be better.



