This actually gives me a funny idea, IDK if it would actually work.
Something interesting you can do with LLMs is constrain their output. What if you just write a test case that encapsulates the bug, then filter for the most "sane" LLM output, i.e. the samples where the code compiles and the test case passes?
It doesn't seem like much, but it's like using more computing power to filter out all the "obviously wrong" solutions. Doing this in practice might require you to make too many test cases though.
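The filtering idea above can be sketched in a few lines. This is a toy illustration, not a real harness: `candidates` stands in for samples you'd actually draw from a model, and the bug (an `abs_val` function that mishandles negatives) and its test are made up for the example.

```python
def passes_test(source: str) -> bool:
    """Keep a candidate only if it compiles and the bug-encapsulating test passes."""
    try:
        code = compile(source, "<candidate>", "exec")  # reject anything that doesn't parse
    except SyntaxError:
        return False
    ns = {}
    try:
        exec(code, ns)
        # The test case that encapsulates the (hypothetical) bug:
        return ns["abs_val"](-3) == 3 and ns["abs_val"](4) == 4
    except Exception:
        return False  # crashes count as "obviously wrong"

# Stand-ins for LLM samples; in practice these would come from the model.
candidates = [
    "def abs_val(x): return x",                    # wrong: ignores negatives
    "def abs_val(x) return -x",                    # doesn't even compile
    "def abs_val(x): return x if x >= 0 else -x",  # sane
]

survivors = [c for c in candidates if passes_test(c)]
print(len(survivors))  # only the sane candidate gets through
```

In a real setup you'd run candidates in a sandboxed subprocess rather than `exec`, since untrusted generated code shouldn't share your interpreter.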
If you manage more breakthroughs and get it to senior level, then maybe you can leave it unattended for a few months, but not if the product team is inexperienced or strong-willed: the code will still end up as unmaintainable spaghetti.
So reaching staff level or higher needs even more breakthroughs, because it requires a lot more agency and initiative on the part of the bot: pushing past what your teammates think they want and doing what actually needs to be done, modifying not only the code and the requirements but also the processes that lead to miscommunication and inefficiency within the org.
If you manage to invent a bot that knows better than the people prompting it and acts on that, then you've potentially also violated the First Law of Robotics and created a death bot.
And boy oh boy would it be astronomically expensive to run.