> We're still super early, but already these agents are showing flashes of brill...

furyofantares · 2025-07-06T11:09:48 1751800188

This one isn't for coding, they mention in the post that coding agents thrive in custom tool-use environments.

lelanthran · 2025-07-06T11:50:56 1751802656

> This one isn't for coding, they mention in the post that coding agents thrive in custom tool-use environments.

Well, that is why I am skeptical and said

>> I'm still waiting for AI/LLMs to be posing a danger to jobs other than those in software development and the arts.

The goal of this product is admirable but, I feel, lacks some grounding: taking screenshots, then converting those images to text, then processing, then converting that to actions, then converting the actions to input events ... results in 4 separate points of failure. That many points of failure, each with a success rate (last I checked) of <90%, gives you something stupid like an eventual success rate of 0.9 * 0.9 * 0.9 * 0.9 ≈ 0.66.

The same iterative workflow for software development is pretty much 2 steps: process input, then produce output. Output succeeds 100% of the time (or close to it, as it's just rewriting the files according to the processing) and processing succeeds about 90% of the time, which is why it appears to work so well[1].
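To make the arithmetic in the two paragraphs above concrete, here is a minimal sketch. It assumes the stages fail independently and uses the rough per-stage rates quoted above (0.9 per stage, 1.0 for the "output" step); the function name `pipeline_success` is made up for illustration.

```python
from math import prod


def pipeline_success(stage_rates):
    """Chance that every stage succeeds, assuming the stages are independent."""
    return prod(stage_rates)


# Screenshot-driven agent: four stages at roughly 90% each (figure from the comment).
gui_agent = pipeline_success([0.9, 0.9, 0.9, 0.9])

# Coding agent: ~90% for processing, ~100% for writing the output.
coding_agent = pipeline_success([0.9, 1.0])

print(f"GUI agent:    {gui_agent:.2f}")     # 0.66
print(f"Coding agent: {coding_agent:.2f}")  # 0.90
```

Every extra stage multiplies in another factor below 1, so the gap widens quickly as the chain gets longer.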

I dabbled briefly in this and explored a few different ways of making LLMs use the ERP/business system effectively, and with all the current popular business systems, this is simply not possible with a high enough success rate because those systems have little "structured text" output, and even less "structured text" input. In fact, some of them have exactly zero "structured text" input.

To make the most of LLMs in your business system, you're going to need a new one that is primarily text-IO based (structured text, if necessary) and only secondarily GUI-for-humans based.

[1] In truth, using tools is a poor way to extend the reach and grasp of the LLM into the operator's context.

It works well for one mainstream use-case: software development, because then you need fewer than a dozen tools to automate an entire development iteration (read file, list files, insert into file, run test command, etc).
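As a rough illustration of how small that tool set can be, here is a sketch of the four tools named above behind a single dispatcher. All names (`TOOLS`, `dispatch`, the call format) are hypothetical and not taken from any particular agent framework.

```python
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    return Path(path).read_text()


def list_files(root: str) -> list[str]:
    # Paths relative to root, sorted so the output is stable.
    return sorted(str(p.relative_to(root)) for p in Path(root).rglob("*") if p.is_file())


def insert_into_file(path: str, line_no: int, text: str) -> None:
    """Insert `text` as a new line before 1-based line `line_no`."""
    lines = Path(path).read_text().splitlines(keepends=True)
    lines.insert(line_no - 1, text if text.endswith("\n") else text + "\n")
    Path(path).write_text("".join(lines))


def run_command(argv: list[str], cwd: str = ".") -> dict:
    # Used for "run test command"; returns structured text the model can read.
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return {"exit_code": result.returncode, "stdout": result.stdout, "stderr": result.stderr}


# The whole tool surface the model needs to know about.
TOOLS = {
    "read_file": read_file,
    "list_files": list_files,
    "insert_into_file": insert_into_file,
    "run_command": run_command,
}


def dispatch(call: dict):
    """Execute one tool call of the assumed form {"tool": name, "args": {...}}."""
    return TOOLS[call["tool"]](**call["args"])
```

Everything going in and coming out is plain text or a small dict, which is the property the comment argues coding workflows get for free.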

Try doing that with a mini-ERP type of system; there's just no way to keep a small set of 12 tools that can do any workflow that the operator can do. You'll quickly run into a situation where every prompt request includes tool descriptions for about 500 tools.

Agentic automation is working very well for coding, where all the input is structured text, all the output is structured text, and all the changes are structured text.

The only way for ERP, Accounting, etc to ever get to this level of agent-based automation is if the base product itself is completely 100% structured text IO based, with the human-operator interface built on top of that.
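A toy sketch of what "structured text IO first, human interface on top" could look like, assuming JSON as the structured text. The `Ledger` class, its operations, and `render_for_human` are invented for illustration and do not describe any real ERP or accounting product.

```python
import json


class Ledger:
    """Core system: every operation is JSON text in, JSON text out."""

    def __init__(self):
        self.invoices = {}

    def handle(self, request_text: str) -> str:
        req = json.loads(request_text)
        op = req.get("op")
        if op == "create_invoice":
            inv_id = f"INV-{len(self.invoices) + 1}"
            self.invoices[inv_id] = {
                "customer": req["customer"],
                "amount": req["amount"],
                "paid": False,
            }
            return json.dumps({"ok": True, "id": inv_id})
        if op == "mark_paid":
            inv = self.invoices.get(req["id"])
            if inv is None:
                return json.dumps({"ok": False, "error": "unknown invoice"})
            inv["paid"] = True
            return json.dumps({"ok": True, "id": req["id"]})
        if op == "list_invoices":
            return json.dumps({"ok": True, "invoices": self.invoices})
        return json.dumps({"ok": False, "error": f"unknown op: {op}"})


def render_for_human(ledger: Ledger) -> str:
    """Human view built purely on the text interface, never on internal state."""
    reply = json.loads(ledger.handle(json.dumps({"op": "list_invoices"})))
    rows = []
    for inv_id, inv in reply["invoices"].items():
        status = "PAID" if inv["paid"] else "OPEN"
        rows.append(f"{inv_id}  {inv['customer']:<10} {inv['amount']:>8.2f}  {status}")
    return "\n".join(rows)


ledger = Ledger()
created = json.loads(ledger.handle('{"op": "create_invoice", "customer": "Acme", "amount": 120.5}'))
ledger.handle(json.dumps({"op": "mark_paid", "id": created["id"]}))
print(render_for_human(ledger))
```

An agent and a human operator then go through the same `handle` entry point; the GUI is just one more client of it.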

atupem · 2025-07-06T14:45:16 1751813116

I respectfully disagree! There's a lot of opportunity behind keyboard + mouse + screen.

In a way Bytebot is a maximalist bet on the growth and improvement of multi-modal LLMs. I firmly believe that in a short period of time, the token cost will drop, while the capability increases (both dramatically). It's still uncertain, which makes it a great asymmetric bet.

We don't do any sort of grounding or image conversion, and we offer a handful of tools. I'll go into more detail in my next post.