I'm wondering whether they're using vanilla Claude or a version of Claude fine-tuned specifically for browser use.
RL fine-tuning LLMs can have pretty amazing results. We did GRPO training of Qwen3:4B to act as a small action model at BrowserOS (https://www.browseros.com/), and it was much better than running vanilla Claude or GPT.
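For anyone curious what that looks like in practice, here's a rough sketch using TRL's GRPOTrainer. The dataset and reward function are made-up placeholders to show the shape of the setup, not our actual BrowserOS pipeline (a real reward would execute the action in the browser and score the outcome).

```python
# Minimal GRPO sketch with Hugging Face TRL. The toy dataset and
# string-match reward are illustrative assumptions, not the real setup.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical toy data: a browser task prompt plus the expected action,
# which the reward function checks the model's completion against.
train_dataset = Dataset.from_list([
    {"prompt": "Task: open the settings page. Respond with one action.",
     "expected": 'click(selector="#settings")'},
])

def action_reward(completions, expected, **kwargs):
    # Reward 1.0 when the generated action contains the expected one,
    # 0.0 otherwise -- a stand-in for a real browser-execution reward.
    return [1.0 if exp in comp else 0.0
            for comp, exp in zip(completions, expected)]

training_args = GRPOConfig(
    output_dir="qwen3-4b-action-grpo",
    num_generations=8,          # completions sampled per prompt for the group baseline
    max_completion_length=64,   # action strings are short
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",
    reward_funcs=action_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

The point of GRPO here is that you don't need a learned value model: it samples a group of completions per prompt and uses the group's reward mean as the baseline, which works well when you have a cheap, programmatic reward like "did the action succeed".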