This looks interesting. I am really impressed with MultiOn [0], and I tried to make something similar, but it's quite challenging doing it with a Chrome extension.
I also saw one doing Captcha solving with Selenium [1].
I will keep an eye on your development, good luck!
I actually built a Chrome extension that runs Claude computer use if you’d like to try it out! [0] It’s currently awaiting approval in the Chrome Web Store.
After having spent the last several years building a popular Chrome extension for browser automation [1], I was excited to see if LLMs could actually build automations end-to-end based on a high-level description. Unfortunately, they still get confused quite easily, so the holy grail has yet to arrive. Still fun to play around with though!
Thanks! Have you tried captcha solving with [1]? It's very tricky sometimes, especially with non-standard "verify human" checks. Maybe you could solve it by writing Selenium/JavaScript code directly and then executing it.
I don’t know a lot about this, but do you have the full power of Selenium or not? That would also be a very interesting approach, especially once “local” browser models get very good.
After 3 days of playing around with it, I couldn’t find a way to use Selenium or Playwright in the browser.
What I did, though, was set up a loop that sends instructions from Playwright.
For instance, I open the browser and then enter a loop that waits for instructions (these can come from an event source such as Redis) and executes them in the same browser. Still, it’s based on the session instantiated by Playwright.
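Roughly, the loop looks like the sketch below. This is not my exact code: the instruction format (dicts with an "action" key), the function name run_instruction_loop, and the None stop sentinel are all made up for illustration, and a standard-library queue stands in for Redis. The only Playwright calls assumed are page.goto, page.click and page.fill from the sync API.

```python
import queue
from typing import Any, List


def run_instruction_loop(page: Any, instructions: queue.Queue) -> List[str]:
    """Run instructions against one long-lived page until a None sentinel arrives.

    `page` is expected to behave like a Playwright sync Page (goto/click/fill).
    The instruction dict format is a hypothetical convention for this sketch.
    """
    log = []
    while True:
        instruction = instructions.get()  # blocks until something is queued
        if instruction is None:  # hypothetical "stop" signal
            break
        action = instruction["action"]
        if action == "goto":
            page.goto(instruction["url"])
        elif action == "click":
            page.click(instruction["selector"])
        elif action == "fill":
            page.fill(instruction["selector"], instruction["value"])
        else:
            # Unknown actions are recorded and skipped, not fatal.
            log.append(f"unknown action: {action}")
            continue
        log.append(f"ok: {action}")
    return log


def main() -> None:
    # Imported here so the loop itself can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    instructions: queue.Queue = queue.Queue()
    instructions.put({"action": "goto", "url": "https://example.com"})
    instructions.put(None)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()  # the same page is reused for every instruction
        print(run_instruction_loop(page, instructions))
        browser.close()
```

Calling main() opens a real browser; in practice another process would push instructions onto the queue (or a Redis list) while the loop keeps the session alive. The limitation is the same as above: everything runs in the session Playwright started, not in a browser you already had open.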
[0] https://www.multion.ai/
[1] https://github.com/VRSEN/agency-swarm