
This looks interesting. I am really impressed with MultiOn [0], and I tried to make something similar, but it's quite challenging to do with a Chrome extension.

I also saw a project doing CAPTCHA solving with Selenium [1].

I will keep an eye on your development, good luck!

[0] https://www.multion.ai/

[1] https://github.com/VRSEN/agency-swarm




I actually built a Chrome extension that runs Claude computer use if you’d like to try it out! [0] It’s currently awaiting approval in the Chrome Web Store.

After having spent the last several years building a popular Chrome extension for browser automation [1], I was excited to see if LLMs could actually build automations end-to-end based on a high-level description. Unfortunately, they still get confused quite easily so the holy grail has yet to come. Still fun to play around with though!

[0] https://autobrowser.ai/

[1] https://news.ycombinator.com/item?id=29254147


Thanks! Have you tried captcha solving with [1]? It's very tricky sometimes, especially with non-standard "verify human" checks. Maybe you could solve it by writing Selenium/JavaScript code directly and then executing it.


I haven’t, but I watched a video of it being done with this framework.

With captchas, the worst-case scenario is using a service to solve them as part of the agent flow. See the 2captcha service.


Will def try it.


what are the challenges with the Chrome extension path?


You need to call an API to screenshot the page, then figure out the JavaScript code to execute against it. It’s not as easy as it might sound.

Playwright and Selenium automate the browser itself, but with a Chrome extension you need to work within the context of the current browser.

I’m not an expert in browser automation, so I found it challenging to move from Playwright to something completely browser-based.


I don’t know a lot about this, but do you have the full power of Selenium or not? That would also be a very interesting approach, especially when “local” browser models get very good.


After 3 days of playing around with it, I couldn’t find a way to use Selenium or Playwright in the browser.

What I did, though, is set up a loop that sends instructions through Playwright.

For instance, I open the browser and then enter a loop that waits for instructions (they can come from an event source such as Redis) to execute in the same browser. Still, it’s based on the session instantiated by Playwright.
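Roughly, the loop looks like the sketch below. This is a minimal illustration and not my actual code: an in-process `queue.Queue` stands in for Redis, and the instruction format (`action`, `url`, `selector`, `value`) and the names `run_instruction_loop` and `STOP` are made up for the example. The `page.goto`, `page.click`, and `page.fill` calls are from Playwright's Python sync API.

```python
import queue
from typing import Any, Dict, List

STOP = {"action": "stop"}  # hypothetical sentinel message that ends the loop


def run_instruction_loop(page: Any, instructions: "queue.Queue[Dict[str, str]]") -> List[str]:
    """Apply queued instructions to one already-open page until STOP arrives."""
    log: List[str] = []
    while True:
        msg = instructions.get()  # blocks until the next instruction arrives
        action = msg.get("action")
        if action == "stop":
            break
        if action == "goto":
            page.goto(msg["url"])
        elif action == "click":
            page.click(msg["selector"])
        elif action == "fill":
            page.fill(msg["selector"], msg["value"])
        else:
            # Unknown instructions are logged and skipped, not fatal.
            log.append(f"skipped unknown action: {action}")
            continue
        log.append(f"done: {action}")
    return log


def main() -> None:
    # Imported here so the loop above can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    q: "queue.Queue[Dict[str, str]]" = queue.Queue()
    q.put({"action": "goto", "url": "https://example.com"})
    q.put(STOP)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()  # the single session every instruction reuses
        print(run_instruction_loop(page, q))
        browser.close()
```

Calling `main()` opens one browser and replays whatever is on the queue. In a real setup another process would push instructions while the loop blocks on `get()`, which is why everything still depends on the session Playwright started.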



