This looks interesting. I am really impressed with MultiOn [0], and I tried to m...

namukang · 2024-11-05T20:36:43 1730839003

I actually built a Chrome extension that runs Claude computer use if you’d like to try it out! [0] It’s currently awaiting approval in the Chrome Web Store.

After spending the last several years building a popular Chrome extension for browser automation [1], I was excited to see whether LLMs could actually build automations end-to-end from a high-level description. Unfortunately, they still get confused quite easily, so that holy grail has yet to arrive. Still fun to play around with, though!

[0] https://autobrowser.ai/

[1] https://news.ycombinator.com/item?id=29254147

gregpr07 · 2024-11-05T18:13:57 1730830437

Thanks! Have you tried captcha solving with [1]? It's very tricky sometimes, especially with non-standard "verify human" checks. Maybe you could solve it by writing Selenium/JavaScript code directly and then executing it.

Oras · 2024-11-05T18:39:13 1730831953

I haven’t, but I watched a video of someone doing it with this framework.

With captchas, the worst-case scenario is using a solving service as part of the agent flow. See the 2captcha service.

gregpr07 · 2024-11-05T19:19:33 1730834373

Will def try it.

aethelingas · 2024-11-05T18:18:41 1730830721

what are the challenges with the Chrome extension path?

Oras · 2024-11-05T18:42:28 1730832148

You need to call an API to screenshot the page, then figure out which JavaScript code to execute on it. It’s not as easy as it might sound.

Playwright and Selenium automate the browser itself, but with a Chrome extension you need to work within the context of the current browser.

I’m not an expert in browser automation, so I found it challenging to move from Playwright to something completely browser-based.

gregpr07 · 2024-11-05T19:21:30 1730834490

I don’t know a lot about this, but do you get the full power of Selenium or not? That would also be a very interesting approach, especially once “local” browser models get very good.

Oras · 2024-11-05T19:36:03 1730835363

After 3 days of playing around with it, I couldn’t find a way to use Selenium or Playwright in the browser.

What I did instead was set up a loop that sends instructions through Playwright.

For instance, I open the browser and then enter a loop that waits for instructions (they can come from an event source such as Redis) and executes each one in the same browser. Still, it all depends on the session instantiated by Playwright.
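
A rough sketch of that kind of loop, in case it helps. This is not my actual code: the message format (`action`, `url`, `selector`, `value`), the `stop` sentinel, and the function names are all made up for illustration, and an in-process `queue.Queue` stands in for whatever event source (Redis or otherwise) would really feed it. The only Playwright calls assumed are `page.goto`, `page.click`, and `page.fill` from the sync API.

```python
import queue

# Hypothetical sentinel message that tells the loop to exit.
STOP = {"action": "stop"}


def run_instruction_loop(page, instructions):
    """Run queued instructions against one long-lived page.

    `page` is expected to behave like a Playwright sync-API Page.
    Returns the number of instructions executed before the stop message.
    """
    handled = 0
    while True:
        msg = instructions.get()  # blocks until the next instruction arrives
        action = msg.get("action")
        if action == "stop":
            break
        if action == "goto":
            page.goto(msg["url"])
        elif action == "click":
            page.click(msg["selector"])
        elif action == "fill":
            page.fill(msg["selector"], msg["value"])
        else:
            raise ValueError(f"unknown action: {action!r}")
        handled += 1
    return handled


def serve_with_playwright(instructions):
    """Open one browser session and keep serving instructions in it."""
    # Imported lazily so the loop above can be tried without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        try:
            return run_instruction_loop(page, instructions)
        finally:
            browser.close()
```

To try it, put a few messages on a `queue.Queue()` (for example `{"action": "goto", "url": "https://example.com"}` followed by `STOP`) and pass the queue to `serve_with_playwright`. The session still belongs to Playwright, which is exactly the limitation mentioned above.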