
Awesome project, starred! Here are some other projects for agentic browser interactions:

* Cerebellum (TypeScript): https://github.com/theredsix/cerebellum

* Skyvern: https://github.com/Skyvern-AI/skyvern

Disclaimer: I am the author of Cerebellum




Thanks man, starred yours too, it's super cool to see all these projects getting spun up!

I see Cerebellum is vision-only. Did you try sending the HTML along with the screenshot? I think that improves performance like crazy, and you wouldn't be limited to Claude.

Just saw Skyvern today on previous Show HNs haha :)


I had an older version that used simplified HTML, and it got to decent performance with GPT-4o and Gemini, but at the cost of 10x the token usage. You are right: identifying the interactable elements and pulling their values into a prompt structure that explicitly lists the allowed next actions can boost performance, especially when combined with grammar-constrained decoding such as structured outputs or guidance-llm. However, I saw that Claude reached similar performance with pure vision, and I felt that vision + more training would beat a specialized DOM algorithm due to "the bitter lesson".
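For anyone curious what the DOM-based approach looks like, here is a minimal sketch of the idea, not Cerebellum's actual code. It assumes a hypothetical set of "interactable" tags and a made-up action vocabulary (`click`, `type`, `done`); the class and function names are illustrative only. It uses Python's standard `html.parser` to number each interactable element and build a prompt that lists the allowed next actions.

```python
from html.parser import HTMLParser

# Tags treated as interactable in this sketch (an assumption, not a standard list).
INTERACTABLE = {"a", "button", "input", "select", "textarea"}


class InteractableCollector(HTMLParser):
    """Collects interactable elements and the attributes a model might need."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # element whose inner text we are still waiting for

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTABLE:
            return
        attrs = dict(attrs)
        element = {
            "id": len(self.elements),
            "tag": tag,
            # Fall back through a few common labelling attributes.
            "label": attrs.get("aria-label") or attrs.get("placeholder") or attrs.get("name") or "",
            "value": attrs.get("value") or "",
        }
        self.elements.append(element)
        # <input> is a void element, so there is no inner text to wait for.
        self._open = None if tag == "input" else element

    def handle_data(self, data):
        text = data.strip()
        # Use the first piece of inner text as the label if none was found yet.
        if self._open is not None and text and not self._open["label"]:
            self._open["label"] = text

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def build_prompt(html: str, goal: str) -> str:
    """Builds a prompt listing numbered elements and the hypothetical action set."""
    collector = InteractableCollector()
    collector.feed(html)
    lines = [f"Goal: {goal}", "Interactable elements:"]
    for el in collector.elements:
        lines.append(f'[{el["id"]}] <{el["tag"]}> label="{el["label"]}" value="{el["value"]}"')
    lines.append("Allowed actions: click(id), type(id, text), done()")
    return "\n".join(lines)


page = '<form><input name="q" value=""><button>Search</button></form><a href="/about">About</a>'
prompt = build_prompt(page, "search for browser agents")
print(prompt)
```

The numbered IDs are what make constrained decoding useful here: the model only has to emit an action name and an ID that exists in the list. The trade-off is the one mentioned above, since a real page has far more elements than this toy snippet and the prompt grows accordingly.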

BTW I really like your handling of browser tabs, I think it's really clever.


Fair. Claude will probably only get better at this too, since Anthropic kinda wants people to use Computer Use. We are gonna try to get the best of both worlds.

Thanks man, Magnus came up with it this morning haha!


I starred both of you



