Hey HN,
We made Browser-Use, an open-source tool that lets any LangChain-supported LLM execute tasks directly in the browser using only function calling.
It lets you build agents that interact with web elements through natural language prompts. We created a layer that simplifies website interaction for LLMs by extracting XPaths and interactive elements like buttons and input fields (plus a few other fancy things). This lets you design custom web automation and scraping functions without inspecting pages manually in DevTools.
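To make the extraction idea concrete, here is a minimal sketch using only Python's standard library. It is not Browser-Use's actual implementation: the class `InteractiveElementExtractor`, the helper `to_prompt`, and the set of tags treated as "interactive" are hypothetical choices for illustration. It walks static HTML (no JavaScript, no rendering), records a positional XPath for each link, button, and form field, and renders a numbered list an LLM could refer to by index.

```python
from html.parser import HTMLParser

# Hypothetical choice of which tags count as "interactive" in this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Void elements never get a closing tag, so they are not pushed onto the path.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InteractiveElementExtractor(HTMLParser):
    """Collects interactive elements from static HTML with a positional XPath."""

    def __init__(self) -> None:
        super().__init__()
        self.path: list[str] = []                          # "tag[n]" segments of open elements
        self.child_counts: list[dict[str, int]] = [{}]     # per open element: tag -> children seen
        self.open_interactive: list[tuple[int, int]] = []  # (path depth, element index)
        self.elements: list[dict] = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        segment = f"{tag}[{counts[tag]}]"
        if tag in INTERACTIVE_TAGS:
            self.elements.append({
                "index": len(self.elements),
                "tag": tag,
                "xpath": "/" + "/".join(self.path + [segment]),
                "attrs": dict(attrs),
                "text": "",
            })
        if tag not in VOID_TAGS:
            self.path.append(segment)
            self.child_counts.append({})
            if tag in INTERACTIVE_TAGS:
                self.open_interactive.append((len(self.path), len(self.elements) - 1))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag (tolerates unclosed children).
        for i in range(len(self.path) - 1, -1, -1):
            if self.path[i].startswith(tag + "["):
                del self.path[i:]
                del self.child_counts[i + 1:]
                self.open_interactive = [(d, e) for d, e in self.open_interactive if d <= i]
                break

    def handle_data(self, data):
        # Attach visible text (e.g. a button label) to the innermost open interactive element.
        text = data.strip()
        if text and self.open_interactive:
            el = self.elements[self.open_interactive[-1][1]]
            el["text"] = f'{el["text"]} {text}'.strip()


def to_prompt(elements):
    """Render elements as a compact numbered list an LLM can refer to by index."""
    lines = []
    for el in elements:
        label = el["text"] or el["attrs"].get("placeholder") or el["attrs"].get("name") or ""
        lines.append(f'[{el["index"]}] <{el["tag"]}> {label}'.rstrip())
    return "\n".join(lines)


html = ('<html><body><form><input name="q" placeholder="Search">'
        '<button type="submit">Go</button></form><a href="/about">About</a></body></html>')
parser = InteractiveElementExtractor()
parser.feed(html)
print(to_prompt(parser.elements))
# [0] <input> Search
# [1] <button> Go
# [2] <a> About
for el in parser.elements:
    print(el["xpath"])
# /html[1]/body[1]/form[1]/input[1]
# /html[1]/body[1]/form[1]/button[1]
# /html[1]/body[1]/a[1]
```

Referring to elements by a short index, instead of asking the model to write XPaths itself, keeps the prompt compact. A live browser page would also need handling for JavaScript-rendered content, iframes, and element visibility, which this sketch ignores.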
Hasn't this been done a lot of times?
Good question. As a general SaaS tool, yes. But I think a lot of people are going to try to build their own web automation agents from scratch, so the idea is to provide a library for the hard parts so that not everyone has to repeat these steps:
- parse HTML in an LLM-friendly way (clickable items + screenshots)
- provide clean function calls for every action inside the browser
- create reusable agent classes
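The "function calls for everything inside the browser" step can be sketched as a small action registry: each browser action gets a name, a description, and a JSON Schema for its arguments (roughly the shape function-calling models are given), and the model's chosen call is dispatched back to a Python handler. This is a hypothetical sketch, not Browser-Use's or LangChain's API: `ActionRegistry`, `click_element`, and `input_text` are illustrative names, and the handlers only append to a log instead of driving a real browser.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Action:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema describing the arguments
    handler: Callable[..., str]


@dataclass
class ActionRegistry:
    """Hypothetical registry mapping tool names the LLM sees to Python handlers."""
    actions: dict[str, Action] = field(default_factory=dict)

    def register(self, name: str, description: str, parameters: dict[str, Any]):
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self.actions[name] = Action(name, description, parameters, fn)
            return fn
        return decorator

    def tool_schemas(self) -> list[dict[str, Any]]:
        # Name / description / JSON Schema for each action, to hand to the model.
        return [{"name": a.name, "description": a.description, "parameters": a.parameters}
                for a in self.actions.values()]

    def dispatch(self, name: str, arguments_json: str) -> str:
        # Route the model's function call (name + JSON arguments) to its handler.
        action = self.actions.get(name)
        if action is None:
            return f"error: unknown action {name!r}"
        return action.handler(**json.loads(arguments_json))


registry = ActionRegistry()
log: list[str] = []  # stand-in for a real browser session in this sketch


@registry.register(
    "click_element", "Click the interactive element with the given index.",
    {"type": "object", "properties": {"index": {"type": "integer"}}, "required": ["index"]},
)
def click_element(index: int) -> str:
    log.append(f"click {index}")
    return f"clicked element {index}"


@registry.register(
    "input_text", "Type text into the interactive element with the given index.",
    {"type": "object",
     "properties": {"index": {"type": "integer"}, "text": {"type": "string"}},
     "required": ["index", "text"]},
)
def input_text(index: int, text: str) -> str:
    log.append(f"type {text!r} into {index}")
    return f"typed into element {index}"


# A model's function call typically arrives as a name plus a JSON string of arguments.
print(json.dumps(registry.tool_schemas()[0]))
print(registry.dispatch("input_text", '{"index": 0, "text": "software engineer"}'))
print(registry.dispatch("click_element", '{"index": 1}'))
```

Keeping the handlers behind a registry is one way to make them reusable across agent classes: an agent would loop by sending the page description plus `tool_schemas()` to the model, calling `dispatch` on whatever function call comes back, and feeding the result into the next turn.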
What this is NOT: an all-knowing AI agent that can solve all your problems.
The vision: create repeatable web tasks just by prompting your agent, without worrying about the how.
To showcase the power of the text extraction, we made a few demos, such as:
- Applying for multiple software engineering jobs in San Francisco
- Opening new tabs to search for images of Albert Einstein, Oprah Winfrey, and Steve Jobs
- Finding the cheapest one-way flight from London to Kyrgyzstan for December 25th
I’d be interested in feedback on how this tool fits into your automation workflows. Try it out and let me know how it performs on your end.
We are Gregor & Magnus and we built this in 5 days.