Show HN: We're building a desktop app for browser-based AI agents (meha.ai)
50 points by jawerty 85 days ago | 56 comments
What's up HN!

This is Jared and Art. We met on HN and started building together.

Over the last few months we've been thinking a lot about how AI agents are going to impact the future. We want agents to be something that's actually useful for normal people as well as the 10x'ers. This led us to build Meha, our first swing at our vision! We saw OpenAI release Operator, then we said f*k it, let's post.

Meha is a desktop app that uses your Chrome browser to execute tasks in the background. It controls your installed Chrome browser and uses LLMs with Playwright to plan and execute actions to accomplish your task. You get to see each planning step the bot takes and have access to its long-term memory.

Meha also uses its own file system and can export files for download. Another thing we've been focused on is multi-agent workflows, and Meha can run many bots at the same time. One of the reasons we can ship this for free in the meantime is how cheap the agents are to run. But we are planning to have a Pro version for power users. We prefer not to raise since we're against VC funding.

We have been influenced by a lot of concepts in probabilistic robotics and RL to develop a fairly robust 'agentic' framework. As well as an algorithm for efficiently converting/compressing large html pages into a semantic format. If you're interested, let us know and we will open source this ASAP in an SDK (it will work with all OpenAI API spec LLMs and with llama.cpp).

We're currently in beta, still figuring out what this product will become, and super stoked! Let us know what you think. To get access to Meha, we have download links on our Discord (both macOS and Windows are available). Please give us all the feedback/criticism (even if you hate AI).

Link to Meha: https://meha.ai




> As well as an algorithm for efficiently converting/compressing large html pages into a semantic format.

For the love of humanity please open source this. This seems tremendously useful by itself.


There is an open source alternative that might be even better: https://playwright.dev/docs/api/class-locator#locator-aria-s....


Oh damn, I will definitely look into open sourcing it and making it an SDK.


Awesome! I write LLM-powered scrapers and stuff all the time, and one of the biggest pain points is that HTML is full of so much crap that isn't meaningful and overwhelms the context. And being a data science guy, idk how to solve this.


Awesome, that's the same reason why I use it. It's basically a balance between the full HTML and the markdown-type scrapers that are better for just text. Do you mind if I reach out to you once I set up the GitHub?


You're very welcome to! Please do. You can reach out to notpricedinyet@gmail.com


Looked through their privacy policy, and they state they collect and use basically everything they can, from your browser & system metadata to the content you share and/or create. Not that different from every other attempt in the frothy AI space, but a real turn-off and hard no for me.


Thank you for the feedback. Personally, besides using our API server, we would like to find another way to deploy to anyone who has an issue with this or wants to run everything locally (not just the client). Also, I think if we had an OSS plug-and-play version where you could enter your API keys locally, it would help us ship to more devs. Would you be interested in this?


I'm so impressed with the concept of this agent but sorry, I can't have you accessing all my corporate data and systems because I access them via browser.

Perhaps you could create both a Public and Corporate version of the extension, like Copilot does. The Corporate version could have access to all browser data but not share it beyond the bounds of the company.


Thanks! That’s a great point; we’ve been discussing how to deal with sensitive data after the launch. I think a corporate/enterprise version makes sense.


Some analysis I've been reading on the implications of DeepSeek says that model optionality is probably here to stay. If so, I think incorporating model choice would be a valuable aspect of this kind of product. Conversely, I agree with parent: I'm not installing this software with that privacy policy in place.


We definitely wouldn't mind adding that. We are open to a lot of ideas and will consider everything! Appreciate your input so thank you.


OP, any comment on this?


I asked it to go on seloger.com, to find "some flats on paris below 400k". It went to one specific district of Paris, didn't set a price criterion, and then responded with how I could do it myself.

I then asked it to create a CSV of the first 100 flats matching my criteria; it created only 3 entries, purely hallucinated.


We'll take a look and see if we can get those prompts working. Thanks for letting us know!


Hey I am also building in the space and launched rtrvr.ai, but we went the route of a Chrome Extension so people don't have to worry about installing random software on their devices [also the reason that I am hesitant to try this out].

But let me know your thoughts on rtrvr.ai; it looks like we are targeting the same use cases of automation, scraping, and research?


Hi everyone, this is Art!

Happy to hear all the thoughts from those who try the app out! Even if you just have ideas about how agents might look in their final form, there are so many avenues this tech can take, and we have a ton of wild ideas we'll be building, so stay tuned. :D


Very cool! Any video demos for sample tasks? I didn't come across any on the website (browsing on mobile).


Those are still in the cooker; we'll throw them up ASAP once they're ready.

Some demos we will have are:

- Logging into Twitter and tweeting

- Finding information from Google Maps about any nearby business, whether that's for leads or for finding local restaurant options.

- Scraping anything from Wikipedia, like current events, etc.

- And more!


Those are good ones. I've fiddled with similar systems before, do you have a rough success rate? I know they can be finicky, especially as you execute through a chain-of-thought action plan, or however you're doing it.


Anything that improves reasoning chains of thought improves planning. Right now the long-term ones Art mentioned, like logging in, have been around 80%, while simpler ones have been higher. Right now our main issue is figuring out how to keep the server up :/ we're getting a little more traffic than expected. However, to bump those success rates up (which we need to), we really, really need to fine-tune additional models, which we're planning out right now.

I have a few ideas around that mostly going down the RL route (with a twist) mixed with some knowledge graph work. We'll give an update when we push that!


> keep the server up

Oh maybe I didn't understand from the site - it's not a standalone desktop app? What processing do you do on your server side?


We have an API server where we execute all the agent reasoning/planning jobs, then we stream the browser commands to the client. We mention this in the "how it works" section on the website. This is the main reason why we have the 5-bot-a-day limit. It's cheap for us to run as of now, but if anyone would like us to ship a version where you'd use your own API keys (plug n play) locally, let us know!


Interesting idea. With the web scraping utility, do I need to specify which websites I wish for the api to scrape from or do I essentially just say, "hey I want this data, go get it"?

If it's the latter, how do you go about making sure you're not about to download malicious data to my machine?


Great question, so right now you can do both. It does work better if you simply enter the URL for your task.

For the URL generation, we do have safety checks for the URLs; however, they're simply in the prompting. I would love to hear what sort of safety suggestions you have and/or concerns about this sort of experience. Right now we're still figuring out how best to enable people to utilize agents safely.
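
(Editor's illustration, not something the Meha team has described: a URL check can also live in code rather than only in the prompt. The sketch below is a minimal Python example under stated assumptions. The function name `is_url_allowed`, the optional `allowed_hosts` parameter, and the specific rules (http/https only, no embedded credentials, no loopback or private addresses) are all hypothetical choices for this example, and it does not resolve DNS, so it is nowhere near a complete defense.)

```python
import ipaddress
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical policy for this sketch: only plain web schemes are allowed.
ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url: str, allowed_hosts: Optional[Iterable[str]] = None) -> bool:
    """Return True if a model-generated URL passes some basic static checks."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises on some malformed inputs (e.g. a broken IPv6 host).
        return False

    if parts.scheme not in ALLOWED_SCHEMES:
        return False  # rejects javascript:, file:, data:, and similar

    host = parts.hostname
    if not host or host == "localhost":
        return False

    if parts.username or parts.password:
        return False  # credentials embedded in a URL are a phishing tell

    # If the host is a literal IP address, refuse local or private ranges.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a regular hostname, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False

    # Optional allowlist: match the host exactly or as a subdomain.
    if allowed_hosts is not None:
        return any(host == h or host.endswith("." + h) for h in allowed_hosts)
    return True
```

A caller could run every URL the planner proposes through `is_url_allowed` before handing it to the browser, and pass `allowed_hosts` when the user has named the site for the task.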


Cool. Is this a wrapper for https://github.com/browser-use/browser-use ?


It is not. The Meha agent is fully custom, except we don’t use our own models; we’re using o3-mini for most of the inference.


It kept taking me to non-existent websites that were a summary of what I asked it for.


Thanks for letting us know. Can you email us at info@meha.ai or DM one of us on Discord with more information? We're working through all the bug reports in the next couple of days.


I would be very interested in your research on compressing HTML pages!


Great! I will work on open sourcing that on our GitHub. It's basically a semantic format of HTML that lets AI agents use the browser easily.
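
(Editor's illustration: Meha's actual algorithm is not described in this thread, so the following is not it. It is a minimal Python sketch of the general idea of reducing a page to a compact outline, assuming that dropping script/style content and keeping only visible text plus interactive elements is a reasonable approximation. The names `SemanticOutline` and `compress_html`, the tag lists, and the bracketed output format are all hypothetical.)

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch: tags whose content is dropped,
# tags surfaced as actionable elements, and the attributes worth keeping.
SKIP_TAGS = {"script", "style", "noscript", "svg", "template"}
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEEP_ATTRS = ("href", "type", "name", "placeholder", "aria-label")


class SemanticOutline(HTMLParser):
    """Collects visible text and interactive elements, one per line."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth == 0 and tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            kept = " ".join(
                f'{k}="{attr_map[k]}"' for k in KEEP_ATTRS if attr_map.get(k)
            )
            self.lines.append(f"[{tag} {kept}]" if kept else f"[{tag}]")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.lines.append(text)


def compress_html(html: str) -> str:
    """Return a line-per-item outline of the page's text and controls."""
    parser = SemanticOutline()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Classes, inline styles, and wrapper `div`s disappear in this outline, while links and form controls keep the few attributes an agent would need to act on them. That is the kind of middle ground between full HTML and text-only markdown scrapers mentioned earlier in the thread.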


Is it a native app or electron based?


This is a Python Qt app (for now). We're looking to move to Electron; however, packaging this has been... interesting.


Please please please don’t move to electron and just build something native. Electron desktop software bloat is killing our machines.


Packaging a browser runtime for a chat app is a concern when the base amount of resources is far more than what we need. I'm more concerned about dev community + what runtime we'd prefer managing the local browser in. I'm looking at Go frameworks right now (I'm naturally moving away from Python to Go personally), if anyone has any suggestions.


I would be so happy if the migration went from pyqt to c++ qt.

Hell, I’ll help with the conversion.


We will probably be open sourcing the frontend soon and you can hack on it all you want. :D


Please stick with Qt, I have ditched all non native and electron apps from my machine (the last replacement was VSCode to Zed)


Interesting, can you lmk why you ditched all non-native apps? We're discussing what decision to make on this now.


So the reason for no Linux support is non technical right? There are literally dozens of us!

An easy way to scrape webpages is something I’m interested in, I promise I’ll try it when it’s supported.


Oh, please do report that either on the Discord or by email. We had a few people request Linux support; we're trying to log all the feedback. It's really just time limitations right now, no prejudice :)


How do I uniquely identify Meha to block it?


Sooo, make your HTML extremely convoluted, with randomized semantics and a ton of hidden interactions (+1 for only using custom web elements). Basically make it like YouTube. After spending way too much time building browser agents, I can assure you this will defeat Operator as well.


Can you email me at the email in my profile? I'd like to talk with you.


For sure just sent!


When making requests, does your tool use the normal Chrome user agent header, or does it specify that the request is coming from Meha?


Since you asked for “all the feedback,” there’s a typo on your landing page:

“The Meha API utilizes it's home-grown” -> “its”

Also, I got a relay access denied error when I tried to email you at info@meha.ai


Awesome, we just fixed these issues. Thanks for letting us know.


The headline needs another word. Maybe browser-based agents. I assumed this was about browser user-agents.


Maybe browser AI agent? Good note, we didn't think of that.


I've added both :)


Irrespective of the product, I think that you could have posted without the pseudo cussing. Having a little respect for your audience, and trying to appear professional, goes a long way in attracting users.


your view is not in line with cultural norms, though. if nytimes best selling book titles regularly use "pseudo cussing" (as you call it), no reasonable person would see this as disrespectful, especially given the much more casual context it was used.


Book titles have the benefit of artistic expression.


[dead]


[flagged]


It's all good! We were mostly joking there; I'm sure you have valid criticisms.


[flagged]


Feels like you're being pretty negative to some kind folks showing off a cool thing they built.

HN is a place for people to show and tell with positive intention and constructive feedback. I don't think OP is scoffing at all :)



