Launch HN: Integuru (YC W24) – Reverse-engineer internal APIs using LLMs

dewey · 2024-10-29T13:59:37 1730210377

If your landing page doesn't look like this, you've launched too late: https://integuru.ai

bryant · 2024-10-29T16:49:11 1730220551

Page source is amazing. I can't remember the last time I've seen a serious YC company launch page with absolutely zero JavaScript. Even the CSS is just a single selector.

I'm a fan.

ocean_moist · 2024-10-29T15:45:35 1730216735

I wish I could do this… best part of building for devs is being able to provide simple, good UX with minimal UI.

geoctl · 2024-10-29T16:00:06 1730217606

Still looks more interesting than that Next.js landing page template used by every startup these days.

silvanocerza · 2024-10-29T16:59:08 1730221148

Their website is this one though. :) https://www.taiki.ai/

swyx · 2024-10-29T20:53:07 1730235187

@richardzhang what is the relationship between taiki and integuru? is this a pivot?

richardzhang · 2024-10-29T21:02:22 1730235742

We should definitely further clarify this! We built Integuru as an internal tool while building the products for Taiki. Then we realized that other developers may need the agent, too, so we decided to open-source Integuru. In terms of the current focus for our team, we are spending most of our time on Integuru because newly requested integrations take some of our resources to build, and we want to continue improving the agent. I think the correct way to frame this is a market expansion, where we're expanding beyond the tax industry.

ramenlover · 2024-10-29T16:46:18 1730220378

I don't know what my PM would say but to me this is "excellent and appealing design"

btbuildem · 2024-10-29T19:07:47 1730228867

This is what happens when your daily grind is cutting through all kinds of atrocious and excessive "web design" in order to get at information.

qsort · 2024-10-29T14:15:55 1730211355

Literally peak graphics.

shmatt · 2024-10-29T14:21:25 1730211685

I just noticed over the weekend that the new Claude agreed to reverse engineer a GraphQL server with introspection turned off, something I'm pretty sure it would have refused for ethical reasons before the new version.

It kept writing scripts, I would paste the output, and it would keep going, until it was able to create its own working discount code on an actual retail website.

The only issue with these kinds of things is breaking robots.txt rules and the possibility things will break without notice, and often

The use of unofficial APIs can be legally questionable [1]

[1] https://law.stackexchange.com/questions/93831/legality-of-us...

As the authors of essentially a hacking tool, I would expect at least some legal boilerplate language about not being liable

richardzhang · 2024-10-29T15:03:18 1730214198

We are working on a way to auto-patch internal APIs that change by having another agent trigger the requests.

Regarding the legality aspects — really appreciate you mentioning this — we’ve put a lot of thought into these issues, and it’s something we’re continually working on and refining.

Ultimately, our goal is to allow each developer to make their own informed decision regarding the policies of the platforms that they're working with. There are situations where unofficial APIs can be both legal and beneficial, such as when they're used to access data that the end user rightfully owns and controls.

For our hosted service, we aim to balance serving legitimate data needs with safeguarding against bad actors, and we’re fully aware this can be a tricky line to navigate. What this looks like in reality would be to prioritize use cases where the end-user truly owns the data. But we know this is not always black-and-white, and will come up with the right legal language as you recommended. What does help our case is that many companies are making unofficial APIs for their own purposes, so there are legal precedents that we can refer to.

shmatt · 2024-10-29T15:28:12 1730215692

I have to disagree, it is definitely not legal in the US to use unauthorized access points to access authorized data. That's like saying you're allowed to get into your apartment by breaking your neighbor's door and climbing between the windows.

In the US this is pretty simply covered by Computer Misuse Act and Computer Fraud and Abuse Act, both federal laws

I'm not claiming you're liable, just surprised no lawyer pointed this out at YC.

conradev · 2024-10-29T18:01:10 1730224870

There is a carve out if the data is "publicly available": https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

If I open the Safeway app and it fetches what is available in a given store without any authentication and everyone sees the same data, that could possibly fall under that exemption.

chatmasta · 2024-10-29T20:52:58 1730235178

If my browser is downloading some data, then what’s the difference if my AI agent is doing the same? I’ll even tell you it’s my browser. Who are you to say what qualifies as a browser?

anon291 · 2024-10-30T03:03:01 1730257381

The law will say what qualifies as a browser.

Computer programmers are not legal experts lol. The law is not a program.

The difference between you accessing it and a computer accessing it makes these things different.

chatmasta · 2024-10-30T03:20:24 1730258424

A browser is a user agent, it's some software that makes requests to a server and renders them in a way I can understand. There's no difference between using a screen reader to vocalize content and using an AI agent to summarize it.

anon291 · 2024-10-30T06:13:46 1730268826

Sigh and now you're arguing with me instead of the law, as if I matter.

Bits have color and if you don't know what that means, Google that before responding.

instakill · 2024-10-31T07:36:20 1730360180

I think this is the reference but I'm too lazy to get a TLDR https://news.ycombinator.com/item?id=24917679

kgc · 2024-10-30T03:23:41 1730258621

Just have the AI use the browser.

nkrisc · 2024-10-30T13:30:33 1730295033

Likely a judge or jury will decide. Law isn’t code.

If it’s two different things then it’s not the same thing.

daveguy · 2024-10-29T16:53:17 1730220797

This analogy is completely off. A closer analogy is someone calls you on your phone letting you know they're here. You were expecting them, so you say "come on in." But, they were at the back door instead of the front door. I don't think anyone would consider that your friend did something illegal.

korkybuchek · 2024-10-29T17:49:39 1730224179

Yeah, the CFAA doesn't work by analogy unfortunately.

erohead · 2024-10-29T21:03:52 1730235832

The CFAA was recently (2021) limited by the Van Buren ruling.

CPLX · 2024-10-30T08:54:18 1730278458

The entire US legal system works by analogy.

rozap · 2024-10-31T18:22:37 1730398957

You're right in principle, but I think in practice this is sort of a non-issue. Most sites now employ (for better or worse) anti-botting tools which have some sort of JavaScript challenge that will generate a unique token. Given that this tool is only capable of replacing the dynamic parts of the request graph with tokens found in the output from the previous steps, I don't see how it would get around these sorts of challenges. So effectively, if you're using methods to prevent "unauthorized" use of your APIs, I think this sort of tool will be defeated extremely easily. The reverse engineering/web scraping world has unfortunately evolved to be extremely adversarial, and this sort of tool does not have the sneakiness required to get around even the simplest anti-botting measures.

Until LLMs become smart enough to emulate a full JS stack, I think we're safe :)

_hl_ · 2024-10-29T14:19:41 1730211581

This is awesome, but I'm not sure what the long-term use case for the intersection of low-latency integration and non-production-stable is? I'm saying this as someone with way more experience than I'd like to in using reverse-engineered APIs as part of production products... You inevitably run into breakages, sometimes even actively hostile platforms, which will degrade user experience as users wait for your 1day window to fix their product again.

Though I suppose if you can auto-fix and retry issues within ~1minute or so it could work?

lo0dot0 · 2024-10-29T16:30:38 1730219438

NewPipe breaks regularly. It's almost like YouTube changes the API on purpose to hurt 3rd party clients that don't show ads.

miki123211 · 2024-10-29T21:45:38 1730238338

Either that, or they just straight up don't care.

I think it's pretty likely that they just don't look at or test NewPipe when they change their APIs. If the change doesn't break any official clients, it goes through.

With how large YouTube is, I imagine API changes are not infrequent.

SunlitCat · 2024-10-30T06:02:45 1730268165

Well, then a service like Integuru would be perfect for Newpipe! Maybe someone should suggest them to use this awesome service? (I am pretty sure Alphabet would be really happy about that one! :D)

alanloo · 2024-10-29T14:27:38 1730212058

This is a very important question. Thank you for bringing this up! Currently it requires human intervention to auto-fix integrations as someone needs to trigger the correct network request. We are planning on having another agent that triggers the network requests through interacting with the UI and then passing the network request to Integuru.

loktarogar · 2024-10-30T00:59:15 1730249955

In my experience reverse engineering is often the easy bit, or at least easy compared to what follows: maintenance. Knowing both when and how it fails (e.g. when the API stops returning any results but is still otherwise valid). Knowing when the response has changed in a way that is subtle to detect, like when they change the format of a single field, which may still parse correctly but is now interpreted incorrectly.

How do you keep up with the maintenance?

alanloo · 2024-10-30T01:37:13 1730252233

We feel your pain with maintenance. We have plans to handle this by using LLMs to detect response anomalies.

From our experience, reverse engineering is still less prone to breakage compared to traditional browser automation. But we definitely want to make integrations even more reliable with maintenance features.

cphoover · 2024-10-30T03:33:09 1730259189

Wouldn't something like snapshot testing from a scheduled probe be more effective and reliable than using an LLM?

Every X hours test the endpoints and validate the types and field names are consistent... If they change then trigger some kind of alerting mechanism to the user.
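
A minimal sketch of the snapshot-probe idea above, assuming the endpoint returns JSON that has already been fetched and parsed. The function names `shape_of` and `diff_shapes` are made up for illustration; scheduling and alerting are left out. It reduces a response to field names and type names, then reports what changed against a stored snapshot.

```python
def shape_of(value):
    """Reduce a parsed JSON value to its structure: field names and type names only."""
    if isinstance(value, dict):
        return {key: shape_of(val) for key, val in sorted(value.items())}
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so the first element represents them.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def diff_shapes(expected, actual, path="$"):
    """Return human-readable differences between a stored shape and a live one."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems = []
        for key in sorted(expected.keys() | actual.keys()):
            child = f"{path}.{key}"
            if key not in actual:
                problems.append(f"{child}: field removed")
            elif key not in expected:
                problems.append(f"{child}: field added")
            else:
                problems.extend(diff_shapes(expected[key], actual[key], child))
        return problems
    if isinstance(expected, list) and isinstance(actual, list):
        if expected and actual:
            return diff_shapes(expected[0], actual[0], f"{path}[]")
        return []  # an empty list carries no element shape to compare
    if expected != actual:
        return [f"{path}: {expected!r} -> {actual!r}"]
    return []


# Hypothetical payloads: the stored snapshot and what the probe saw today.
snapshot = shape_of({"id": 1, "price": "9.99", "tags": ["a"]})
live = shape_of({"id": 1, "price": 9.99, "tags": ["a"], "sku": "x"})
changes = diff_shapes(snapshot, live)
```

This only catches structural drift. A field that keeps its name and type but changes meaning (the subtle case raised earlier in the thread) passes this check unnoticed.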

alanloo · 2024-10-30T03:47:52 1730260072

If the types and field names change, our parsing script should be able to detect that, so it should be covered. I was talking about handling the subtle changes that are undetectable by checking field types and names.

sureglymop · 2024-10-30T06:17:08 1730269028

I would say: it depends. I must've wasted days of my life trying to reverse engineer Android apps with pinned certificates. It's crazy how hard it has become to just inspect the traffic on my own device that I bought and own.

sunbum · 2024-10-30T09:20:10 1730280010

Just set up httptoolkit [0], it just works.

[0] - https://httptoolkit.com/

franga2000 · 2024-10-30T11:30:57 1730287857

I'm guessing you haven't done this a lot? You can't easily add a cert to the system store without rooting, but then you need to bypass root detection. If the app uses cert pinning, you either need to hook it (also detectable) or patch it (error-prone and again, detectable). If the app is Flutter, you'll need to do some binary patching too.

buildfocus · 2024-10-30T12:46:40 1730292400

If you have root, HTTP Toolkit will handle most of that for you - it can detect root via ADB, install system certs automatically, and install Frida & intercept individual app targets with most cert pinning disabled (the Frida scripts it uses are here: https://github.com/httptoolkit/frida-interception-and-unpinn...).

No manual setup or config, just click a button and done.

Avoiding in-depth detection is left as an exercise for the reader, although there are a small set of existing countermeasures in there. In practice, there is definitely a very long tail of further cases of increasing complexity, with diminishing returns on automated solutions, but it turns out in practice you can automate quite a long way down that path and cover most normal cases.

Flutter is the one awkward case here I've found that doesn't fully work. Very interested to see if there are generalizable automated solutions there, or if the recent fork announcements mean the slow death of flutter anyway...

loktarogar · 2024-10-30T07:54:04 1730274844

Yeah I feel you on that. I wonder if this can deal with those difficult cases? This would be killer if so

toomuchtodo · 2024-10-29T14:13:59 1730211239

Brilliant. Is the next part to monitor and autocorrect breakage when the API in scope changes unexpectedly underneath the system? This is a pain point of workflow automation systems that integrate with APIs in my experience, typically requiring a human to triage an alert (due to an unexpected external API change), pause worker queues, ship a fix, and then resume queue processing.

Love the landing page, please keep it.

alanloo · 2024-10-29T14:20:37 1730211637

Thanks and yes that's part of the roadmap!

Currently you need to trigger the UI actions manually to generate the network requests used by Integuru. But we're planning to automate the whole thing by having another agent auto-trigger the UI actions to generate the network requests first, and then have Integuru reverse-engineer the requests.

mdaniel · 2024-10-29T16:13:36 1730218416

Ah, by clicking on the Taiki logo to see what the ... parent company? ... builds, I now understand how this came about. And I'll be honest, as someone who hates all that tax paperwork gathering with all my heart, this launch may have gotten you a new customer for Taiki :-)

Also, just as a friendly suggestion, given what both(?) products seemingly do, this section could use some love other than "we use TLS": https://www.taiki.ai/faq#:~:text=How%20does%20Taiki%20handle... since TLS doesn't care about storing credentials in plain text in a DB, for example

---

p.s. the GitHub organization in your .gitmodules is still pointing to Unofficial-APIs which I actually think you should have kept o/

alanloo · 2024-10-29T16:23:41 1730219021

Thank you for your suggestions, and really glad to hear you're excited about Taiki! We will update the FAQ with your suggestions. Honestly, this part of the website is a bit outdated, and we will make sure to change it.

Regarding the Unofficial-APIs name, it was a really tough decision. We liked the name a lot but just thought it was a bit long. A real pleasant surprise that you found it :)

imranq · 2024-10-29T16:08:37 1730218117

Wow this is great! I think this is kind of the future of automation and "computer use" once LLMs become powerful enough.

Every task on the web can be reduced down to a series of backend calls, and the key is extracting out the minimal graph that can replicate that task.
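
To make the "minimal graph" idea concrete, here is a rough sketch under stated assumptions; it is not how Integuru is implemented. Each captured request is a dict with hypothetical keys `name`, `dynamic_values` (strings sent in the request that look generated) and `response_text`. A request depends on the earliest prior request whose response contains one of its dynamic values, and the minimal graph for a target is whatever it transitively depends on.

```python
from graphlib import TopologicalSorter


def build_dependency_graph(requests):
    """Map each request name to the earlier requests that supply its dynamic values."""
    graph = {}
    for index, request in enumerate(requests):
        sources = set()
        for value in request["dynamic_values"]:
            for earlier in requests[:index]:
                if value in earlier["response_text"]:
                    sources.add(earlier["name"])
                    break  # treat the first response containing the value as its source
        graph[request["name"]] = sources
    return graph


def minimal_call_order(graph, target):
    """Return only the requests needed to reach `target`, dependencies first."""
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    subgraph = {name: graph[name] for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical capture: the banner request is noise and should be pruned.
captured = [
    {"name": "login", "dynamic_values": [], "response_text": '{"token": "abc123"}'},
    {"name": "banner", "dynamic_values": [], "response_text": "<img>"},
    {"name": "accounts", "dynamic_values": ["abc123"], "response_text": '{"account_id": "acct-9"}'},
    {"name": "download", "dynamic_values": ["abc123", "acct-9"], "response_text": "%PDF"},
]
graph = build_dependency_graph(captured)
order = minimal_call_order(graph, "download")
```

Plain substring matching is the weakest part of this sketch: short or common values will produce false dependencies, so deciding which values are truly dynamic is where most of the real work lies.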

richardzhang · 2024-10-29T16:09:46 1730218186

Thank you!

blakeburch · 2024-10-29T15:42:20 1730216540

Really digging this idea.

I've spent plenty of time trying to dig into the network tab to automate requests to a website without an API. Cool to see the process streamlined with LLMs. Wishing you all the best of luck!

richardzhang · 2024-10-29T15:46:33 1730216793

Thank you!

jerrygenser · 2024-10-29T14:35:22 1730212522

Will this work for SSR applications? E.g. think old school .NET or JSP apps which make network requests, then receive HTML which needs to be parsed in order to understand the key pieces of information, and then make additional network requests?

I've found it relatively straightforward to reverse engineer SPA requests. With server-side rendered apps, however, how would your service handle that?

TalvinRamnah · 2024-11-04T12:47:47 1730724467

Same question from me. I've got this exact use case I've been struggling with the past few days.

I work at a milk delivery company in the UK (The Modern Milkman). There's this website called findmeamilkman.com and I wanted to scrape all the milk delivery services that serve each UK postcode to create polygons I can overlay on a map to identify competitors in each region.

I keep getting rate limited by the servers, and there don't seem to be any fetch/XHR requests on the network. Instead there's an SSR request that returns the full HTML.

If your product could help me solve this by reverse engineering an API, that would be amazing.

alanloo · 2024-10-29T14:53:19 1730213599

Good question. Finding the request that's responsible for the action you want will be a bit trickier for SSR, but it's still possible for most cases. It auto-generates regex (for now) to parse the needed info out of the HTML template.
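
For readers wondering what regex-based extraction from a server-rendered page looks like, here is a small hand-written sketch. The pattern and the field names (`csrf_token`, `view_state`) are assumptions for illustration, not output from Integuru.

```python
import re

# Hypothetical pattern for form inputs. It assumes double-quoted attributes
# and that `name` appears before `value` inside the tag.
INPUT_FIELD = re.compile(
    r'<input[^>]*\bname="(?P<name>[^"]+)"[^>]*\bvalue="(?P<value>[^"]*)"',
    re.IGNORECASE,
)


def extract_form_fields(html):
    """Return {name: value} for every input tag the pattern can match."""
    return {m.group("name"): m.group("value") for m in INPUT_FIELD.finditer(html)}


# Hypothetical server-rendered response.
page = (
    '<form action="/search">'
    '<input type="hidden" name="csrf_token" value="tok-42">'
    '<input type="hidden" name="view_state" value="">'
    "</form>"
)
fields = extract_form_fields(page)
```

A regex like this is brittle against attribute reordering or single quotes. An HTML parser is usually the sturdier choice once the markup stops being predictable.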

jerrygenser · 2024-10-29T15:24:33 1730215473

Another thing I've seen is that some of these old school apps are sending certain requests that don't modify the page but set server side context which subsequent requests are dependent on.

For example, set context to a particular group and then subsequent navigation is filtered on that group, even though the filter is not explicit on the client side but is due to state stored in the session remotely.

This can also have implications on concurrency for a given session where you need to either create separate sessions or make sure there is some lock on particular parts of server side state.

Would this type of thing eventually be possible? Or at least hooks that let us add custom code such as session locks?
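
A sketch of the kind of session lock described above, assuming a client-side wrapper around one remote session. `SessionContextGuard` and the `set_context` callback are hypothetical names; the point is only that "set context, then read" runs as one critical section per session.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class SessionContextGuard:
    """Serialize 'set server-side context, then read' sequences per session."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._registry_lock = threading.Lock()  # protects the lock table itself

    @contextmanager
    def context(self, session_id, set_context, group):
        with self._registry_lock:
            lock = self._locks[session_id]
        with lock:
            # Hypothetical call that changes state stored in the remote session.
            set_context(group)
            # The caller's dependent requests run here, still holding the lock.
            yield


# Usage with a dict standing in for the remote session state.
remote_state = {"group": None}
guard = SessionContextGuard()

with guard.context("session-1", lambda g: remote_state.update(group=g), "sales"):
    seen_group = remote_state["group"]  # dependent read sees the context just set
```

Different sessions get different locks, so this only serializes work that shares server-side state. Creating a separate session per worker avoids the lock entirely at the cost of more logins.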

alanloo · 2024-10-29T17:10:01 1730221801

Very interesting to hear about your experience here! We haven't come across a website that has this design and don't offer support for this just yet. We can certainly implement if more people face a similar situation.

toomuchtodo · 2024-10-29T14:46:50 1730213210

Would be cool to use a proxy to MITM to twiddle the bits (with its own API) if the use case isn't supported by a browser or robotic process automation driving the app's client side UX.

jerrygenser · 2024-10-29T14:51:17 1730213477

I was talking about web apps. But yeah, for old school desktop or Windows-native apps, a MITM proxy works.

aleksiy123 · 2024-10-30T05:52:13 1730267533

At one time I experimented with reverse engineering APIs from HAR files to automatically generate an OpenAPI spec and then a client.

It actually worked decently well, but of course there were edge cases, and I was thinking of giving it another shot more recently to fill the gaps with LLMs.

Pretty cool to see someone take this idea much further and make it a service. I feel like there's even more potential here, and not just for integrations.
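
A rough sketch of the HAR-to-OpenAPI step, assuming a HAR file already parsed into a dict. The names `templated_path` and `har_to_openapi` are invented here, and the output is an unvalidated skeleton (path placeholders are not declared as parameters, and no schemas are inferred).

```python
import re
from urllib.parse import urlsplit


def templated_path(path):
    """Replace purely numeric path segments with a placeholder (a crude heuristic)."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def har_to_openapi(har, title="Recovered API"):
    """Build a minimal OpenAPI 3 skeleton from the entries of a parsed HAR document."""
    paths = {}
    for entry in har["log"]["entries"]:
        request, response = entry["request"], entry["response"]
        url = urlsplit(request["url"])
        operation = paths.setdefault(templated_path(url.path), {}).setdefault(
            request["method"].lower(), {"parameters": [], "responses": {}}
        )
        known = {param["name"] for param in operation["parameters"]}
        for item in request.get("queryString", []):
            if item["name"] not in known:
                known.add(item["name"])
                operation["parameters"].append(
                    {"name": item["name"], "in": "query", "schema": {"type": "string"}}
                )
        # Record every status code seen for this operation in the capture.
        operation["responses"][str(response["status"])] = {"description": "Observed in capture"}
    return {"openapi": "3.0.3", "info": {"title": title, "version": "0.0.0"}, "paths": paths}


# Hypothetical two-entry capture against a made-up host.
har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "https://example.test/api/users/42?expand=orders",
                 "queryString": [{"name": "expand", "value": "orders"}]},
     "response": {"status": 200}},
    {"request": {"method": "GET", "url": "https://example.test/api/users/7?expand=none",
                 "queryString": [{"name": "expand", "value": "none"}]},
     "response": {"status": 404}},
]}}
spec = har_to_openapi(har)
```

The edge cases mentioned above show up quickly: non-numeric IDs, request bodies, and auth headers all need more than this heuristic.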

richardzhang · 2024-10-30T05:53:25 1730267605

Makes us really happy to hear this!

shubb · 2024-10-30T00:37:56 1730248676

There are a lot of companies using old custom or self-hosted webapps that they control but can't change - maybe the 3rd party that built it kept the code, maybe it's an orphan product, maybe the silo that owns it won't build an API.

Anyway, a lot of good points here about legalities and shifting APIs, but I think there are plenty of situations where this is great and none of that applies.

smashah · 2024-10-30T04:23:45 1730262225

Very cool! If Megacorps insist on pulling the Web 2.0 promises of APIs from under us then we will build them ourselves in the spirit of adversarial interoperability.

It's time for there to be a legal protection framework for OSS maintainers to stop being bullied with legal threats from Megacorps.

383toast · 2024-10-30T04:24:54 1730262294

What's the stance on security for handling private tokens/cookies/sessions/etc?

mormegil · 2024-10-31T11:08:12 1730372892

My first thought. Do I understand correctly that the HAR with all my session cookies, usernames, passwords, etc. (not to mention possibly sensitive data in the service) is sent to OpenAI? Well… just… be aware of it if you want to try this.

richardzhang · 2024-10-30T04:30:26 1730262626

This is certainly an important question. We use a third-party vault to store tokens/keys.

rumpelstilzchen · 2024-10-29T14:57:59 1730213879

Nice work, congrats! How do you deal with security related stuff like recaptcha, signed requests and so on?

Do you also support internal APIs of mobile applications? If so, how do you deal with AppCheck / PlayIntegrity / Android Key Attestation / Apple App Attest?

alanloo · 2024-10-29T15:22:13 1730215333

Thank you! Integuru itself doesn't handle recaptchas and signed requests, but we have a hosted solution where we use third-party services to handle recaptchas and manually create integrations for handling signed requests.

We do not directly support APIs for mobile applications; however, if you use MITM software and get all the network requests into a .har file, Integuru should work as expected. We do not handle AppCheck at the moment, unfortunately.

skilbjo · 2024-10-30T05:45:42 1730267142

take a look at https://xhr.dev/, a product I built to avoid bot detection challenges in the first place.

ilrwbwrkhv · 2024-10-30T02:19:01 1730254741

Again, it is one of these nice interesting products which should be open source but you shouldn't have taken the VC funding. That immediately is a red flag and this product is guaranteed in the future to go south.

minyakonga · 2024-11-06T02:53:49 1730861629

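Awesome. Until now, scrapers relied on either browser automation or reverse engineering, each with its own pros and cons. Websites here come in all shapes and sizes, with many different authentication mechanisms that are hard to handle. How will Integuru deal with this kind of problem going forward?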

netdevnet · 2024-10-30T10:56:46 1730285806

What's the legality of this? My understanding is that if a site does not offer a public API, it is because they don't want to (think of WhatsApp). Building a company on that seems extremely risky.

nkotov · 2024-10-29T14:56:59 1730213819

This is really awesome. There are several platforms that intentionally gatekeep their APIs, and it makes it really annoying to build integrations with them. How do you go about working with these platforms without breaking their TOS?

richardzhang · 2024-10-29T15:19:25 1730215165

Thank you! There are definitely platforms that intentionally gate-keep their APIs. A good example is LinkedIn, which many companies still try to force-build their own integrations with. Our goal is to allow each developer to make their own informed decision regarding the policies of the platforms that they're working with. For our hosted service, we want to prioritize use cases where the end-user truly owns the data. We can also refer to legal precedent cases where many other companies make unofficial APIs.

compootr · 2024-10-29T15:15:14 1730214914

I don't think it really matters to them. As a provider giving access to these platforms, they're not the user (and they didn't agree to the terms). the end user did, so it's on them to decide whether they risk getting terminated or whatnot

DougWebb · 2024-10-29T16:35:15 1730219715

If they have deeper pockets than the user, they're the ones who will get sued for abuse they enable.

Prosammer · 2024-10-29T13:59:41 1730210381

Very cool, congratulations! Would this work for graphql APIs with introspection disabled?

alanloo · 2024-10-29T14:12:45 1730211165

Thank you! As long as the network request contains the query, it should work as expected. So yes, it should work with GraphQL APIs that have introspection disabled. Excited to see what you do with it!

abuhasho · 2024-10-29T23:37:58 1730245078

The best ideas are ones that start off with an internal painpoint. Congrats!

richardzhang · 2024-10-29T23:40:37 1730245237

Thank you!

kevo1ution · 2024-10-29T17:54:54 1730224494

Going from tax API reverse engineering to making it easier to reverse engineer any API is a smart pivot.

shreezus · 2024-10-31T06:26:21 1730355981

Just wanted to say this is super neat! Looking forward to playing with it.

richardzhang · 2024-10-31T17:33:15 1730395995

Thank you!

bstanfield15 · 2024-10-29T15:55:52 1730217352

Hell yeah! Love to see this launch. We have spent a lot of time at Wren recently trying to reverse engineer some local law APIs to help make renewable energy developer lives easier (less parsing through hundreds of PDFs, dead links, etc.) -- going to try this out and see if it can speed up our workflow.

richardzhang · 2024-10-29T16:03:08 1730217788

Thank you! Would love your feedback after you use it!

andrewski77 · 2024-10-29T17:26:15 1730222775

congratulations! this is such a cool idea

richardzhang · 2024-10-29T17:26:56 1730222816

Thank you!