I still think you're missing the point. The idea is that you should use vision A...

suchintan · on March 15, 2024

This is a great point, and it's already on our roadmap. We call it "prompt caching", but I realize as I write this that it's a terrible name. Will update! (https://github.com/Skyvern-AI/Skyvern?tab=readme-ov-file#fea...)

Thank you for this feedback

pmontra · on March 15, 2024

The AI would be a compiler that generates the traditional scraper / integration test.

It would save all the time spent manually going through every page and figuring out what mistake we made when an input string doesn't go into the right input field or the button on the modal window doesn't get clicked.

Change the UI? Recompile with the AI.

bravura · on March 15, 2024

I didn’t check the code but there would be a few good ways to specify what you want:

* browser extension that lets you record a few actions
* describing what you want to do with text
* a url with one or two lines of desired JSON to extract

epr · on March 15, 2024

> We call it "prompt caching"

No, that's something completely different than what bravura is talking about, which is why he made a comment to say explicitly that he still thinks you're missing the point.

From your roadmap:

> Prompt Caching - Introduce a caching layer to the LLM calls to dramatically reduce the cost of running Skyvern (memorize past actions and repeat them!)

Adding a caching layer is not what they're asking for. They want to periodically use Skyvern to generate automation code, which they could then deploy themselves in their testing/CI setup. Eventually their target website may make breaking UI changes, then you use Skyvern to generate new automation code. Rinse and repeat. This has nothing to do with an internal caching layer within your service.

suchintan · on March 15, 2024

We've discussed generating automation code internally a bunch, and what we decided on is to do action generation and memorization, instead of code generation and memorization. They're not that far apart conceptually, but there is one important distinction: The generated output would just be a list of actions and their associated data source.

For example, if Skyvern was asked to log-in to a website and do a search for product X, the generated action plan would include:

1. Click the log in button
2. Click "sign in with email"
3. Input the email address retrieved from source X
4. Input the password retrieved from source Y
5. Click log in
6. Click on the search bar
7. Input the search term from source Z
8. Click Search

Now, if the layout changed and suddenly the log-in button had a different XPath, you have two options:

1. Re-generate the entire action plan (or sub-action plan)
2. Re-generate the specific component that broke and assume everything else in the action plan still works
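
To make the two options concrete, here is a rough Python sketch of a memorized action plan and the difference between a full and a partial re-generation. This is not Skyvern's actual API or data model: `Action`, `find_broken`, `repair`, `regenerate`, the XPaths, and the `relocate` stub (standing in for the expensive model call) are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                     # "click" or "input"
    target: str                   # human-readable description of the element
    xpath: str                    # memorized locator (hypothetical)
    source: Optional[str] = None  # data source for input values, if any


def find_broken(plan: List[Action], exists: Callable[[str], bool]) -> List[int]:
    """Return indices of actions whose memorized XPath no longer resolves."""
    return [i for i, action in enumerate(plan) if not exists(action.xpath)]


def regenerate(plan: List[Action], relocate: Callable[[Action], str]) -> List[Action]:
    """Option 1: re-generate the locator of every step in the plan."""
    return [Action(a.kind, a.target, relocate(a), a.source) for a in plan]


def repair(plan: List[Action], exists: Callable[[str], bool],
           relocate: Callable[[Action], str]) -> List[Action]:
    """Option 2: re-generate only the broken steps and keep the rest as-is."""
    repaired = list(plan)
    for i in find_broken(plan, exists):
        old = plan[i]
        repaired[i] = Action(old.kind, old.target, relocate(old), old.source)
    return repaired


# A shortened version of the memorized plan from the example above.
plan = [
    Action("click", "log in button", "//button[@id='login']"),
    Action("input", "email field", "//input[@name='email']", source="X"),
    Action("input", "password field", "//input[@name='password']", source="Y"),
]

# Simulated page after a layout change: only the log-in button moved.
page_xpaths = {
    "//a[@class='login-link']",
    "//input[@name='email']",
    "//input[@name='password']",
}

relocate_calls: List[str] = []  # records how often the "model" is consulted


def exists(xpath: str) -> bool:
    return xpath in page_xpaths


def relocate(action: Action) -> str:
    # Stand-in for a model call that finds the element on the new layout.
    relocate_calls.append(action.target)
    lookup = {"log in button": "//a[@class='login-link']"}
    return lookup.get(action.target, action.xpath)


new_plan = repair(plan, exists, relocate)
```

In this toy setup, `repair` consults the stub once (for the log-in button), while `regenerate` would consult it once per step. The trade-off is the one described above: the partial repair is cheaper, but it assumes the untouched steps still behave the same on the new layout.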