I'm very excited about work being done in this area. In fact, I'm working on a product that does exactly this (runs in the background, local LLM, has access to screen-space entities, can take certain actions). It feels pretty magical to use (here it's running on my 3090Ti; much slower but still serviceable on my M1 MBP): https://www.youtube.com/watch?v=JH1noETdQEA
Currently using Mistral-7B-Instruct-v0.2, but working on a fine-tuning dataset which should make it work better with local application interfaces (console, browser, email client, Slack, Discord, Word, Excel, etc.).
Still a lot of work to be done before that point, but I'll write at least 1000 prompts (some synthetic, some hand-crafted) based on the "arrow notation" I posted here[1]. The fine-tuning itself is actually quite easy[2] as long as the data is properly formatted. After that, I'll have to quantize the model (which I've never done before, but it doesn't seem too hard).
I've been working on this for the past month, and in its current state it runs some actions, but it's very context-dependent.
For example, if I'm on a GitHub page and I say "clone this git repo," it works pretty flawlessly (it clones the repo into a scratch directory you set up in the settings). But if I'm reading a blog that references a git repo, it sometimes gets confused (it may try to "clone" the blog URL, for example), so I'm working through a few solutions while trying to avoid multi-shot prompting.
A lot of this involves pretty par-for-the-course data cleaning. For example, you'd turn the user query into an embedding, which you then compare to a few "contexts" (git workflows, research, creative work) to see which one matches it best, and then prune the raw screen data per that context, removing (or de-emphasizing) non-context-relevant information. So, in the above case, if I'm trying to clone a git repo, I don't care about non-git URLs (we can mask them, remove them, or whatever). Then, we feed the sanitized context to the LLM. Et voilà!
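A rough sketch of that routing-and-pruning step, to make it concrete. Everything here is an assumption for illustration: the context descriptions, the `GIT_HOSTS` list, the `[masked-url]` placeholder, and especially `embed`, which is a toy bag-of-words stand-in for a real sentence-embedding model. It is not the actual implementation.

```python
import math
import re
from collections import Counter

# Hypothetical context descriptions; a real system would embed these with a
# proper sentence-embedding model instead of the toy word counts used below.
CONTEXTS = {
    "git": "clone git repo repository commit branch pull push checkout",
    "research": "paper article summarize cite reference study read",
    "creative": "write draft story poem design sketch brainstorm",
}

GIT_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")  # assumed list
URL_RE = re.compile(r"https?://\S+")


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_context(query):
    """Return the name of the context most similar to the user query."""
    q = embed(query)
    return max(CONTEXTS, key=lambda name: cosine(q, embed(CONTEXTS[name])))


def is_git_url(url):
    """Heuristic: the host is a known git host, or the URL ends in .git."""
    host = re.sub(r"^https?://(www\.)?", "", url).split("/")[0]
    return host in GIT_HOSTS or url.endswith(".git")


def prune_screen(screen_text, context):
    """Mask URLs that are irrelevant to the chosen context (git only here)."""
    if context != "git":
        return screen_text
    return URL_RE.sub(
        lambda m: m.group(0) if is_git_url(m.group(0)) else "[masked-url]",
        screen_text,
    )


query = "clone this git repo"
screen = "Read https://example.com/blog/post about https://github.com/user/project today"
context = best_context(query)
sanitized = prune_screen(screen, context)  # this is what would go to the LLM
```

With the blog URL masked, the only thing left to "clone" is the actual repo URL, which is the point of pruning before prompting instead of adding more examples to the prompt.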
Can someone ELI5 what is meant by reference resolution? Sounds like it means identifying entities, given that it talks about “on screen and in the background”.
The main provider of map data is TomTom, but data is also supplied by Automotive Navigation Data, Getchee, Hexagon AB, IGN, Increment P, Intermap Technologies, LeadDog, MDA Information Systems, OpenStreetMap, and Waze. Apple renewed its agreement with TomTom in 2015, though it later decided to gradually switch to OpenStreetMap and remove all TomTom-contributed map data except for live traffic information.
Yeah, it still sucks for me in the Netherlands - a country with a massive iOS population.
Since we can’t change default map apps, I always end up having to use it with Calendar, Reminders, Siri, Watch, etc. Always disappointed by how much info it lacks.