This and many other applications of this sort will depend on AI becoming ubiquitous, cheap, and unmetered. Metered access (like most current SaaS AI offerings) will deter these kinds of heavy use cases. Running locally will be best, both for pricing and so the system can build up context over time.
As an experiment (because p47's social media posts move the markets $60 in a day, and the last thing on Earth I want to be doing is reading them), the system makes an API request for any new posts, then checks them for links, video, and images. It uses OpenAI Whisper, running with transformers.js on the local machine with WebGPU inference, to transcribe the video and audio, and image-to-text for OCR. I tried to do the text generation locally too, but any decent model, while it would run, made my MacBook M3 so hot I could cook a steak on it while freezing rendering for the whole computer.
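The triage step above can be sketched roughly like this. The `post` shape, field names, and task labels here are my own assumptions for illustration, not a real social media API schema:

```javascript
// Sketch of the triage step: given a freshly fetched post, decide which
// local pipelines need to run before any text analysis happens.
// The `post` shape and task names are assumptions, not a real API schema.
function triagePost(post) {
  const tasks = [];
  // Bare URLs in the post body mean the side panel may need to follow links.
  if (/https?:\/\/\S+/.test(post.text)) tasks.push('follow-links');
  for (const att of post.attachments ?? []) {
    if (att.type === 'image') tasks.push('ocr');   // image-to-text
    if (att.type === 'video' || att.type === 'audio') {
      tasks.push('transcribe');                    // Whisper on WebGPU
    }
  }
  return tasks;
}
```

Once every task has run, the extracted text from all sources gets concatenated and handed off for the actual analysis.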
Image-to-text and video- and audio-to-text work fine locally, and there are plenty of text-generation uses that work too, but getting a high-quality analysis of whether a social media post might crash the stock market means sending the data out to an API. Likewise, if the side panel needs to search for links to navigate to, that requires a third-party API.
Working with it, I think the next hardware race will be getting these models to run on personal computers in the next 2 to 5 years, and I have a suspicion Microsoft is ahead of Apple.