> It does seem possible that this could be replaced with a local model in the near future. It's not clear the average user has the hardware specs for this to be an option today, but it will increasingly be plausible.
Siri does something like this when reading messages into your AirPods. It will give brief descriptions of photos sent in the message. I'm pretty sure it's all run locally.