
The layers of extreme complexity in this situation are astounding.

The app has some text internally; it renders it by rasterizing fonts to a bitmap, and the OS takes that bitmap and composites it into the wider UI. Google Assistant then grabs a screenshot of the fully composited, post-processed OS UI, sends it over the internet to a server, which uses an OCR model to read all the text and a different model to work out which text is relevant, and the result is sent back over the internet and displayed on the device.

All so the user can copy some text.
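
To make the OCR stage concrete, this is roughly what that step looks like if it's done on-device instead, using ML Kit's text recognizer. Just a sketch: the function name and screenshotBitmap are assumptions (how the composited frame gets captured is out of scope), and the Assistant flow described above runs this server-side, not like this.

  import android.graphics.Bitmap
  import android.util.Log
  import com.google.mlkit.vision.common.InputImage
  import com.google.mlkit.vision.text.TextRecognition
  import com.google.mlkit.vision.text.latin.TextRecognizerOptions

  // On-device OCR over an already-captured screenshot of the composited UI.
  fun recognizeScreenshot(screenshotBitmap: Bitmap) {
      val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
      val image = InputImage.fromBitmap(screenshotBitmap, 0) // rotation in degrees

      recognizer.process(image)
          .addOnSuccessListener { result ->
              // Each line comes back with its text plus a bounding box, which is
              // what lets a UI overlay selectable text on top of the image.
              for (block in result.textBlocks) {
                  for (line in block.lines) {
                      Log.d("OcrDemo", "${line.text} @ ${line.boundingBox}")
                  }
              }
          }
          .addOnFailureListener { e -> Log.w("OcrDemo", "OCR failed", e) }
  }

The bounding boxes are the part that makes the recognized text selectable in place rather than just a blob of OCR output.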



I used to be upset at such absurdities. Now I try to appreciate what a relatively universal interface images are instead! :D It doesn’t matter how text gets drawn if we OCR it. (Though I feel better when we do so on device.)


On Android it more or less just uses the accessibility APIs to grab the actual text. You can do it even without Google Assistant, by selecting text inside an app's thumbnail window on the Recent Apps screen.
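
Roughly what that accessibility route looks like, as a sketch (the service name is made up, and a real service also has to be declared in the manifest and enabled by the user in accessibility settings):

  import android.accessibilityservice.AccessibilityService
  import android.util.Log
  import android.view.accessibility.AccessibilityEvent
  import android.view.accessibility.AccessibilityNodeInfo

  // Walks the active window's node tree and logs whatever text the app
  // exposed through the accessibility framework (no screenshots, no OCR).
  class TextGrabberService : AccessibilityService() {

      override fun onAccessibilityEvent(event: AccessibilityEvent) {
          val root = rootInActiveWindow ?: return
          dumpText(root)
      }

      override fun onInterrupt() {}

      private fun dumpText(node: AccessibilityNodeInfo) {
          node.text?.let { Log.d("TextGrabber", it.toString()) }
          for (i in 0 until node.childCount) {
              node.getChild(i)?.let { dumpText(it) }
          }
      }
  }

When the text is actually present in the node tree this gives you the exact characters; OCR is only needed for content that was never exposed as text in the first place.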


It doesn't. You can try running it when your keyboard is open and see what an absolute trainwreck the OCR makes of that.



