
Or the other direction. Tesseract can give you confidence levels for the guesses it makes about a symbol (see https://tesseract-ocr.github.io/tessdoc/APIExample.html).

You can give the LLM this information and have it tell you which is right. Here's a demo.
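A minimal sketch of producing that [xy] notation automatically, assuming you have already collected, for each symbol position, a best-first list of (character, confidence) candidates. That is roughly the kind of data Tesseract's per-symbol choices give you, but the `Candidates` shape, the 0-100 scale, and the `margin` threshold below are hypothetical choices, not Tesseract's API:

```python
from typing import List, Tuple

# Hypothetical input shape: one list per symbol position, best guess first,
# each entry a (character, confidence) pair on an assumed 0-100 scale.
# Every position is assumed to have at least one candidate.
Candidates = List[Tuple[str, float]]


def mark_ambiguous(symbols: List[Candidates], margin: float = 10.0) -> str:
    """Render the text, writing [xy] wherever the top two guesses are within `margin`."""
    out = []
    for choices in symbols:
        best_char, best_conf = choices[0]
        if len(choices) > 1 and best_conf - choices[1][1] <= margin:
            # Too close to call: show both candidates, best first.
            out.append(f"[{best_char}{choices[1][0]}]")
        else:
            out.append(best_char)
    return "".join(out)


def build_prompt(text: str) -> str:
    """Wrap the marked-up text in the same instructions used in the demo."""
    return (
        "I'm trying to read some text but one of the characters is unclear. "
        "I will use the notation [xy] to indicate that I think the character is "
        "either x or y. Please tell me which one you think is correct. "
        f'Here is the text: "{text}" In your response, please include a line like '
        '"correct character: x" so I can see the conclusion at a glance.'
    )


if __name__ == "__main__":
    symbols = [[("I", 48.0), ("1", 45.0)], [("-", 95.0)], [("8", 97.0)], [("0", 96.0)]]
    print(build_prompt(mark_ambiguous(symbols)))
```

With those made-up confidences, "I" and "1" are only 3 points apart, so the first position is rendered as `[I1]` and the rest are left alone.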

=== input to ChatGPT 4o ===

I'm trying to read some text but one of the characters is unclear. I will use the notation [xy] to indicate that I think the character is either x or y. Please tell me which one you think is correct. Here is the text: "There was a lot of traffic on [I1]-80 this morning." In your response, please include a line like "correct character: x" so I can see the conclusion at a glance.

=== ChatGPT's response ===

The correct character in the text is likely "I".

Interstate highways in the United States are often denoted by "I-" followed by the highway number. "I-80" is a major interstate highway that runs across the United States, so it makes more sense in this context than "1-80," which would imply a different and less common road designation.

correct character: I
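Asking for that fixed "correct character: x" line makes the answer easy to pull out programmatically. A small sketch that reads the conclusion and substitutes it back into the marked-up text. The helper names are hypothetical, and it assumes a single [xy] marker per request, as in the demo:

```python
import re
from typing import Optional


def parse_correct_character(response: str) -> Optional[str]:
    """Return the character from the last 'correct character: x' line, or None."""
    # Tolerate an optional quote or backtick around the character.
    matches = re.findall(
        r"""^\s*correct character:\s*["'`]?(\S)""",
        response,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return matches[-1] if matches else None


def resolve(text: str, choice: str) -> str:
    """Replace the first [xy] marker with `choice`, if it is one of the two options."""
    return re.sub(
        r"\[(.)(.)\]",
        lambda m: choice if choice in m.groups() else m.group(0),
        text,
        count=1,
    )
```

If the model answers with something that was not one of the two candidates, `resolve` leaves the marker in place rather than trusting the answer.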



Ah, interesting. I guess you could probably get bounding boxes etc. too, so you could pass the image to a multimodal model as well (although maybe it'll just run Tesseract behind the scenes lol)
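For the image part, Tesseract's TSV output reports each box as left, top, width, height. A rough sketch, under those assumptions, of turning one such box into a slightly padded crop region to send along with the prompt. The function name and the default padding are hypothetical. The returned tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` takes:

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def padded_crop_box(left: int, top: int, width: int, height: int,
                    img_w: int, img_h: int, pad: int = 4) -> Box:
    """Expand a (left, top, width, height) box by `pad` pixels, clamped to the image."""
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(img_w, left + width + pad),
        min(img_h, top + height + pad),
    )
```

A little padding gives the model some surrounding context (e.g. the "-80" next to an ambiguous "I"), which is the whole point of the trick above.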


That's a great idea, I should add that to my project. Will investigate.



