
adrianmonk on Aug 9, 2024 | on: Show HN: LLM-aided OCR – Correcting Tesseract OCR ...

Or the other direction. Tesseract can give you confidence levels for the guesses it makes about a symbol (see https://tesseract-ocr.github.io/tessdoc/APIExample.html). You can give the LLM this information and have it tell you which is right.

Here's a demo.

=== input to ChatGPT 4o ===

I'm trying to read some text but one of the characters is unclear. I will use the notation [xy] to indicate that I think the character is either x or y. Please tell me which one you think is correct.

Here is the text: "There was a lot of traffic on [I1]-80 this morning."

In your response, please include a line like "correct character: x" so I can see the conclusion at a glance.

=== ChatGPT's response ===

The correct character in the text is likely "I". Interstate highways in the United States are often denoted by "I-" followed by the highway number. "I-80" is a major interstate highway that runs across the United States, so it makes more sense in this context than "1-80," which would imply a different and less common road designation.

correct character: I
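For anyone who wants to automate the demo above, here is a rough Python sketch of the two text-handling steps: turning per-symbol guesses into the [xy] notation, and pulling the "correct character: x" line out of the reply. It does not call Tesseract or any LLM. The input shape (a list of candidate lists with confidences between 0 and 1), the 0.9 threshold, and the function names `mark_ambiguous`, `build_prompt`, and `parse_answer` are all assumptions made for illustration, not anything from Tesseract's API.

```python
import re

# Prompt wording taken from the demo above; {text} is the marked-up OCR line.
PROMPT_TEMPLATE = (
    "I'm trying to read some text but one of the characters is unclear. "
    "I will use the notation [xy] to indicate that I think the character is "
    "either x or y. Please tell me which one you think is correct.\n\n"
    'Here is the text: "{text}"\n\n'
    'In your response, please include a line like "correct character: x" '
    "so I can see the conclusion at a glance."
)


def mark_ambiguous(symbols, threshold=0.9):
    """Join OCR symbols into text, bracketing the uncertain ones.

    symbols is a hypothetical structure: one list per character position,
    each holding (char, confidence) pairs sorted best guess first.
    """
    parts = []
    for candidates in symbols:
        best_char, best_conf = candidates[0]
        if best_conf >= threshold or len(candidates) == 1:
            parts.append(best_char)
        else:
            # Keep every alternative, e.g. [I1]
            parts.append("[" + "".join(ch for ch, _ in candidates) + "]")
    return "".join(parts)


def build_prompt(symbols, threshold=0.9):
    """Fill the demo prompt with the marked-up text."""
    return PROMPT_TEMPLATE.format(text=mark_ambiguous(symbols, threshold))


def parse_answer(reply):
    """Return the character named on the 'correct character:' line, or None."""
    match = re.search(r"correct character:\s*(\S)", reply)
    return match.group(1) if match else None


if __name__ == "__main__":
    symbols = [[("I", 0.55), ("1", 0.45)], [("-", 0.99)], [("8", 0.98)], [("0", 0.97)]]
    print(build_prompt(symbols))
    print(parse_answer('The correct character is likely "I".\ncorrect character: I'))
```

This sketch ignores the case where the source text itself contains square brackets, and the prompt wording assumes a single unclear character, so both would need adjusting for real pages.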

foota on Aug 9, 2024

Ah, interesting. I guess you could probably get bounding boxes etc. too, so you could pass the image to a multimodal model as well (although maybe it'll just run Tesseract behind the scenes lol)
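A possible sketch of the bounding-box part, in Python. It assumes the OCR results are already in a dict shaped like what pytesseract's `image_to_data` returns with `output_type=Output.DICT` (parallel lists under `text`, `conf`, `left`, `top`, `width`, `height`, with confidence on a 0 to 100 scale and -1 for non-word rows). The function name `low_confidence_boxes` and the threshold of 60 are made up for illustration.

```python
def low_confidence_boxes(data, threshold=60.0):
    """Pick out words the OCR engine was unsure about, with their boxes.

    data is assumed to hold parallel lists keyed by 'text', 'conf', 'left',
    'top', 'width', and 'height'. Returns (text, (left, top, right, bottom))
    tuples for words whose confidence is below the threshold.
    """
    results = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # may arrive as a string or a number
        if conf < 0 or not text.strip():
            continue  # skip layout rows that carry no recognized word
        if conf < threshold:
            left, top = data["left"][i], data["top"][i]
            box = (left, top, left + data["width"][i], top + data["height"][i])
            results.append((text, box))
    return results


if __name__ == "__main__":
    sample = {
        "text": ["", "I-80", "this"],
        "conf": ["-1", "42.5", "96"],
        "left": [0, 10, 60],
        "top": [0, 5, 5],
        "width": [100, 40, 30],
        "height": [50, 12, 12],
    }
    print(low_confidence_boxes(sample))
```

Each box is in (left, upper, right, lower) order, which is the order Pillow's `Image.crop` expects, so the uncertain region could be cropped out and sent to a multimodal model alongside the text.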

eigenvalue on Aug 9, 2024

That's a great idea, I should add that to my project. Will investigate.
