Not sure how you would do that without having the ground truth to compare to. It's also very hard to measure once you start messing with the formatting (like converting it to markdown or suppressing page numbers and repeated headers/footers). I think it would also vary a lot depending on the quality of the original scan and the format and content of the document. There's really no substitute for just trying it on your document and then quickly looking through the output by hand (at least for now-- probably in a year models will be good enough and have big enough context windows to do this really well, too!).
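If you did have a hand-checked transcript for a test document, a rough sketch like this (plain Python, made-up file names, and the normalization step is exactly the part that gets fuzzy once you convert to markdown and drop headers/footers) is about as far as I'd trust an automated number:

    import difflib
    import re

    def normalize(text: str) -> str:
        """Strip formatting so the comparison measures content, not layout."""
        # Drop common markdown markers the conversion may have introduced
        text = re.sub(r"[#*_`>|]", "", text)
        # Drop lines that are just page numbers
        lines = [ln for ln in text.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
        # Collapse whitespace so reflowed paragraphs still line up
        return " ".join(" ".join(lines).split()).lower()

    def similarity(ocr_text: str, truth_text: str) -> float:
        """Rough character-level similarity (1.0 = identical after normalization)."""
        return difflib.SequenceMatcher(None, normalize(ocr_text), normalize(truth_text)).ratio()

    if __name__ == "__main__":
        # Hypothetical file names -- substitute your own output and hand-made transcript
        ocr = open("ocr_output.md", encoding="utf-8").read()
        truth = open("ground_truth.txt", encoding="utf-8").read()
        print(f"similarity: {similarity(ocr, truth):.3f}")

Even then, one number from one scan tells you more about that scan than about the model.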
Standard datasets can no longer be used for benchmarking against LLMs, since they have already been fed into the training data and are thus too well known to be representative of how the models handle lesser-known documents.
Oh, you meant for just a single benchmarked document. I thought you meant to report that for every document you process. I wouldn't want to mislead people by giving stats on a particular kind of scan/document, because those numbers likely wouldn't carry over in general.