Hacker News new | past | comments | ask | show | jobs | submit login

I believe such an API already exists: https://pdftables.com/ (no affiliation).

Went to a presentation at a Golang meetup in Amsterdam by the guys behind this company. Seemed to know their stuff. But I have no real world experience using it.




I had a look - from their FAQ [0]:

However, some PDFs are scanned documents, or only contain images. PDFTables doesn't perform Optical Character Recognition (OCR) to turn these images into text.

To process these kinds of documents, you will need to either enable OCR in your scanning software, or run the PDF through specialist OCR software before using PDFTables.

[0] https://pdftables.com/faq




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: