
Tried extracting data from a newspaper. It is really hard. What is a headline, and which headline belongs to which paragraphs? Harder than you think! And chucking it as-is into OpenAI was no good at all. Working with the coordinates from OCR by hand was better, but not perfect.
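A rough sketch of what "dealing with coordinates from OCR" can look like. This is not the commenter's code. It assumes the OCR step already yields text blocks with bounding boxes (top-left origin, y growing downward) and an estimated line height per block. The `Block` class, the 1.5x type-size ratio for spotting headlines, and the "nearest headline above, in an overlapping column" rule are all hypothetical choices for illustration:

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Block:
    """One OCR text block (hypothetical shape; adapt to your OCR output)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward, as in most OCR output)
    x1: float  # right edge
    y1: float  # bottom edge
    line_height: float  # average height of one text line in the block


def is_headline(block, typical_height, ratio=1.5):
    # Assumed heuristic: headlines use noticeably larger type than body text.
    return block.line_height >= ratio * typical_height


def horizontal_overlap(a, b):
    # Width shared by the two blocks' x-ranges (0 if they sit in different columns).
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def group_by_headline(blocks, ratio=1.5):
    """Attach each body block to the nearest headline above it in an overlapping column."""
    typical = median(b.line_height for b in blocks)  # assumes body text is the majority
    heads = [b for b in blocks if is_headline(b, typical, ratio)]
    bodies = [b for b in blocks if not is_headline(b, typical, ratio)]
    assigned = [[] for _ in heads]
    orphans = []
    # Walk body blocks top to bottom, left to right, to keep a rough reading order.
    for p in sorted(bodies, key=lambda b: (b.y0, b.x0)):
        candidates = [
            i for i, h in enumerate(heads)
            if h.y1 <= p.y0 and horizontal_overlap(h, p) > 0
        ]
        if not candidates:
            orphans.append(p.text)  # no headline above this block in its column
            continue
        nearest = max(candidates, key=lambda i: heads[i].y1)  # closest headline above
        assigned[nearest].append(p.text)
    return [(h.text, paras) for h, paras in zip(heads, assigned)], orphans


# Toy two-column page: a wide headline, then a narrower one in the left column.
page = [
    Block("Mayor resigns", 0, 0, 300, 40, 30),
    Block("Para A1", 0, 50, 140, 150, 10),
    Block("Para A2", 160, 50, 300, 150, 10),
    Block("Rain expected", 0, 200, 140, 230, 25),
    Block("Para B1", 0, 240, 140, 300, 10),
    Block("Para C1", 160, 200, 300, 260, 10),
]
groups, orphans = group_by_headline(page)
print(groups)   # [('Mayor resigns', ['Para A1', 'Para A2', 'Para C1']), ('Rain expected', ['Para B1'])]
print(orphans)  # []
```

This illustrates why it is "better but not perfect": the column-overlap rule handles a headline spanning several columns, but small-type subheads get treated as body text, stories that jump to another page are lost, and the median-based size test breaks on pages that are mostly headlines.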



