+1. At a previous employer we fed images of interest from the web into Google's OCR API to see what we could see. In addition to scene descriptions, the API will transcribe any text it detects.
With all the easy-to-use tools available to programmers today, it would not be terribly hard to run OCR on a screenshot to find the text of interest, then derive the scraping code by searching for the OCR'd text in the markup.
If none of your extant parsers can extract the info you want from the page, send it to an OCR pipeline (or, hell, Mechanical Turk) and generate a new one.
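For what it's worth, here is a rough sketch of the "search for the OCR'd text in the markup" step, assuming the OCR output is already in hand as a string. The names `TextLocator` and `derive_selectors` are made up for illustration, and the exact-substring match is a simplification: real OCR output has errors, so fuzzy matching would probably be needed.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def normalize(text):
    """Lowercase and collapse whitespace so OCR spacing noise matters less."""
    return " ".join(text.lower().split())


class TextLocator(HTMLParser):
    """Records the path of open tags around each text node containing the needle."""

    def __init__(self, needle):
        super().__init__()
        self.needle = normalize(needle)
        self.stack = []    # (tag, selector step) for every currently open element
        self.matches = []  # selector-like paths for text nodes that matched

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        step = tag
        if attrs.get("id"):
            step += "#" + attrs["id"]
        elif attrs.get("class"):
            step += "." + ".".join(attrs["class"].split())
        self.stack.append((tag, step))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.needle and self.needle in normalize(data):
            self.matches.append(" > ".join(step for _, step in self.stack))


def derive_selectors(html, ocr_text):
    """Return a selector-like path for each text node that contains ocr_text."""
    locator = TextLocator(ocr_text)
    locator.feed(html)
    locator.close()
    return locator.matches
```

Calling `derive_selectors(page_html, ocr_text)` returns paths such as `html > body > div#main > p.price.big`, which could then seed a new parser for that page layout.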
Yep yep - if the text isn't distorted I can rip it from an image within minutes using pre-built OCR libraries. If the text is distorted there are full-blown API-driven services for solving CAPTCHAs and the like.
Oh yeah - I guess I had a specific use case in mind when I said that =)
What I meant is that I can hammer out some Node/Python that will grab an image w/text and put it through OCR for character extraction. "Programming" it would take me a handful of minutes.
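Something along these lines is what I have in mind, as a minimal sketch. It assumes the third-party `pytesseract` and `Pillow` packages plus a local Tesseract install; `fetch_bytes`, `default_ocr`, and `image_text` are names invented here, and the fetch and OCR steps are passed in as parameters so either one can be swapped out.

```python
import io
import urllib.request


def fetch_bytes(url, timeout=10.0):
    """Download the raw bytes of the image at url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def default_ocr(image_bytes):
    """Run Tesseract on the image bytes (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so the rest of the module loads without these packages.
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def image_text(url, fetch=fetch_bytes, ocr=default_ocr):
    """Grab an image and return its OCR'd text with whitespace collapsed."""
    raw = ocr(fetch(url))
    return " ".join(raw.split())
```

Usage would be `image_text(url)` with the URL of the image; for distorted text, a different `ocr` callable could be passed in instead of the Tesseract default.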