Hacker News
Extracting text from an image using Ocropus (danvk.org)
63 points by danvk on Jan 19, 2015 | 10 comments



Thanks for this. I tried using Tesseract over the weekend to extract text from a game screenshot and had no luck. The documentation for Tesseract is rather opaque; maybe I'll have better luck with Ocropus.


I wouldn't say that Ocropus is well-documented (this blog post was partially intended to address that). But it's at least written in easily hackable Python, whereas Tesseract is 30-year-old C/C++.


My main gripe with Tesseract is how convoluted and poorly documented the training procedure is, even though training is critical to getting better results. I'll be sure to check out Ocropus.


You'll enjoy my follow-up post then, which talks about training: http://www.danvk.org/2015/01/11/training-an-ocropus-ocr-mode...


I wonder if it's possible to remove the need for post-processing of the LSTM's output by integrating transcription into the neural network model directly.


The first row of output from the neural net is a special "no character" output, which effectively gives you the character segmentation. You can distinguish "aa" from "a" because the former shows up as "(no)a(no)a(no)" whereas the latter is "(no)a(no)". You can read more about this in the Ocropus paper: http://www.helsinki.fi/~mpsilfve/ocr_course/materials/2008-b...
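To make the "(no)a(no)a(no)" idea concrete, here is a minimal sketch of that kind of collapse: take the best class per frame, merge runs of the same class, then drop the "no character" class. This is not Ocropus's actual decoding code. The names `collapse_frames`, `greedy_decode`, `BLANK`, and the toy alphabet are made up for illustration, and the sketch assumes class 0 is the "no character" output.

```python
from itertools import groupby

BLANK = 0  # assumed index of the "no character" class (row 0 of the output)


def collapse_frames(frame_labels, alphabet):
    """Merge runs of identical per-frame labels, then drop the blank label."""
    chars = []
    for label, _run in groupby(frame_labels):
        if label != BLANK:
            chars.append(alphabet[label])
    return "".join(chars)


def greedy_decode(scores, alphabet):
    """scores: one list of per-class scores per frame (time step)."""
    # Pick the highest-scoring class in each frame.
    frame_labels = [
        max(range(len(frame)), key=frame.__getitem__) for frame in scores
    ]
    return collapse_frames(frame_labels, alphabet)


# Hypothetical alphabet: index 0 is the blank, 1 is "a", 2 is "b".
alphabet = ["", "a", "b"]

# (no) a a (no) a (no): two separate peaks, so two characters.
print(collapse_frames([0, 1, 1, 0, 1, 0], alphabet))  # aa

# (no) a a a (no): one wide peak, so one character.
print(collapse_frames([0, 1, 1, 1, 0], alphabet))  # a
```

The blank between the two "a" runs is what keeps them from being merged into one character, which is the segmentation effect described above.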


Slightly off-topic, but is anyone aware of a similarly capable library for handwritten text recognition, i.e. ICR?


If you're able to provide enough training data, there's no reason Ocropus couldn't do this. If you're open to using a commercial OCR program, FineReader is excellent.


Does anyone know of any open-source OCR/ICR projects that are actively being worked on?


Ocropus is fairly actively developed. Lots of commits in 2015: https://github.com/tmbdev/ocropy/commits/master



