Thanks for this. I tried using Tesseract over the weekend to extract text from a game screenshot and had no luck. The documentation for Tesseract is rather opaque; maybe I'll have better luck with Ocropus.
I wouldn't say that Ocropus is well-documented (this blog post was partially intended to address that). But it's at least written in easily hackable Python, whereas Tesseract is 30-year-old C/C++.
My main gripe with Tesseract is how convoluted and poorly documented the training procedure is, even though training is critical to getting better results. I'll be sure to check out Ocropus.
I wonder if it's possible to remove the need for post-processing of the LSTM's output by integrating transcription into the neural network model directly.
The first row of output from the neural net is a special "no character" output, which effectively gives you the character segmentation. You can distinguish "aa" from "a" because the former shows up as "(no)a(no)a(no)" whereas the latter is "(no)a(no)". You can read more about this in the Ocropus paper: http://www.helsinki.fi/~mpsilfve/ocr_course/materials/2008-b...
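To make the "(no)a(no)a(no)" idea concrete, here's a rough sketch of the collapse rule in Python. This is not Ocropus's actual decoding code, just a generic illustration: the names `greedy_decode` and `argmax_labels` are made up, and I'm assuming the "no character" class sits at index 0 and that you simply take the highest-scoring class in each frame.

```python
from itertools import groupby

BLANK = 0  # assumption: index of the special "no character" output


def argmax_labels(outputs):
    """Pick the highest-scoring class index for each frame.

    `outputs` is a list of frames, each a list of per-class scores.
    """
    return [max(range(len(frame)), key=frame.__getitem__) for frame in outputs]


def greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Turn per-frame labels into text.

    Step 1: merge runs of the same label (one wide "a" spans many frames).
    Step 2: drop the "no character" labels that separated the runs.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical three-class alphabet: index 0 is "no character".
alphabet = ["", "a", "b"]

double_a = [0, 1, 1, 0, 1, 0]  # (no)a(no)a(no)
single_a = [0, 1, 1, 1, 0]     # (no)a(no)

print(greedy_decode(double_a, alphabet))
print(greedy_decode(single_a, alphabet))
```

The "no character" frame between the two runs of "a" is what keeps them from being merged into one, which is why the network needs that extra output at all.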
If you're able to provide enough training data, there's no reason Ocropus couldn't do this. If you're open to using a commercial OCR program, FineReader is excellent.