Curious - have you compared Gemini against Anthropic’s and OpenAI’s offerings here? I need to do something similar for a one-off task and simply need to choose a model to use.
Gemini is an awful developer experience, but its accuracy for OCR tasks is close to perfect. The pricing is also basically unbeatable - it works out to 1k to 10k pages per dollar depending on the model. OpenAI has subtle hallucinations, and I haven’t profiled Anthropic.
If I may ask, which model are you using? I have tried OCR'ing my bank statements in AI Studio and the results have been less than optimal. Specifically, it tends to ignore certain instructions and also screws up the order.
Some pointers on what worked for you would be greatly appreciated.