Rodny_ OP t1_iwwd1ta wrote on November 18, 2022 at 9:14 PM

Reply to comment by Jean-Porte in [P]Modern open-source OCR capabilities and which model to choose by Rodny_

Yeah, on one hand it seems like a problem that is quite easy to solve, but the more you dig, the more problems and obstructions you find. And then it makes you wonder why such a basic task is so hard to solve with easy-to-use tools, while text-to-image models get so much attention and have such accessible tools.

visarga t1_iwwdkai wrote on November 18, 2022 at 9:17 PM

Because it's a lucrative AI API for all the big players: selling OCR for documents.

AtomKanister t1_iwwx4hs wrote on November 18, 2022 at 11:40 PM

Might also be the data. The open-source internet is full of images with related text that can be crawled, but you won't find a lot of document scans with annotated boxes out there.

However, it's definitely doable. The paid services from cloud providers are all very, very high quality. It's more likely an open-source availability issue.