
Rodny_ OP t1_iwwd1ta wrote

Yeah, on one hand it seems like a problem that is quite easy to solve, but the more you dig, the more problems and obstructions you find. And then it makes you wonder why such a basic task is so hard to solve with easy-to-use tools, while text-to-image models get so much attention and have such accessible tools.

2

visarga t1_iwwdkai wrote

Because it's a lucrative AI API for all the big players. Selling OCR for documents.

3

AtomKanister t1_iwwx4hs wrote

Might also be the data. The open-source internet is full of images with related text that can be crawled, but you won't find a lot of document scans with annotated boxes out there.

However, it's definitely doable. The paid services from cloud providers are all very, very high quality. It's more likely an open source availability issue.

1