Hi, I was wondering how good modern open-source OCR models are. Can they read text in different fonts on various backgrounds with decent success? What success rate might I expect? I am primarily interested in number recognition; could you recommend some good models for that? If the results are not good out of the box, do the models allow fine-tuning? And lastly, what latency can I expect if there are about 5-10 numbers on one image that I want to read? I searched the web for this information, but all I found were articles comparing the models with each other rather than describing the state and capabilities of these models. Thanks, everyone, for the information.

Comments

[deleted] t1_iwtx39w wrote on November 18, 2022 at 9:07 AM

#564,319

[removed]

mikeful t1_iwui31v wrote on November 18, 2022 at 1:24 PM

#565,762

I've used EasyOCR for number recognition tasks. Works fairly well. https://github.com/JaidedAI/EasyOCR

I tried to speed up recognition by running a task-tuned segmentation model and cropping the input image on a good detection, but EasyOCR seemed to work better without it.
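For anyone who wants to try this on their own images, here is a minimal sketch of digit-only recognition with EasyOCR, with a rough latency measurement. It assumes `easyocr` is installed and uses its `Reader` and `readtext(..., allowlist=...)` calls. The helper names `keep_numbers` and `read_numbers` and the 0.5 confidence cutoff are made up for illustration, not part of the library.

```python
import re
import time


def keep_numbers(results, min_conf=0.5):
    """Keep confident, digit-only strings from EasyOCR-style results.

    Each result is expected to be (bounding_box, text, confidence),
    which is the shape readtext() returns by default.
    min_conf is an arbitrary illustrative cutoff; tune it on your data.
    """
    numbers = []
    for _bbox, text, conf in results:
        cleaned = text.replace(" ", "")  # drop spaces inside a detection
        if conf >= min_conf and re.fullmatch(r"\d+", cleaned):
            numbers.append(cleaned)
    return numbers


def read_numbers(image_path, min_conf=0.5):
    """Run EasyOCR restricted to digits and time the recognition call."""
    import easyocr  # imported lazily so keep_numbers works without it

    # Creating the Reader loads the models; do it once and reuse it,
    # otherwise the load time dominates any latency measurement.
    reader = easyocr.Reader(["en"], gpu=False)

    start = time.perf_counter()
    results = reader.readtext(image_path, allowlist="0123456789")
    elapsed = time.perf_counter() - start

    return keep_numbers(results, min_conf), elapsed
```

Calling `read_numbers("meter.jpg")` (a hypothetical file name) returns the filtered numbers plus the seconds spent in `readtext`, which gives a latency figure for your own hardware and image size.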

flapflip9 t1_iwumecm wrote on November 18, 2022 at 2:00 PM

#566,126

Look into open-mmlab's MMOCR. It does both detection and recognition, with support for English and Chinese text. Absolutely wicked accuracy: it pulls text off logos, flyers, blurred text, etc. Not suitable for real-time use, though.

Until a few years ago, I was quite happy with Tesseract, but it has fallen behind since then. It's still good for scanning printed text or similar, and it supports a lot of languages.

maxisawesome538 t1_iwusw22 wrote on November 18, 2022 at 2:50 PM

#566,709

Tesseract is something we've used before.

robertknight2 t1_iwveveb wrote on November 18, 2022 at 5:20 PM

#568,655

Replying to flapflip9 (#566,126)

To add to this, Tesseract's text recognition of identified lines of text uses a modern approach involving LSTM neural networks, but the text detection step that comes before it uses classical/heuristic (i.e. non-ML) approaches. These work well on clean-ish document images, but can struggle with photos of documents that have uneven lighting and with spotting text in a photo (e.g. number plates in a city scene).

I maintain a JavaScript build of Tesseract with an online demo that you can try with different images: https://robertknight.github.io/tesseract-wasm/
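One way to work around the detection weakness is to do the cropping yourself and hand Tesseract a single line of digits. The sketch below assumes `pytesseract` and `opencv-python` are installed along with the Tesseract binary. The helper names and the threshold parameters (block size 31, offset 15) are illustrative guesses rather than recommended values, and the character whitelist may not be honored by the LSTM engine in older Tesseract 4.0 builds.

```python
def digits_config(psm=7):
    """Build a Tesseract config string that restricts output to digits.

    psm 7 tells Tesseract to treat the image as a single text line,
    which bypasses most of its heuristic layout analysis.
    """
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_digits_from_crop(image_path, psm=7):
    """OCR a pre-cropped image of one number after evening out the lighting."""
    import cv2  # imported lazily so digits_config works without OpenCV
    import pytesseract

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)

    # Adaptive thresholding picks a local threshold per neighbourhood,
    # which helps when one side of the photo is darker than the other.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 15
    )
    return pytesseract.image_to_string(binary, config=digits_config(psm)).strip()
```

This only helps if you already know roughly where the number is; finding text in a cluttered scene still needs a separate detector.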

Jean-Porte t1_iwwbgr7 wrote on November 18, 2022 at 9:03 PM

#571,438

I wish the big players addressed this problem more. OCR on handwritten text is challenging but very useful.

Rodny_ OP t1_iwwd1ta wrote on November 18, 2022 at 9:14 PM

#571,548

Replying to Jean-Porte (#571,438)

Yeah, on one hand it seems like a problem that should be quite easy to solve, but the more you dig, the more problems and obstacles you find. And then it makes you wonder why such a basic task is so hard to solve with easy-to-use tools, while text-to-image models get so much attention and come with such accessible tools.

visarga t1_iwwdkai wrote on November 18, 2022 at 9:17 PM

#571,580

Replying to Rodny_ (#571,548)

Because it's a lucrative AI API for all the big players. Selling OCR for documents.

labloke11 t1_iwwmt3m wrote on November 18, 2022 at 10:23 PM

#572,271

I have been using different OCR tools to process forms, and I've come to a conclusion: they all kind of suck.

AtomKanister t1_iwwx4hs wrote on November 18, 2022 at 11:40 PM

#573,020