#doctr — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #doctr, aggregated by home.social.

  1. @alerque

    In the first stage, I'm using #PaddleOCR

    github.com/PaddlePaddle/Paddle

    Their docs say they support Windows, macOS and Linux. For simplicity, I wrapped the Python dependency in podman/docker, so it's Linux-only for now. If there are potential users other than me, I guess it won't be too hard to make it cross-platform.

    github.com/Endle/beanbeaver-ocr

    Before PaddleOCR, I first tried #docTR

    github.com/mindee/doctr

    Some Reddit posts claimed that docTR was the best. It worked pretty well for English (Latin characters), but it doesn't support Chinese. It would try to recognize a Chinese character as a combination of Latin characters, with relatively high confidence.

    PaddleOCR supports Chinese recognition, but I switched it to English-only mode. For the T&T receipt I showed, PaddleOCR assigns very low confidence to Chinese words (github.com/Endle/beanbeaver/bl), so beanbeaver can parse this bilingual receipt using the English parts.
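
    The confidence-based idea above can be sketched roughly as follows. This is only an illustration, not beanbeaver's actual code and not the PaddleOCR API: the `OcrLine` type, the `keep_confident_lines` helper, the 0.8 threshold, and the sample values are all made up for the example.

```python
from typing import List, NamedTuple


class OcrLine(NamedTuple):
    """Hypothetical container for one recognized line of text."""
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def keep_confident_lines(lines: List[OcrLine], min_confidence: float = 0.8) -> List[OcrLine]:
    """Drop lines whose confidence is below the (assumed) threshold."""
    return [line for line in lines if line.confidence >= min_confidence]


# Made-up sample: in English-only mode, a Chinese line would come back
# as low-confidence noise, so it is filtered out before parsing.
sample = [
    OcrLine("GREEN ONION", 0.97),
    OcrLine("xJ#", 0.31),
    OcrLine("TOTAL 12.34", 0.95),
]

kept = keep_confident_lines(sample)
print([line.text for line in kept])
```

    A receipt parser could then work only on the lines that survive the filter; the right threshold would need tuning against real receipts.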