#ocr4all — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #ocr4all, aggregated by home.social.
-
@daelba @KathyReid I also recommend #OCR4all. While e-Scriptorium runs on the kraken engine, OCR4all uses Calamari. Once you have training data in e-Scriptorium, you can also potentially use it to train models in OCR4all. Depending on your discipline, the existing models for e-Scriptorium are 'better' than those for OCR4all or vice versa, but both tools are highly recommended.
-
Every now & then, I give #ChatGPT a scan of my handwriting to test its skills in working with #handwrittentexts. Initially, it responded that it could not process the scans or gave me entirely fictional output, but today it got almost everything right. These results are better than those I achieved with #HWR models in #Tesseract & #OCR4all without additional training. I also asked ChatGPT what it "thought" about my writing & it called it "consistently shaped & large with stylistic strokes."
-
Hi #histodons,
I need your expertise. We want to integrate an #opensource #ocr tool into our #useGalaxy Platform so you can better analyse your texts, etc.
I worked with #tesseract some years ago, and I heard about #ocr4all.
Do you have experience with any of these - or other recommendations?
We are also integrating #transkribus via API but want another OCR-specific option.
Looking forward to your experiences!
-
Re OCR/ATR, interestingly the #OCR4all paper also offers a very good overview of the different steps and workflows. It has a different purpose, but I think it can still be used in a class context.
Reul, Christian et al. 2019. “OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings.” Applied Sciences 9 (22): 4853. https://doi.org/10.3390/app9224853.
-
@tkinias as far as I understand you want to implement a PDF -> Text -> PDF workflow. Using plaintext as intermediate is problematic, as you (may) lose a lot of layout information.
For high quality fulltext you may need a more sophisticated intermediate format like #PageXML or #AltoXML. But they also require a more sophisticated tool for editing like #OCR4All.
-
A colleague just asked me about a good, free OCR software for a historical book they are scanning. I was checking out #OCR4all to see if I could recommend it. First thing on the "Getting started" page: A Linux terminal command to start docker … 😵‍💫 I’m not criticizing the project, which I think does important work, but it’s a rather peculiar definition of "all" …
-
Hi there :)
I'm currently testing #ocr4all to get it to recognize handwriting. ( #ocr #hwr #htr )
But I'm getting nowhere.
Maybe it's because of the models?! I only have the default ones, which are optimized for old French … that doesn't help … 😅 Has anyone already tried this and succeeded??
-
@jomla @stabihh In the meantime we have set up #ocr4all on our DSRI (Data Science Research Environment), and the workflow as a whole seems very transparent to us. However, we failed at layout recognition (#Layouterkennung) on the very first document. So... "read the docs"!
-
@jomla @stabihh Unfortunately I missed the workshop. But I am interested in inviting people with #OCR4all expertise to #Maastricht as speakers. Is anyone from the community interested? If so, feel free to PM me.
-
@jomla @stabihh Once again I can't see any replies to the post, so I hope I'm not asking twice: how were the experiences? I'm currently thinking about which #OCR infrastructure would be best for me and my faculty in the long term. I enjoy working with #Transkribus, but #OCR4all of course scores extra points for #OpenScience. However, I still know too little about practical experience with it for the early modern period (#FrüheNeuzeit) and would welcome an exchange.
-
#Day2 of #DH2023 pre-conference workshops. Today I am learning how to use #OCR4All. Hopefully, I can teach and tutor folks at the #UniversityOfOslo later. It could be especially useful for #MedievalManuscripts since we have a couple of projects that require good #OCR #HTR processing!
-
On his blog, Jonathan Green recommends #OCR4all for early printed books: http://researchfragments.blogspot.com/2023/04/ocr4all-is-good.html #ocr #digitalhumanities #dh (via https://archivalia.hypotheses.org/171036)