home.social

#document_processing — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #document_processing, aggregated by home.social.

Michael Roberts @[email protected] · 2025-11-14 · 00:38 UTC

Hey, Fedi, what's the best way under Linux to OCR a scanned PDF and put the resulting text into the PDF? I haven't found any particularly convincing recipes yet. (I mean, Tesseract for the OCR part, I know that much - but what's the best way to get the text into the PDF for searchability and text selection? Ideally without disturbing any annotations I've already made.)
#pdf #linux #ocr #tesseract #document_processing

:rss: Hacker News @[email protected] · 2025-11-06 · 18:51 UTC

Benchmarking the Most Reliable Document Parsing API
https://www.tensorlake.ai/blog/benchmarks
#ycombinator #context_engineering #document_processing #machine_learning #LLM #RAG #vector_database #knowledge_graphs #document_parsing #structured_extraction #AI_workflows #Document_Parsing #OCR #Benchmarks #TEDS #Enterprise_AI
