home.social

#altoxml — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #altoxml, aggregated by home.social.

  1. @tkinias As far as I understand, you want to implement a PDF -> Text -> PDF workflow. Using plain text as the intermediate format is problematic, as you may lose a lot of layout information.

    For high-quality full text, you may need a more sophisticated intermediate format like #PageXML or #AltoXML. But these also require a more sophisticated editing tool, such as #OCR4All.
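
    To illustrate the point about layout information, here is a minimal sketch in Python. The ALTO fragment, its coordinates, and the helper names `extract_words` and `plain_text` are made up for illustration and are not from the post; the sketch assumes the ALTO v4 namespace. It shows that each word in ALTO carries its position on the page, while the plain-text version keeps only the characters.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the ALTO v4 namespace.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical one-line ALTO fragment, invented for this example.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1" HPOS="300" VPOS="400" WIDTH="520" HEIGHT="60">
            <String CONTENT="Hello" HPOS="300" VPOS="400" WIDTH="240" HEIGHT="60"/>
            <SP HPOS="540" VPOS="400" WIDTH="40"/>
            <String CONTENT="world" HPOS="580" VPOS="400" WIDTH="240" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
""".strip()


def extract_words(alto_xml):
    """Return (text, hpos, vpos) for every String element, in document order."""
    root = ET.fromstring(alto_xml)
    return [
        (s.get("CONTENT"), float(s.get("HPOS")), float(s.get("VPOS")))
        for s in root.iterfind(".//alto:String", ALTO_NS)
    ]


def plain_text(alto_xml):
    """Flatten to plain text: one output line per TextLine, positions discarded."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return "\n".join(lines)


if __name__ == "__main__":
    print(extract_words(SAMPLE_ALTO))  # words with their page coordinates
    print(plain_text(SAMPLE_ALTO))     # the same words, layout gone
```

    Going from the ALTO data to plain text is easy, but the reverse is not possible without the coordinates, which is why a text-only intermediate step makes it hard to rebuild a faithful PDF.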