home.social

#wtfpdf — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #wtfpdf, aggregated by home.social.

  1. #archivtagAT #archivtag2025 Andreas Rauber shows an example of a PDF that also contains HTML or can be saved as a virtual machine, which then gives it extended functionality. What happens when a PDF like this is normalised or migrated? #wtfPDF

  2. ICYMI - are "octal escape sequences" in #PDF strings really a preservation risk, as claimed by the authors of the recent "The Phantom 👻 of a PDF File" blog post?

    Some quick tests I did with eight different PDF processing tools suggest they're not, and #JHOVE's inability to handle them really seems to be the exception here #wtfPDF #fileformatfriday

    bitsgalore.org/2024/11/14/esca

  3. The authors of the recent "The Phantom of a PDF File" blog post argue that "octal escape sequences" in #PDF strings are a potential preservation risk.

    But some quick tests with 8 different PDF tools suggest that #JHOVE is really the only tool that can't handle them!

    Details in my new blog post "Escape from the phantom of the PDF" #wtfPDF 👻:

    bitsgalore.org/2024/11/14/esca

  4. Update on the "Phantom of the #PDF" blog post from a few weeks ago (link: digitalpreservation.fi/en/2024).

    I did a little test of the authors' claim that "#JHOVE probably is not the only software that will get confused" by octal escape sequences* in metadata strings.

    So I read the file with 8 different PDF tools/libraries:

    github.com/openpreserve/jhove/

    Turns out JHOVE actually *is* the only software that gets confused by this #wtfPDF!

    *) The authors describe this as "dual encodings", but see Peter Wyatt's comment!

  5. Here's a sneak peek at a #PDF Quality Assessment tool I'm working on for digitisation batches, mostly based on #PyMuPDF, #pillow and #Schematron:

    github.com/KBNLresearch/pdfqua

    (Wouldn't recommend this for production yet, as it's not completely finished, and I'm still changing some things around.)

    #wtfPDF

  6. I explored to what extent #VeraPDF and #JHOVE can be used to identify #PDF features that are potential preservation risks. Check out this (massive!) blog post for the full lowdown #wtfPDF:

    bitsgalore.org/2023/05/25/iden
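Several of the posts above turn on how "octal escape sequences" in PDF strings are read. The sketch below illustrates the idea. It follows the literal-string escape rules in the PDF specification (ISO 32000): `\ddd` is one to three octal digits giving a single byte, the usual `\n`, `\(`, `\\`-style escapes apply, and a backslash before an end-of-line marks a line continuation. The names `decode_literal` and `decode_text_string` are hypothetical. This is not how JHOVE or any of the eight tools mentioned above implement it. Substituting Latin-1 for PDFDocEncoding is a deliberate simplification. The UTF-16BE example, where the byte order mark itself is written as `\376\377`, is an assumed illustration and is not taken from the blog posts.

```python
# Single-character escapes allowed in PDF literal strings (ISO 32000).
_SIMPLE_ESCAPES = {
    ord("n"): b"\n",
    ord("r"): b"\r",
    ord("t"): b"\t",
    ord("b"): b"\b",
    ord("f"): b"\f",
    ord("("): b"(",
    ord(")"): b")",
    ord("\\"): b"\\",
}
_OCTAL = b"01234567"


def decode_literal(body: bytes) -> bytes:
    """Hypothetical helper: decode the bytes between the outer parentheses
    of a PDF literal string into the raw bytes it represents."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c == 0x0D:
            # An unescaped CR or CRLF inside a string is read as a single LF.
            out.append(0x0A)
            i += 2 if body[i + 1:i + 2] == b"\n" else 1
        elif c != 0x5C:
            # Ordinary byte (including balanced parentheses): copy as-is.
            out.append(c)
            i += 1
        else:
            i += 1  # skip the backslash
            if i >= n:
                break  # lone trailing backslash: ignored
            c = body[i]
            if c in _OCTAL:
                # \ddd: up to three octal digits; high-order overflow is dropped.
                j = i
                while j < n and j - i < 3 and body[j] in _OCTAL:
                    j += 1
                out.append(int(body[i:j], 8) & 0xFF)
                i = j
            elif c in _SIMPLE_ESCAPES:
                out += _SIMPLE_ESCAPES[c]
                i += 1
            elif c == 0x0D:
                # Backslash + CR or CRLF: line continuation, nothing emitted.
                i += 2 if body[i + 1:i + 2] == b"\n" else 1
            elif c == 0x0A:
                # Backslash + LF: line continuation, nothing emitted.
                i += 1
            else:
                # Unknown escape: the backslash is ignored, the character kept.
                out.append(c)
                i += 1
    return bytes(out)


def decode_text_string(raw: bytes) -> str:
    """Hypothetical helper: interpret decoded bytes as a PDF text string,
    such as a metadata value."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")  # UTF-16BE with byte order mark
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:].decode("utf-8")  # UTF-8 with BOM (PDF 2.0)
    # Simplification: Latin-1 stands in for the real PDFDocEncoding.
    return raw.decode("latin-1")


# A title whose UTF-16BE bytes, BOM included, are all written with escapes.
title = decode_text_string(decode_literal(rb"\376\377\000H\000i"))
print(title)
```

Once the escapes are resolved, the string is just the bytes `FE FF 00 48 00 69`, which decode to "Hi". The escaped and unescaped spellings therefore describe the same value, and a conforming reader has to treat them identically.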