home.social

#jhove — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #jhove, aggregated by home.social.

  1. @tallison

    Thank you Tim, that's so nice to hear!!

    I'd love to do so!!! Right now it's just a small attempt to see whether that could help prioritize our work towards more common errors (and maybe improve #JHOVE's behaviour?), but I'd love to do it at a larger scale.

    If I manage to do anything interesting in that direction, I'll certainly communicate at #iPRES next year!

  2. Clearly Christmas is not the best time to publish blog posts about digital preservation, but never mind, I'm publishing anyway.

    digipres.fr/archives/90

    One more case of a legitimate file considered invalid by #JHOVE. In short, validation is an interesting way to learn things about the structure of the data, but it should not be taken at face value.

    Credit for the cover meme: @mickylindlar

    #DigiPres_FR #TIFF #EXIF #exiftool #TIF-HUL-66

  3. ICYMI, I ran some experiments to see if #VeraPDF’s parse status can be used to predict #PDF rendering problems, using an existing dataset of synthetic PDFs as ground truth. I also looked at how this compares against the occurrence of #JHOVE validation errors.

    Details in this blog post:

    bitsgalore.org/2023/06/29/vera

  4. Out of curiosity I ran both #JHOVE and #VeraPDF on the "Synthetic #PDF Testset for File Format Validation" by @mickylindlar et al. (link: radar-service.eu/radar/en/data).

    Then did a quick comparison between validation errors as reported by JHOVE, and parse errors and logged warnings by VeraPDF.

    The main result so far is that the majority of PDFs for which JHOVE reports validation errors also result in either a parser error or a warning in VeraPDF. Sneak peek here:

    github.com/KBNLresearch/pdf-ch

  5. I explored to what extent #VeraPDF and #JHOVE can be used to identify #PDF features that are potential preservation risks. Check out this (massive!) blog post for the full lowdown #wtfPDF:

    bitsgalore.org/2023/05/25/iden

  6. #OPFOAG Thomas Ledoux advocates for a standard #schematron editing tool to enforce institutional policies on #JHOVE, #veraPDF & #jpylyzer outputs.

  7. @mickylindlar
    Carl: "PDF is a huge tree of objects linked one to another." Which makes interpreting errors far from intuitive!

    But #veraPDF, and soon #JHOVE, should be able to associate an error with the problematic area of the PDF.
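
The comparison described in post 4 (JHOVE validation errors against VeraPDF parser errors and warnings) amounts to a cross-tabulation of per-file outcomes. The following is only a minimal sketch of that idea, not the analysis actually used in the linked work: the function name `cross_tabulate`, the file names, and the outcomes are all hypothetical, and it assumes the per-file results have already been extracted from each tool's output into plain sets of file names.

```python
from collections import Counter


def cross_tabulate(jhove_invalid, verapdf_flagged, all_files):
    """Count files by (JHOVE reported errors, VeraPDF reported error or warning)."""
    counts = Counter()
    for name in all_files:
        counts[(name in jhove_invalid, name in verapdf_flagged)] += 1
    return counts


# Hypothetical file names and outcomes, for illustration only.
all_files = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
jhove_invalid = {"a.pdf", "b.pdf"}             # files with JHOVE validation errors
verapdf_flagged = {"a.pdf", "b.pdf", "c.pdf"}  # files with a VeraPDF parser error or warning

table = cross_tabulate(jhove_invalid, verapdf_flagged, all_files)

# Share of JHOVE-flagged files that VeraPDF also flags.
both = table[(True, True)]
jhove_total = both + table[(True, False)]
share = both / jhove_total if jhove_total else 0.0
print(f"{both} of {jhove_total} JHOVE-flagged files are also flagged by VeraPDF ({share:.0%})")
```

The `(False, True)` cell is worth reading alongside the share: it counts files that only VeraPDF flags, which a one-directional percentage would hide.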