home.social

#code4lib — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #code4lib, aggregated by home.social.

  1. #code4lib journal has a call out for articles about the use of #staticsite publishing:

    > Contributions from any practitioner working on digital publication are welcome, regardless of the stage or sophistication of the publication’s development. This includes boutique, shoestring publications ... as well as large-scale projects that rely on institutional backing, development teams, practices, & infrastructure.

    lists.clir.org/cgi-bin/wa?A2=i

  2. Today, I released version 0.2.0 of a new tool that processes MARC-21 records. It focuses on efficient processing of records and tabulating data for integration into data science or data engineering applications. Feedback and new ideas are very welcome.

    github.com/deutsche-nationalbi

    #code4lib #marc21 #metadata

  3. Revisiting bsdiff as a tool for digital preservation


    by @beet_keeper

    I introduced bsdiff in a blog post in 2014. bsdiff compares two files, e.g. broken_file_a and corrected_file_b, and creates a patch that can be applied to broken_file_a to generate a byte-for-byte match for corrected_file_b.

    On the face of it, in an archive, we probably only care about corrected_file_b, so why would we care about a technology that patches a broken file?

    In all of the use cases we can imagine, the primary reasons are cost savings and removing redundancy in file storage or transmission of digital information. In one very special case, we can record the difference between broken_file_a and corrected_file_b and give users a totally objective method of recreating corrected_file_b from broken_file_a, providing 100% verifiable proof of the migration pathway taken between the two files.

    #ac3 #Archives #audio #audiovisual #Audit #authenticity #av #Bash #bsdiff #checksums #Code4Lib #corruption #corruptionIndex #digipres #DigitalArchiving #DigitalForensics #digitalLiteracy #DigitalPreservation #DigitalStorage #diplomatics #FileFormats #flac #glitch #glitchAudio #GlitchArt #integrity #mp3 #PreservationAnalysis #PreservationMetadata #provenance #sensitivityIndex #Storage #wav
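
The patch-and-verify idea in this post can be sketched in a few lines of Python. This is not bsdiff's algorithm or patch format: it swaps in the standard library's `difflib.SequenceMatcher` as a naive stand-in. The helper names (`make_patch`, `apply_patch`, `sha256`) and the sample bytes are assumptions made for illustration only. What it shows is the round trip: record the difference, rebuild corrected_file_b from broken_file_a, and confirm the result with a checksum.

```python
import hashlib
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Describe `new` as ranges copied from `old` plus literal bytes.

    Naive illustration only; bsdiff itself works differently.
    """
    patch = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared by both files: store only the range in `old`.
            patch.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in `new`: store them literally.
            patch.append(("data", new[j1:j2]))
        # "delete" needs no entry: those bytes are simply never copied.
    return patch


def apply_patch(old: bytes, patch: list) -> bytes:
    """Rebuild the new file from the old file and the recorded patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents standing in for real files on disk.
broken_file_a = b"RIFF\x00\x00WAVEfmt corrupted-bytes-here data"
corrected_file_b = b"RIFF\x24\x00WAVEfmt data"

patch = make_patch(broken_file_a, corrected_file_b)
rebuilt = apply_patch(broken_file_a, patch)

# Matching checksums show the rebuilt file is byte-for-byte identical.
print(sha256(rebuilt) == sha256(corrected_file_b))
```

Storing the patch alongside broken_file_a and the checksum of corrected_file_b is what would let a third party repeat the reconstruction and check the result independently.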

  4. We're hosting #Code4lib #BC at Emily Carr on October 16 & 17. If you're interested in the intersection of #libraries #archives and #technology, join us!

    Registration, proposals, bursary and more info: wiki.code4lib.org/BC

    #Vancouver #Code4LibBC #LibTech

  5. Version 1.4 of the Annif automated subject indexing tool has been released! 🚀

    • 3 new corpus formats (JSON, JSONL, CSV) supporting metadata + document IDs
    • Include/exclude vocab concepts for better control
    • annif index now supports short-text formats
    • Faster hyperopt with parallel processing
    • tfidf backend refactored (no gensim!)
    • REST API improvements & Python 3.13 support

    github.com/NatLibFi/Annif/rele

    #Annif #NLP #opensource #subjectindexing #machinelearning #code4lib #libraries

  6. The first alpha version of Skosmos 3 has been published! This release provides a peek into the upcoming next major version of the Skosmos publishing tool for SKOS controlled vocabularies.

    The release features a reimplemented front-end with a fresh layout and improved accessibility, as well as many architectural improvements and modernization of the codebase.

    github.com/NatLibFi/Skosmos/re

    #Skosmos #SKOS #thesaurus #classification #ontology #code4lib #OpenSource

  7. The sensitivity index: Corrupting Y2K


    by @beet_keeper

    In December I asked “What will you bitflip today?” Not long after, Johan’s (@bitsgalore) Digital Dark Age Crew released its long lost hidden single Y2K. Well, I couldn’t resist corrupting it.

    Fixity is an interesting property enabled by digital technologies. Checksums allow us to demonstrate mathematically that a file has not been changed. An often cited definition of fixity is:

    Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed — Bailey (2014)

    It’s very much linked to the concept of integrity, which UNESCO defines as:

    The state of being whole, uncorrupted and free of unauthorized and undocumented changes.

    Integrity is massively important these days. It gives us the guarantees we need that digital objects we work with aren’t harboring their own sinister secrets in the form of malware and other potentially damaging payloads.

    These values are contingent on bit-level preservation. The field of digital preservation largely assumes this: that we will be able to look after our content without losing information. As feasible as this may be these days, what happens if we lose some information? Where does authenticity come into play?

    Through corrupting Y2K, I took time to reflect on integrity versus authenticity, as well as create some interesting glitched outputs. I also uncovered what may be the first audio that reveals what the Millennium Bug itself may have sounded like! Keen to hear it? Read on to find out more.

    #ac3 #Archives #audio #audiovisual #authenticity #av #Bash #checksums #Code4Lib #corruption #corruptionIndex #digipres #DigitalArchiving #digitalLiteracy #DigitalPreservation #diplomatics #FileFormats #flac #glitch #GlitchArt #glitchaudio #integrity #mp3 #sensitivityIndex #wav
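
The fixity claim in this post, that a checksum demonstrates a file has not changed, can be illustrated with a single flipped bit. This is a minimal sketch and not the author's tooling: the `checksum` and `flip_bit` helpers, the choice of SHA-256, and the placeholder bytes are all assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of the data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Placeholder bytes standing in for a real audio file.
original = b"Y2K placeholder audio bytes"
recorded = checksum(original)  # the value an archive would store

corrupted = flip_bit(original, 42)

print(checksum(original) == recorded)   # True: fixity holds
print(checksum(corrupted) == recorded)  # False: one flipped bit is detected
```

The checksum only reports that something changed. It says nothing about how much the content was affected, which is the gap between integrity and authenticity that the post goes on to explore.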

  8. Version 1.3 of the automated subject indexing tool #Annif has been released!

    This release introduces support for the EstNLTK analyzer for better Estonian lemmatization 🇪🇪, optimizations to the MLLM backend, as well as maintenance and bug fixes, including better file permissions in multi-user environments.

    github.com/NatLibFi/Annif/rele

    #AI #machinelearning #opensource #code4lib #libraries #subjectindexing #SKOS #classification #Estonian #eesti

  9. Are you using #Annif for automated subject indexing or classification? Have you tried it out? Did you look at it but never got around to using it?

    If yes, please fill in this Annif user survey: forms.gle/P7jGoPMbEAJnD9zw9

    We want to hear your thoughts about Annif and how to make it better in the future!

    It should only take a few minutes. Deadline is November 30.

    #Annif #AI #subjectindexing #libraries #automation #SKOS #code4lib #survey

  10. What a great experience at #Code4Lib! It was amazing to meet new and old friends and support this great event for the second year in a row. This year we also sponsored the Diversity Scholarship, and we presented a poster during the poster session!
    We are very proud and honored to be part of such an amazing community and we can't wait for Code4Lib 2025!

    #Openness #OpenScience #OpenSolutions #Future #Advancement

  11. Highly recommended read - make sure to take a minute and file through @mjgiarlo's thread (linked below 👇🏽) for a #code4lib member perspective on the recent court case of OCLC v. Anna's Archive ...

    And we couldn't agree more, this is just such a waste of energy, time, and resources - and to what end? To antagonise / alienate OCLC's customer base even more?

    This is exactly why we are working hard to establish an equitable, fully open alternative to #OAbooks metadata creation and dissemination - with all metadata records released under a #CC0 public domain dedication (as they should!) #MARC #ONIX #KBART #MetadataMatters #DataReuse #Right2Remix

    code4lib.social/@mjgiarlo/1124

  12. New in The Code4Lib Journal:
    Minyoung Chung and Phani Chaitanya Pendyala: Enhancing Serials Holdings Data: A Pymarc-Powered Clean-Up Project
    journal.code4lib.org/articles/
    #code4lib #python #MARC21 #MarcEdit

  13. Very clear and detailed presentation by @TinaTrillitzsch on how they switched their automated indexing from a legacy system to #Annif

    #swib23 #ai #code4lib #metadata #indexing #skos

  14. Writing a document on how we map the data from a MARC-21 record to a Solr document that is indexed and stored in the backend Apache Solr datastore. Having to explain the difference between an indexed field and a stored field. Having to explain how copy fields work, why they are used, and what stemming is.

    I just want to write some code :(

    #marc #marc21 #code4lib
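
For readers new to the terms in this post, here is a toy in-memory sketch of the four ideas: an indexed field is searchable, a stored field is returned in results, a copy field duplicates a source value into another field at index time, and stemming reduces words to a common root. None of this is Solr's implementation or API. The schema, the field names, the `ToyIndex` class, and the crude `naive_stem` function are all hypothetical.

```python
from collections import defaultdict


def naive_stem(token: str) -> str:
    """Crude suffix stripping; real stemmers are far more careful."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list:
    """Lowercase, split on whitespace, and stem each token."""
    return [naive_stem(t) for t in text.lower().split()]


# Hypothetical schema: which fields are searchable and which are returned.
SCHEMA = {
    "title": {"indexed": True, "stored": True},
    "subject": {"indexed": True, "stored": False},
    "all_text": {"indexed": True, "stored": False},  # copy-field target
}
COPY_FIELDS = [("title", "all_text"), ("subject", "all_text")]


class ToyIndex:
    def __init__(self):
        self.inverted = defaultdict(set)  # (field, term) -> document ids
        self.stored = {}                  # document id -> stored fields

    def add(self, doc_id: int, doc: dict) -> None:
        values = {field: [value] for field, value in doc.items()}
        # Copy fields: duplicate the raw source value into the target.
        for source, target in COPY_FIELDS:
            if source in doc:
                values.setdefault(target, []).append(doc[source])
        # Stored: keep the original value so it can be returned later.
        self.stored[doc_id] = {f: doc[f] for f in doc if SCHEMA[f]["stored"]}
        # Indexed: analyze the text and record which terms point here.
        for field, texts in values.items():
            if SCHEMA[field]["indexed"]:
                for text in texts:
                    for term in analyze(text):
                        self.inverted[(field, term)].add(doc_id)

    def search(self, field: str, query: str) -> list:
        hits = None
        for term in analyze(query):
            ids = self.inverted.get((field, term), set())
            hits = ids if hits is None else hits & ids
        return [self.stored[i] for i in sorted(hits or set())]


index = ToyIndex()
index.add(1, {"title": "Cataloging maps", "subject": "Indexing"})

# Stemming lets "index" match "Indexing" via the all_text copy field,
# but only the stored field (title) comes back in the result.
print(index.search("all_text", "index"))
```

Searching the `title` field for "index" returns nothing, because the subject text only reaches `title`-independent search through the `all_text` copy field.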

  15. Catching up on my edification. That includes this podcast. This recent episode is a terrific exploration of web archiving and digital preservation -- and how many related problems are ultimately people related. Recommended listening, folks!

    "This week we have two researchers to talk about web archiving, its politics, its goals, & how web archiving is often mobilized to address problems it really can’t help."

    librarypunk.gay/e/081-web-arch #digipres #digitalarchives #code4lib #LibraryPunk

  16. Release v0.6.0 of QA catalogue, a metadata quality assessment tool for library catalogue records. The big thing: support for the PICA bibliographic metadata schema (a MARC alternative used in Germany, The Netherlands, and France).

    The development was sponsored by Verbundzentrale (VZG) des Gemeinsamen Bibliotheksverbundes (GBV). Many, many thanks to @nichtich for the cooperation!

    Details, download links:
    github.com/pkiraly/metadata-qa

    #code4lib #metadata #metadataquality #libtech
