home.social

#webarchiving — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #webarchiving, aggregated by home.social.

  1. National Library of Finland: Principles for Finnish Web Archive content selection published. “The National Library of Finland is responsible for the diverse and representative preservation of online material. To make this work more transparent, we produced a document entitled Content selection for the Finnish Web Archive, outlining the principles for content selection in thematic and continuous […]

    https://rbfirehose.com/2026/05/13/national-library-of-finland-principles-for-finnish-web-archive-content-selection-published/
  6. The web never stands still 🌐 ... and neither do the challenges of preserving it.

    The #DPC is preparing for the return of its Web Archiving Special Interest Group (WA-SIG), bringing DPC Members together in a welcoming and transparent space where Members can exchange ideas, surface challenges, and learn from one another’s approaches.

    The renewed WA-SIG gets together on 7 July.

    Read more & join us 😊: dpconline.org/news/dpc-prepare

    #DigitalPreservation #Coalition #DPC #WebArchiving #Archives

  11. Tom’s Hardware: Internet archival sites struggling to preserve the internet because of skyrocketing hard drive prices due to the AI boom — Wayback Machine and Wikimedia punished by stratospheric storage pricing and stricter anti-scraping measures blocking the wrong bots. “The internet is getting harder to archive because the AI boom has caused a storage crisis, with both NAND and mechanical […]

    https://rbfirehose.com/2026/05/09/toms-hardware-internet-archival-sites-struggling-to-preserve-the-internet-because-of-skyrocketing-hard-drive-prices-due-to-the-ai-boom-wayback-machine-and-wikimedia-punished-by-stratosphe/
  12. We presented #CiVers at #CAA2026 in Vienna (Mar 31–Apr 4) with a poster! 🎉
    🔍 By combining #WebArchiving, change detection, and #metadata extraction, CiVers enables reliable citation of versioned web pages using persistent identifiers (PIDs).
    🥅 Our goal: make web-based #research resources citable, traceable, and reproducible.
    Let’s connect! 🤲

    Photo Credits: Lisa Steinmann.

  13. A timely panel on why web archiving is a civic duty — researchers, archivists, activists and citizens discuss preserving our digital history. Practical tips, ethics, and why this matters for memory and democracy. Inspiring and actionable! #WebArchiving #DigitalPreservation #Archives #CivicDuty #OpenAccess #DigitalRights #InternetHistory #Archiving #English
    video.rhizome.org/videos/watch

  14. New-to-me, from Library of Congress: Preserving U.S. Indigenous Government Websites: From Directory to Digital Archive. “As a 2025 Junior Fellow, Maggie Jones helped build the United States Indigenous Government Websites Web Archive with the guidance of her mentor, Giselle Aviles. In this interview, they describe how the collection developed from a list of over 500 tribes and what that process […]

    https://rbfirehose.com/2026/03/07/preserving-u-s-indigenous-government-websites-from-directory-to-digital-archive-library-of-congress/
  15. Arndt, Tracy; Arndt, Natanael: How to describe the past Web? A data model for web archiving. SWIB25 - Semantic Web in Libraries, ZBW - Leibniz-Informationszentrum Wirtschaft et al., 2025. doi.org/10.5446/72405

    #webarchiving #linkedopendata

  16. Popular Science: The Internet Archive records its 1 trillionth website. “The Internet Archive—one of cyberspace’s most essential library projects—has achieved a feat that’s hard to even conceptualize. After nearly 30 years of painstaking work, the nonprofit has preserved its trillionth webpage.”

    https://rbfirehose.com/2026/02/23/popular-science-the-internet-archive-records-its-1-trillionth-website/
  17. Ars Technica: Wikipedia blacklists Archive.today, starts removing 695,000 archive links. “In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger […]

    https://rbfirehose.com/2026/02/21/ars-technica-wikipedia-blacklists-archive-today-starts-removing-695000-archive-links/
  18. #WaybackMachine Director Pushes Back on AI Scraping Fears Driving Archive Blocks
    blog.archive.org/2026/02/18/wa
    As reported by Nieman Lab last month, some major media organizations—including The #NewYorkTimes, #TheGuardian, and #Reddit—have started blocking the Wayback Machine from archiving their sites over unfounded concerns about AI scraping.
    Mike Masnick in #Techdirt explained why this is “a mistake we’re going to regret for generations.”
    Limiting #webarchiving threatens our shared #digitalhistory.

  19. Library of Congress: From Print Volumes to Digital Scholarship: The Handbook of Latin American Studies Web Archive. “Since the 1930s, the Handbook of Latin American Studies has documented scholarship on Latin America and the Caribbean. In this interview, Tracy North describes how that long-standing mission now extends to web archiving, ensuring long-term access to web-based research materials. […]

    https://rbfirehose.com/2026/02/09/from-print-volumes-to-digital-scholarship-the-handbook-of-latin-american-studies-web-archive-library-of-congress/
  20. Alex Chan: Hard problems in social media archiving. “Institutional archiving has different constraints to individual collections – institutions serve a much wider audience, so their decisions need consistency and boundaries. My own scrapbook is tiny and personal, and comparing it alongside institutional efforts really highlights the differences and difficulties. It’s why I usually call it a […]

    https://rbfirehose.com/2025/12/15/alex-chan-hard-problems-in-social-media-archiving/

  21. Library of Congress Blogs: Where Science Meets Storytelling: Twelve Years of the Science Blogs Web Archive. “More than a decade after its launch, the Science Blogs Web Archive continues to grow and evolve. In this interview, Jennifer ‘JJ’ Harbster reflects on building and maintaining the collection, while intern Yahir Brito brings a fresh perspective on updating and expanding it. Together, […]

    https://rbfirehose.com/2025/12/05/where-science-meets-storytelling-twelve-years-of-the-science-blogs-web-archive-library-of-congress-blogs/

  22. Common Crawl - Setting the Record Straight: Common Crawl’s Commitment to Transparency, Fair Use, and the Public Good commoncrawl.org/blog/setting-t… #AI #CommonCrawl #data #WebArchiving (wow, that Atlantic piece was bad, needing this rebuttal)

  23. 📣 New blog post! 📝

    On October 14, we hosted our first #CiVers workshop at the @dai_weltweit in Berlin 🏛️. This was a great opportunity to exchange ideas on citing versioned web resources and managing research data in #archaeology and the #humanities.

    Read more about what we discussed 👇
    🔗 dainst.org/blogs/noslug/253

    #Metadata #Research #OpenScience #DigitalPreservation #WebArchiving #DigitalHumanities

  24. #WAAM – #WebArchiving Aix Marseille #AMU
    pba.mmsh.fr/?p=35306
    WAAM – Web Archiving Aix #Marseille is the name of an archiving instance maintained by the CEntre de formation et de soutien aux DOnnées de la REcherche (#CEDRE, a training and support center for research data) at the request of the #WebLab.

    This platform can collect individual web pages or entire websites, keep an archived version of them (a file in .wacz format), and share these archived versions online so that they can be “replayed”.

  27. 📚 First publication from CiVers!

    🤔 How can we reliably cite resources of web-based research databases in archaeology and the humanities?

    💡 In our new article, we present the CiVers approach: creating versioned, citable web resources using Persistent Identifiers (PIDs).

    🧠 Read the full open-access paper here:
    🔗 doi.org/10.34780/6k764r03

    #CiVers #DigitalHumanities #WebArchiving #OpenScience #PID #DigitalPreservation

  28. 📣 Start planning your proposal for #iipcWAC26, “Sustainable #WebArchiving,” today! 📣

    🗓️ Proposals due October 15
    🇧🇪 20-23 APR 2026 at KBR, Royal Library of Belgium

    For more info: netpreserve.org/ga2026/cfp/

    Need inspiration? Check out past presentations: youtube.com/@iipc8855/featured

    #webarchives | #DigitalPreservation | #DigitalHumanities | @webarchives

  29. 📢 Hello Mastodon! 👋
    We’re CiVers: Citation of Versioned Web Pages by Persistent Identifier

    Web pages change. Links rot. Academic references break. We’re fixing that. 🛠️

    💻 CiVers develops software and methodologies to make web content reliably citable with PIDs and versioning

    🔗 DFG-funded @dfg_public project at the DAI Berlin @dai_weltweit with Heidelberg University Library @uniheidelberg, GBV @vzg_gbv and DataCite @datacite

    #WebArchiving #OpenScience #DigitalHumanities #PID #DataCite #CiVers

  30. What can hacked websites tell us about the history of political activism on the web? My latest article explores the political, cultural, and archival value of over 10,000 web defacements from attrition.org, now made available for research.

    🔗 journalofdigitalhistory.org/en

    #DigitalHistory #webarchiving #defacement

  31. [halshs-05113368] Neglect, Stammering, Focus: Processes of an #Archival Experience of #Archive Collections and #Audiovisual Projects at #AMU Posted on the Web Over the Past 30 Years
    shs.hal.science/halshs-0511336
    #webarchiving
    #soundarchives
    #resaw2025

  34. Looking after your URLs: tikalinkextract eight years on


    by @beet_keeper

    We might not have a second life, but what if I told you there was a second internet? Not the deep web, but another web that we engage with nearly every day?

    Think about it: that QR code you scanned for more information? That payment link you followed on your electricity bill? The website you’re told to visit at the end of a television ad?

    The antipodes of the internet are these terminal endpoints: objects, material or otherwise, that mark the end of the freely navigable web. The QR code on a concert poster is the web printed onto the physical world. There is every chance someone will scan it and follow it from a mobile device, but it is a transient object: it exists for a short time, then fades into the palimpsest of the poster board or wall it was pasted on until it eventually disappears.

    This is part of the materiality of the internet that has long fascinated me. Perhaps it comes from being a student of material culture, but if we look around, we see the Internet everywhere!

    #Archives #digipres #DigitalArchiving #digitalContinuity #DigitalPreservation #httpreserve #Memento #outreach #RobustLinks #RobustWebLinks #WebArchives #webArchiving

  35. Quicker, better, more robust... this is ZIMit 2.0! Our scraper, which can make an offline version of any website, is only a few days away from release! Stay tuned! github.com/openzim/zimit #webscraping #webarchiving #zim #offline #kiwix #warc

  36. Wow! #TIL about #ArchiveBox, your #selfhosted #alternativeTo @internetarchive!

    Runs on #Python (OS-packaged or #dockered) and saves single pages or whole-website crawls in every format you could wish for:

    ✅ self-contained single-page HTML
    ✅ PDF
    ✅ PNG screenshot
    ✅ plaintext
    ✅ DOM-dump
    ✅ priv./publ. #archive
    ✅ media audio/video included (+yt-dlp)
    #WARC compat.

    🌐 archivebox.io
    📜 github.com/ArchiveBox/ArchiveB
    demo.archivebox.io

    #WebArchiving #WebCrawling #DigitalPreservation

  37. ✍️ Blog post: Software prototype #ArtDocArchive (work in progress)
    nullmuseum.hypotheses.org/602

    Summary: Art Doc Archive will be a set of tools for digital #archive care, covering all the #artdocumentation on websites and #socialmedia. How can this archival material be saved and analyzed for #digitalarthistory?

    The project is accompanied by a blog titled 𝙍𝙚𝙘𝙡𝙖𝙞𝙢 𝙮𝙤𝙪𝙧 𝘼𝙧𝙘𝙝𝙞𝙫𝙚; open-source software prototyping runs until the end of February 2023.

    Follow our project blog at reclaim.hypotheses.org

    #webarchiving #digitalhumanities #dataviz #Datavisualization #mirrors #semanticdata #structureddata #namedentityrecognition

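The CiVers posts describe citing versioned web pages by combining change detection with metadata extraction. The Python below is a minimal sketch of that general idea, assuming a very simple design of our own; it is not CiVers' actual software. A page's source is whitespace-normalized and hashed with SHA-256, a new version is stored only when the hash changes, the `<title>` stands in for extracted metadata, and `page_id` is a hypothetical placeholder where a real persistent identifier (PID) would go.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Version:
    number: int       # 1, 2, 3, ... in capture order
    digest: str       # SHA-256 of the normalized page source
    title: str        # stand-in for extracted metadata: the page <title>
    captured_at: str  # ISO 8601 timestamp in UTC


@dataclass
class VersionedPage:
    url: str
    page_id: str  # hypothetical placeholder for a persistent identifier (PID)
    versions: List[Version] = field(default_factory=list)

    @staticmethod
    def _normalize(html: str) -> str:
        # Collapse whitespace runs so pure reformatting is not counted as a change.
        return re.sub(r"\s+", " ", html).strip()

    @staticmethod
    def _extract_title(html: str) -> str:
        match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        return match.group(1).strip() if match else ""

    def capture(self, html: str) -> Optional[Version]:
        """Store a new version if the content changed; return None if it did not."""
        digest = hashlib.sha256(self._normalize(html).encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1].digest == digest:
            return None  # change detection: identical to the latest stored version
        version = Version(
            number=len(self.versions) + 1,
            digest=digest,
            title=self._extract_title(html),
            captured_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        )
        self.versions.append(version)
        return version

    def cite(self, number: int) -> str:
        """Format a citation string that points at one specific version."""
        v = self.versions[number - 1]
        return (f'"{v.title}", {self.url}, version {v.number} '
                f"(captured {v.captured_at}), {self.page_id}/v{v.number}")
```

Hashing normalized source means whitespace-only edits do not create a new version; a real system would likely need smarter normalization (for example, ignoring rotating ads or embedded timestamps) before comparing.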
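The Wayback Machine post says some outlets have started blocking the archive, and the Tom’s Hardware headline mentions anti-scraping measures “blocking the wrong bots”. Neither says which mechanism is involved; one common, publicly visible mechanism is a site’s `robots.txt`. This sketch uses Python’s standard-library `urllib.robotparser` to show how a catch-all rule meant to keep out unknown scrapers also shuts out an archive crawler, while a rule naming a specific bot does not. All user-agent names and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: allow one named search crawler, block every other agent.
CATCH_ALL_BLOCK = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical policy: block only one named AI crawler.
TARGETED_BLOCK = """\
User-agent: ExampleAIBot
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example/2026/02/story.html"
    for policy_name, policy in [("catch-all", CATCH_ALL_BLOCK), ("targeted", TARGETED_BLOCK)]:
        for bot in ("ExampleSearchBot", "ExampleAIBot", "ExampleArchiveBot"):
            print(policy_name, bot, may_fetch(policy, bot, url))
```

The standard-library parser applies the first matching rule in a group rather than the longest-match rule of RFC 9309, so treat its answers as an approximation of how a given crawler would behave.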
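The WAAM post says archived captures are kept as `.wacz` files. Assuming the layout described in the WACZ specification (a ZIP package with a `datapackage.json` manifest at its root, WARC files under `archive/`, indexes under `indexes/`, and a page list in `pages/pages.jsonl`), a small Python sketch can summarize what a package contains. It only reads what it finds and is not WAAM’s own tooling.

```python
import json
import zipfile
from typing import Dict, List


def summarize_wacz(source) -> Dict[str, List[str]]:
    """Summarize a WACZ package given a path or a binary file-like object."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # datapackage.json is the package manifest; a KeyError here means it is missing.
        package = json.loads(zf.read("datapackage.json").decode("utf-8"))
        resources = [r["path"] for r in package.get("resources", []) if "path" in r]
        warcs = [n for n in names
                 if n.startswith("archive/") and n.endswith((".warc", ".warc.gz"))]
        indexes = [n for n in names if n.startswith("indexes/")]
        pages: List[str] = []
        if "pages/pages.jsonl" in names:
            for line in zf.read("pages/pages.jsonl").decode("utf-8").splitlines():
                if not line.strip():
                    continue
                record = json.loads(line)
                # The first line is typically a header record without a "url" key.
                if "url" in record:
                    pages.append(record["url"])
    return {"resources": resources, "warcs": warcs, "indexes": indexes, "pages": pages}
```

Listing `pages.jsonl` entries gives a quick view of which captured pages can be “replayed” without opening the WARC files themselves.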
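ArchiveBox’s feature list mentions WARC compatibility, and the ZIMit post is tagged #warc. For readers new to the format, this sketch serializes a single WARC/1.1 `resource` record by hand: a version line, named header fields (`WARC-Type`, `WARC-Record-ID`, `WARC-Date`, `WARC-Target-URI`, `Content-Type`, `Content-Length`), a blank line, the content block, and two trailing CRLFs. It omits optional fields such as digests; in practice a maintained library such as warcio is the safer choice.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "text/html",
                         date: Optional[datetime] = None) -> bytes:
    """Serialize one minimal WARC/1.1 'resource' record as bytes."""
    if date is None:
        date = datetime.now(timezone.utc)
    elif date.tzinfo is not None:
        date = date.astimezone(timezone.utc)  # normalize aware datetimes to UTC
    # Naive datetimes are assumed to already be in UTC.
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", date.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),  # length of the content block in bytes
    ]
    head = "WARC/1.1\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line ends the header; two CRLFs terminate the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"
```

Because records are simply concatenated, a WARC file is a sequence of such byte strings, usually gzip-compressed record by record.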