home.social

#webarchiving — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #webarchiving, aggregated by home.social.

  1. “People aren’t sure what’s true, and what libraries are here for is to help with that.”

    Brewster Kahle, digital librarian of the Internet Archive, discusses the future of the #WaybackMachine on ABC Radio National (🇦🇺 Australia) in “Wayback Machine: The internet’s archive in peril,” a look at how media companies are restricting the preservation of the web itself.

    🎧 Listen ⤵️
    abc.net.au/listen/programs/sun

    #InternetHistory #WebArchiving @abcaustraliarss @brewsterkahle

  2. "Common Crawl mirrors its monthly crawl archive to the Hugging Face Hub as a Storage Bucket. Alongside the raw pages, it now publishes the columnar URL index — one parquet row per crawled page (host, language, MIME type, fetch status, and a pointer to the page's bytes). That makes the whole crawl queryable without touching the petabytes of underlying WARCs."
    huggingface.co/spaces/davanstr
    #webarchiving

  3. NiemanLab: More than 340 local news outlets are limiting the Internet Archive’s access to their journalism. “Our new analysis shows that more than 340 local news sites across the United States are now limiting the Internet Archive’s ability to access and preserve their stories. Many sites in our sample are owned by five of the seven largest local news publishers in the country: USA Today […]

    https://rbfirehose.com/2026/05/21/niemanlab-more-than-340-local-news-outlets-are-limiting-the-internet-archives-access-to-their-journalism/
  8. National Library of Finland: Principles for Finnish Web Archive content selection published. “The National Library of Finland is responsible for the diverse and representative preservation of online material. To make this work more transparent, we produced a document entitled Content selection for the Finnish Web Archive, outlining the principles for content selection in thematic and continuous […]

    https://rbfirehose.com/2026/05/13/national-library-of-finland-principles-for-finnish-web-archive-content-selection-published/
  13. The web never stands still 🌐 ... and neither do the challenges of preserving it.

    The #DPC is preparing for the return of its Web Archiving Special Interest Group (WA-SIG), bringing DPC Members together in a welcoming and transparent space where Members can exchange ideas, surface challenges, and learn from one another’s approaches.

    The renewed WA-SIG gets together on 7 July.

    Read more & join us 😊: dpconline.org/news/dpc-prepare

    #DigitalPreservation #Coalition #DPC #WebArchiving #Archives

  18. Tom’s Hardware: Internet archival sites struggling to preserve the internet because of skyrocketing hard drive prices due to the AI boom — Wayback Machine and Wikimedia punished by stratospheric storage pricing and stricter anti-scraping measures blocking the wrong bots. “The internet is getting harder to archive because the AI boom has caused a storage crisis, with both NAND and mechanical […]

    https://rbfirehose.com/2026/05/09/toms-hardware-internet-archival-sites-struggling-to-preserve-the-internet-because-of-skyrocketing-hard-drive-prices-due-to-the-ai-boom-wayback-machine-and-wikimedia-punished-by-stratosphe/
  23. We presented #CiVers at #CAA2026 in Vienna (Mar 31–Apr 4) with a poster! 🎉
    🔍 By combining #WebArchiving, change detection, and #metadata extraction, CiVers enables reliable citation of versioned web pages using persistent identifiers (PIDs).
    🥅 Our goal: make web-based #research resources citable, traceable, and reproducible.
    Let’s connect! 🤲

    Photo Credits: Lisa Steinmann.

  28. Former Mayor Dieter Reiter's Instagram account is back. Why? So that the city archive (Stadtarchiv) can archive it.

    @SZ_de writes: "For politicians, social media is 'now one of the preferred ways to communicate with the public,' the head of the city archive, Daniel Baumann, told the SZ. If the history of Reiter's time in office is one day examined, (...) such an account would be an important building block for it."

    sueddeutsche.de/muenchen/muenc

    #webarchiving

  33. Join DRI's Senior Software Engineer Kathryn Cassidy this week on Wednesday, April 15, for a tutorial and discussion on #WebArchiving with DRI. All skill levels are welcome; no prior experience is necessary!

    Register: dri.ie/events/web-archiving-wi
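
The Common Crawl post (item 2) describes the columnar URL index as one parquet row per crawled page, holding the host, language, MIME type, fetch status, and a pointer to the page's bytes. The Python sketch below illustrates why that makes the crawl queryable without touching the WARCs: filtering happens on index rows alone, and the pointer (file, offset, length) becomes an HTTP byte-range request for a single record. It is a minimal sketch under stated assumptions: the column names are modeled on Common Crawl's `cc-index-table` schema and should be checked against the actual parquet files, plain dataclasses stand in for a parquet reader, and the sample rows and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IndexRow:
    """One row of the columnar URL index, i.e. one crawled page (field names assumed)."""
    url: str
    url_host_name: str
    content_languages: str   # comma-separated language codes, e.g. "eng" or "deu,eng"
    content_mime_type: str   # e.g. "text/html"
    fetch_status: int        # HTTP status the crawler received
    warc_filename: str       # pointer to the page's bytes: which WARC file,
    warc_record_offset: int  # where the record starts in that file,
    warc_record_length: int  # and how many bytes it spans


def select_pages(rows: Iterable[IndexRow], host: str, language: str,
                 mime: str = "text/html") -> List[IndexRow]:
    """Filter on index columns only; no WARC data is read here."""
    return [
        r for r in rows
        if r.url_host_name == host
        and r.fetch_status == 200
        and r.content_mime_type == mime
        and language in r.content_languages.split(",")
    ]


def range_header(row: IndexRow) -> str:
    """Build an HTTP Range value that fetches only this record from its WARC file."""
    first = row.warc_record_offset
    last = first + row.warc_record_length - 1  # HTTP byte ranges are inclusive
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    rows = [
        IndexRow("https://example.org/", "example.org", "eng", "text/html",
                 200, "hypothetical/a.warc.gz", 1000, 500),
        IndexRow("https://example.org/de", "example.org", "deu", "text/html",
                 200, "hypothetical/a.warc.gz", 1500, 300),
        IndexRow("https://example.org/gone", "example.org", "eng", "text/html",
                 404, "hypothetical/b.warc.gz", 0, 200),
    ]
    for r in select_pages(rows, "example.org", "eng"):
        print(r.url, range_header(r))  # https://example.org/ bytes=1000-1499
```

Keeping the pointer as three separate columns is what lets a query engine skip the WARCs entirely until a specific record is needed; only then, assuming the WARC file is served over HTTP with range-request support, would a client send the `Range` value to retrieve that one record.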
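
The CiVers post (item 23) says the project combines change detection and metadata extraction so that versioned web pages can be cited with persistent identifiers. The post does not say how CiVers implements this, so the Python sketch below is only a generic illustration of the change-detection step, not the project's method: it hashes whitespace-normalized content with SHA-256 and records a new citable version only when the hash differs from the previous capture. The `v1`, `v2` labels and the citation format are hypothetical placeholders for where a real system would mint and use a PID.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def normalize(content: str) -> str:
    """Collapse runs of whitespace so formatting-only edits are not treated as changes."""
    return " ".join(content.split())


@dataclass(frozen=True)
class Version:
    version_id: str    # hypothetical label; a real system would mint a PID here
    captured_at: str   # ISO 8601 timestamp of the capture, in UTC
    content_hash: str  # SHA-256 hex digest of the normalized content


@dataclass
class PageHistory:
    url: str
    versions: List[Version] = field(default_factory=list)

    def record_capture(self, content: str, captured_at: datetime) -> Optional[Version]:
        """Record a new version if content changed since the last capture; else return None."""
        digest = hashlib.sha256(normalize(content).encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1].content_hash == digest:
            return None  # no change detected, so no new citable version
        version = Version(
            version_id=f"v{len(self.versions) + 1}",
            captured_at=captured_at.astimezone(timezone.utc).isoformat(),
            content_hash=digest,
        )
        self.versions.append(version)
        return version

    def cite(self, version: Version) -> str:
        """Format a citation string that pins one specific captured version."""
        return f"{self.url} (version {version.version_id}, captured {version.captured_at})"


if __name__ == "__main__":
    history = PageHistory("https://example.org/page")
    t = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
    first = history.record_capture("<p>Hello  world</p>", t)
    history.record_capture("<p>Hello world</p>", t)  # whitespace-only edit: ignored
    history.record_capture("<p>Hello there</p>", t)  # real change: becomes v2
    print(len(history.versions))  # 2
    print(history.cite(first))
```

Comparing only against the most recent capture keeps the check cheap; a page that reverts to an earlier text would be recorded as a new version, which may or may not be what a citation service wants.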