#brewsterkahle — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #brewsterkahle, aggregated by home.social.
-
Innovation comes when there is a tender balance between chaos and bureaucracy, and not too much of either.
-- Brewster Kahle
⬆ #Wisdom #Quotes #BrewsterKahle #Bureaucracy #Chaos #Innovation
⬇ #Photography #Panorama #Mangrove #RabbitKey #Everglades #Florida
-
The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting – HackerNoon
by Bruce Li, January 12th, 2026
A Comprehensive Engineering and Operational Analysis of the Internet Archive
Introduction: The Hum of History in the Fog
If you stand quietly in the nave of the former Christian Science church on Funston Avenue in San Francisco’s Richmond District, you can hear the sound of the internet breathing. It is not the chaotic screech of a dial-up modem or the ping of a notification, but a steady, industrial hum—a low-frequency thrum generated by hundreds of spinning hard drives and the high-velocity fans that cool them. This is the headquarters of the Internet Archive, a non-profit library that has taken on the Sisyphean task of recording the entire digital history of human civilization.
Internet Archive’s office in San Francisco
Here, amidst the repurposed neoclassical columns and wooden pews of a building constructed to worship a different kind of permanence, lies the physical manifestation of the “virtual” world. We tend to think of the internet as an ethereal cloud, a place without geography or mass. But in this building, the internet has weight. It has heat. It requires electricity, maintenance, and a constant battle against the second law of thermodynamics. As of late 2025, this machine, collectively known as the Wayback Machine, has archived over one trillion web pages.[1] It holds 99 petabytes of unique data, a number that expands to over 212 petabytes when accounting for backups and redundancy.[3]
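The two capacity figures imply how much space redundancy costs. A minimal worked sketch, taking the article's 99 PB and 212 PB at face value; the helper `storage_overhead` is hypothetical and not part of any Archive tool:

```python
def storage_overhead(unique_pb: float, total_pb: float) -> tuple[float, float]:
    """Return (footprint multiplier, extra petabytes) for a redundant store."""
    if unique_pb <= 0:
        raise ValueError("unique_pb must be positive")
    multiplier = total_pb / unique_pb  # total footprint as a multiple of unique data
    extra = total_pb - unique_pb       # space spent on backups and redundant copies
    return multiplier, extra


multiplier, extra = storage_overhead(99, 212)
print(f"{multiplier:.2f}x footprint, {extra} PB spent on redundancy")
# prints: 2.14x footprint, 113 PB spent on redundancy
```

Because the article says "over" 212 petabytes, the 2.14x figure is a lower bound. A multiplier a little above two is consistent with keeping roughly two copies of each item, though the article does not say how the total is divided.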
The scale of the operation is staggering, but the engineering challenge is even deeper. How do you build a machine that can ingest the sprawling, dynamic, and ever-changing World Wide Web in real time? How do you store that data for centuries when the average hard drive lasts only a few years? And perhaps most critically, how do you pay for the electricity, the bandwidth, and the legal defense funds required to keep the lights on in an era where copyright law and digital preservation are locked in a high-stakes collision?
This report delves into the mechanics of the Internet Archive with the precision of a teardown. We will strip back the chassis to examine the custom-built PetaBox servers, whose waste heat warms the building, which runs without air conditioning. We will trace the evolution of the web crawlers, from the early tape-based dumps of Alexa Internet to the sophisticated browser-based bots of 2025. We will analyze the financial ledger of this non-profit giant, exploring how it survives on a budget that is a rounding error for its Silicon Valley neighbors. And finally, we will look to the future, where the “Decentralized Web” (DWeb) promises to fragment the Archive into a million pieces to ensure it can never be destroyed.[5]
To understand the Archive is to understand the physical reality of digital memory. It is a story of 20,000 hard drives, 45 miles of cabling, and a vision that began in 1996 with a simple, audacious goal: “Universal Access to All Knowledge”.[7]
Part I: The Thermodynamics of Memory
The PetaBox Architecture: Engineering for Density and Heat
The heart of the Internet Archive is the PetaBox, a storage server custom-designed by the Archive’s staff to solve a specific problem: storing massive amounts of data with minimal power consumption and heat generation. In the early 2000s, off-the-shelf enterprise storage solutions from giants like EMC or NetApp were prohibitively expensive and power-hungry. They were designed for high-speed transactional data—like banking systems or stock exchanges—where milliseconds of latency matter. Archival storage, however, has different requirements. It needs to be dense, cheap, and low-power.
Brewster Kahle, founder of Internet Archive (with the PetaBox behind him)
Brewster Kahle, the Archive’s founder and a computer engineer who had earlier worked at the supercomputer company Thinking Machines, took a different approach. Instead of high-performance RAID arrays, the Archive built the PetaBox from consumer-grade parts. The design was radical for its time: use “Just a Bunch of Disks” (JBOD) rather than expensive RAID controllers, and handle data redundancy in software rather than hardware.
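The JBOD-plus-software idea can be made concrete with a small sketch. This is a hypothetical illustration, not the Archive's actual code: each "disk" is modelled as an in-memory dict standing in for one drive, every object is written to two different drives, and a SHA-256 checksum lets the software detect a dead or corrupted copy and fall back to the other one. That detection and fallback is the job a RAID controller would otherwise do in hardware. The class name and the key "item/page.warc" are made up for the example.

```python
import hashlib


class SoftwareMirror:
    """Hypothetical sketch of software-managed redundancy over JBOD drives."""

    def __init__(self, num_disks: int, copies: int = 2):
        if not 1 <= copies <= num_disks:
            raise ValueError("copies must be between 1 and num_disks")
        # Each dict stands in for one independent drive; there is no RAID layer.
        self.disks: list[dict[str, bytes]] = [{} for _ in range(num_disks)]
        self.copies = copies
        self.checksums: dict[str, str] = {}

    def _placement(self, key: str) -> list[int]:
        # Hash the key to choose a starting drive, then take consecutive drives
        # so every copy of the object lands on a different drive.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.disks)
        return [(start + i) % len(self.disks) for i in range(self.copies)]

    def put(self, key: str, data: bytes) -> None:
        self.checksums[key] = hashlib.sha256(data).hexdigest()
        for disk in self._placement(key):
            self.disks[disk][key] = data

    def get(self, key: str) -> bytes:
        expected = self.checksums[key]
        for disk in self._placement(key):
            data = self.disks[disk].get(key)
            # Skip copies that are missing (failed drive) or fail the checksum.
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                return data
        raise OSError(f"all {self.copies} copies of {key!r} are missing or corrupt")


store = SoftwareMirror(num_disks=4, copies=2)
store.put("item/page.warc", b"archived bytes")
store.disks[store._placement("item/page.warc")[0]].clear()  # simulate a dead drive
print(store.get("item/page.warc"))  # b'archived bytes'
```

A production system would also copy the surviving replica onto a healthy drive after a failure; that repair step is omitted here for brevity.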
Editor’s Note: Read the rest of the story at the link below.
Continue/Read Original Article Here: The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting | HackerNoon
Tags: Architecture, Brewster Kahle, California, Fight Against Forgetting, HackerNoon, Internet Archive, Long Now, Memory, PetaBox, San Francisco, Storage, World Wide Web, WWW
#Architecture #BrewsterKahle #California #FightAgainstForgetting #HackerNoon #InternetArchive #LongNow #Memory #PetaBox #SanFrancisco #Storage #WorldWideWeb #WWW -
Wayback Machine reaches the trillion mark: 5 facts about the living history of the internet
The Wayback Machine is widely seen as a handy tool, but many people are unaware of its true scale and significance. A report by CNN.
https://www.apfeltalk.de/magazin/news/wayback-machine-erreicht-billionen-marke-5-fakten-zur-lebenden-geschichte-des-internets/
#News #Tellerrand #Archivierung #BrewsterKahle #InternetArchive #Internetgeschichte #Medien #WaybackMachine #Webarchiv #Webseiten -
Inside the old church where one trillion webpages are stored – CNN Business
Inside the old church where one trillion webpages are being saved
By Hadas Gold
See inside the old San Francisco church that houses nearly all of the internet’s history…
San Francisco — Just blocks from the Presidio of San Francisco, the national park at the base of the Golden Gate Bridge, stands a gleaming white building, its façade adorned with eight striking gothic columns.
But what was once the home of a Christian Science church is now the holy grail of internet history: the Internet Archive, a non-profit library run by a group of software engineers and librarians, who for nearly 30 years have been saving the web one page at a time.
Inside the stained-glass-adorned sanctuary, the sounds of church sermons have been replaced by the hum of servers, where the Internet Archive’s Wayback Machine preserves web pages.
The Wayback Machine, a tool used by millions every day, has proven critical for academics and journalists searching for historical information on what corporations, people and governments have published online in the past, long after their websites have been updated or changed.
For many, the Wayback Machine is like a living history of the internet, and it just logged its trillionth page last month.
Archiving the web is more important and more challenging than ever before. In January, the White House ordered large numbers of government webpages to be taken down. Meanwhile, artificial intelligence is blurring the line between what’s real and what’s artificially generated, and in some ways removing the need to visit websites at all. And more of the internet is now hidden behind paywalls or tucked into conversations with AI chatbots.
It’s the Internet Archive’s job to figure out how to preserve it all.
The Internet Archive also preserves music, television, newspapers, video games and books, which archivists digitize page by page using bespoke machines.
“We are here to try to provide a record of what happened, so that people can learn and build on that to build a better future, or to build new ideas that are worthy of being in the (Internet Archive’s) library,” said Internet Archive founder Brewster Kahle.
The internet’s library
Kahle created the archive in 1996, when a year’s worth of saved pages could fit on about 2 terabytes of hard drives, the amount of storage you can get today in an iPhone. Now the archive saves close to 150 terabytes, or hundreds of millions of web pages, per day.
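The two storage figures can be compared directly. Taking CNN's approximate numbers at face value (the constant and function names below are illustrative), a single day of intake today equals about 75 years of intake at the 1996 rate:

```python
TB_PER_YEAR_1996 = 2  # a year's worth of saved pages in 1996 (approximate)
TB_PER_DAY_NOW = 150  # current daily intake (approximate)


def annual_growth(then_tb_per_year: float, now_tb_per_day: float, days: int = 365) -> float:
    """How many times larger a year of intake is now than it was then."""
    return now_tb_per_day * days / then_tb_per_year


years_of_1996_per_day = TB_PER_DAY_NOW / TB_PER_YEAR_1996
print(years_of_1996_per_day)                            # 75.0
print(TB_PER_DAY_NOW * 365)                             # 54750 (TB per year, roughly 55 PB)
print(annual_growth(TB_PER_YEAR_1996, TB_PER_DAY_NOW))  # 27375.0
```

On these rounded figures, a year of collecting now is roughly 27,000 times larger than a year of collecting in 1996.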
Kahle is the driving force and personality behind the archive, with the exuberance and energy of your favorite science teacher and the zeal of an evangelist whose religion is libraries and technology. Sitting for an interview on the original wooden pews of the church, Kahle said he was inspired to purchase the building because it resembles the group’s logo. But more importantly, he said, it’s a symbol of permanence and a reference to the Library of Alexandria in Egypt.
“That was the first time somebody tried to go and collect everything ever written by humans,” Kahle said. “Of course, now that place is the internet, and the Internet Archive serves the whole internet as a library.”
The Wayback Machine tool does more than just screenshot the page. It also saves the page’s technical architecture (the HTML, CSS, JavaScript code and more) so that it can attempt to “replay the page as it existed” even if the original server is no longer functioning, said Wayback Machine Director Mark Graham.
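Saving a page's technical architecture means a capture needs more than the HTML: the stylesheets, scripts and images the page references must be fetched too, or replay will break. As a hedged illustration (not the Archive's crawler), this sketch uses Python's standard `html.parser` to list those subresources. The set of watched tags is a simplifying assumption; real pages also load resources through CSS, `srcset` and JavaScript, which it ignores. The example URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceFinder(HTMLParser):
    """Collect URLs of stylesheets, scripts and images referenced by a page."""

    # Hypothetical, simplified mapping: tag -> attribute holding the resource URL.
    WATCHED = {"link": "href", "script": "src", "img": "src"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WATCHED.get(tag)
        if attr_name is None:
            return
        attrs = dict(attrs)
        # A <link> counts as a page dependency here only if its rel includes "stylesheet".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" not in rel_tokens:
            return
        value = attrs.get(attr_name)
        if value:
            # Resolve relative URLs against the page so each one can be fetched.
            self.found.append(urljoin(self.base_url, value))


def find_subresources(html: str, base_url: str) -> list[str]:
    parser = SubresourceFinder(base_url)
    parser.feed(html)
    parser.close()
    return parser.found


page = (
    '<link rel="stylesheet" href="/style.css">'
    '<script src="app.js"></script>'
    '<img src="https://cdn.example.com/logo.png">'
    '<link rel="icon" href="/favicon.ico">'
)
print(find_subresources(page, "https://example.com/news/"))
```

A crawler would then fetch each listed URL and store the responses alongside the page, so that replay can serve the page together with its dependencies.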
The rise of artificial intelligence and AI chatbots means the Internet Archive is changing how it records the history of the internet. In addition to web pages, the Internet Archive now captures AI-generated content, like ChatGPT answers and those summaries that appear at the top of Google search results.
Referred by: Library Link of the Day
http://www.tk421.net/librarylink/
Continue/Read Original Article Here: Inside the old church where one trillion webpages are stored | CNN Business
#archivists #bespokeMachines #brewsterKahle #cnn #cnnBusiness #digitizeContent #hadasGold #holyGrail #internetArchive #libraryLinkOfTheDay #oldChurch #preservation #presidioOfSanFrancisco #sanFrancisco #theInternetsLibrary #waybackMachine
-
Inside the old church where one trillion webpages are stored – CNN Business
Inside the old church where one trillion webpages are being saved
By Hadas Gold
See inside the old San Francisco church that houses nearly all of the internet’s history…
San Francisco — Just blocks from the Presidio of San Francisco, the national park at the base of the Golden Gate Bridge, stands a gleaming white building, its façade adorned with eight striking gothic columns.
But what was once home to a Christian Science church is now the holy grail of internet history: the Internet Archive, a non-profit library run by a group of software engineers and librarians who for nearly 30 years have been saving the web one page at a time.
Inside the stained-glass-adorned sanctuary, the sounds of church sermons have been replaced by the hum of servers, where the Internet Archive’s Wayback Machine preserves web pages.
The Wayback Machine, a tool used by millions every day, has proven critical for academics and journalists searching for historical information on what corporations, people and governments have published online in the past, long after their websites have been updated or changed.
For many, the Wayback Machine is like a living history of the internet, and it just logged its trillionth page last month.
Archiving the web is more important and more challenging than ever before. The White House in January ordered vast amounts of government webpages to be taken down. Meanwhile, artificial intelligence is blurring the line between what’s real and what’s artificially generated — in some ways replacing the need to visit websites entirely. And more of the internet is now hidden behind paywalls or tucked in conversations with AI chatbots.
It’s the Internet Archive’s job to figure out how to preserve it all.
Tags: Archivists, Bespoke Machines, Brewster Kahle, CNN, CNN Business, Digitize Content, Hadas Gold, Holy Grail, Internet Archive, Library Link of the Day, Old Church, Preservation, Presidio of San Francisco, San Francisco, The Internet's Library, Wayback Machine
#archivists #bespokeMachines #brewsterKahle #cnn #cnnBusiness #digitizeContent #hadasGold #holyGrail #internetArchive #libraryLinkOfTheDay #oldChurch #preservation #presidioOfSanFrancisco #sanFrancisco #theInternetsLibrary #waybackMachine
-
Celebrating Sir Tim Berners-Lee, 2025 Internet Archive Hero Award Recipient – Internet Archive Blogs
Posted on November 5, 2025 by Chris Freeland
Brewster Kahle (left), Internet Archive’s founder and digital librarian, presents Sir Tim Berners-Lee (right), inventor of the World Wide Web, with the Internet Archive Hero Award during a discussion hosted by the Commonwealth Club of California.
In celebrating 1 trillion web pages archived, the Internet Archive is proud to honor the visionary who made it all possible. As announced in The New Yorker, the 2025 Internet Archive Hero Award was presented to Sir Tim Berners-Lee, the inventor of the World Wide Web. Sir Tim’s groundbreaking work opened the door to a connected world and laid the foundation for our shared digital history.
Sir Tim was presented the award during a discussion at the Commonwealth Club of California on October 9. The conversation, “Building and Preserving the Web: A Conversation with Sir Tim Berners-Lee and Brewster Kahle,” was moderated by Lauren Goode (Wired) and is now available for listening and download as an episode of the Future Knowledge podcast.
Listen to Sir Tim Berners-Lee and Brewster Kahle: https://share.transistor.fm/e/ce6b83bd
Sir Tim’s invention transformed how humanity shares knowledge, and his ongoing advocacy for an open and accessible web that empowers individuals continues to inspire us. We’re thrilled to recognize his enduring contributions as we mark this historic achievement for the web.
Watch the video from our celebration on October 22: https://archive.org/embed/sir-tim-berners-lee-internet-archive-hero-award-2025
The Internet Archive Hero Award is an annual award that recognizes those who have exhibited leadership in making information available for digital learners all over the world. Previous recipients have included the island nation of Aruba, public information advocate Carl Malamud, copyright expert Michelle Wu, and the Grateful Dead.
About Chris Freeland
Chris Freeland is the Director of Library Services at Internet Archive.
Editor’s Note: The featured image at top of the post is via WP AI.
Continue/Read Original Article Here: Celebrating Sir Tim Berners-Lee, 2025 Internet Archive Hero Award Recipient | Internet Archive Blogs
#1TrillionWebPagesArchived #2025 #BrewsterKahle #ChrisFreeland #CommonwealthClubOfCalifornia #DigitalLibrarian #Director #HeroAward #InternetArchive #InternetArchiveBlog #LibraryServices #SirTimBernersLee #WorldWideWeb
-
Weekly output: phone plans, Nvidia keynote, passkey adoption, Bending Spoons buys AOL, SpaceX simplifying Starship lander, Internet luminaries on the open Web
This is not going to be a great week for normal sleep cycles: Tuesday, I will wake up at around 4 a.m. to spend a 15-plus-hour shift working as an election officer for Arlington, and then Wednesday I’m off to Dulles Airport for this year’s final business trip across the Atlantic. I’m departing for Web Summit in Lisbon several days early because the organizers of another conference, the Mozilla Festival, offered a press pass and a travel stipend to cover that event in Barcelona. I’ve heard good things about this conference over the years, so accepting an invitation to spend a few days in one of my favorite cities in Europe was an easy call.
In addition to what you see below, Patreon readers got a detailed recap of how this past week’s event-packed schedule left its own series of dents in my calendar.
10/28/2025: The Best Cell Phone Plans, Wirecutter
This was going to be a modest update to the guide that I’ve been maintaining since 2014, but T-Mobile jacking up prices while AT&T and Verizon inflicted more modest rate hikes led to us dethroning T-Mo on cost grounds and handing our “for most people” pick to AT&T, which has advanced its own 5G network considerably.
10/28/2025: In DC, Nvidia CEO Touts New AI Partnerships, Goes a Little MAGA, PCMag
Heading into Nvidia’s conference, I was worried that CEO Jensen Huang would go into the weeds about the finer points of GPU architecture. Instead, he used this nearly two-hour keynote to jump from topic to topic without getting into much detail about any of them, and he kept coming back to opportunities to praise President Trump.
10/29/2025: Passkey Adoption Sees Striking Progress, With One Obvious Leader, PCMag
I struggled to get this written at the end of a long workday, resulting in my getting some nuances wrong that required updating the post the next morning.
11/1/2025: Serial Dot-Com Purchaser Bending Spoons to Buy AOL, But Why?, PCMag
Writing about AOL in 2025 makes me feel so old, but as one of PCMag’s graybeards I had to cover the news of Bending Spoons buying the company that once ruled the online world. I got to this story a day after it broke, so I turned that lag into an opportunity to expand the piece with some quotes from a publicist for that Italian firm and from a podcast interview its CEO, Luca Ferrari, gave last year.
11/1/2025: After Elon Tantrum, SpaceX Now Prepping ‘Simplified’ Starship-Based Lunar Lander, PCMag
Since I wrote about Elon Musk’s childish reaction to NASA’s understandable concern over the pace of its Human Landing System work, I had to reach for a keyboard to cover SpaceX’s grown-up corporate response.
11/1/2025: ‘The Truth Is Paywalled.’ Internet Vets Lament the State of the ‘Open’ Web, PCMag
This Monday-evening panel was one of the first items on my calendar this week, but having event after event after event follow it led to me not writing it up until Thursday night. Once again, it was a serious treat to hear some of the Internet’s founding figures talk about the state of the thing they invented.
#AmericaOnline #AOL #ArtemisIII #ATT #BendingSpoons #BrewsterKahle #CindyCohn #Dashlane #FoundationForAmericanInnovation #HumanLandingSystem #JensenHuang #Nvidia #NvidiaGTCDC #passkeyExport #passkeys #phonePlans #smartphonePlans #SpaceX #TMobile #unlimitedData #verizon #VintCerf
-
📬 Wayback Machine: Archive.org celebrates 1 trillion saved web pages
#Internet #Archiveorg #BrewsterKahle #Mementos #WaybackMachine #Webarchiv https://sc.tarnkappe.info/84bd03
-
#InternetArchive Is Now a #FederalDepositoryLibrary. What Does That Mean?
Founder #BrewsterKahle said that while the #nonprofit has always functioned as a #library, this new designation makes it easier to work with the other federal depository libraries. That, he said, is a service to everyone.
The Federal Depository Library Program was established by Congress in 1813, with the intention of ensuring that government records would be accessible to the American public.
https://www.kqed.org/news/12049420/sf-based-internet-archive-is-now-a-federal-depository-library-what-does-that-mean
-
Internet Archive still offline after security incident
The Internet Archive, known for its Wayback Machine and as a digital library of websites, software, books and films, remains offline. Since October 10, 2024, the websi
https://www.apfeltalk.de/magazin/news/internet-archive-weiterhin-offline-nach-sicherheitsvorfall/
#News #Tellerrand #Angriff #BrewsterKahle #ContentRechte #DigitaleBibliothek #InternetArchive #Nutzerdaten #Sicherheitsvorfall #WaybackMachine
-
We are losing vast swathes of our digital past, and copyright stops us saving it
It is hard to imagine the world without the Web. Collectively, we routinely access billions of Web pages without thinking about it. But we often take it for granted that the material we want to access will be there, both now and in the future. We all hit the dreaded “404 not found” error from time to time, but merely pass on to other pages. What we tend to ignore […]
#BrewsterKahle #governmentSites #InternetArchive #newsSites #pewResearch #publishers #report #webPages #wikipedia #worldWideWeb
https://walledculture.org/we-are-losing-vast-swathes-of-our-digital-past-and-copyright-stops-us-saving-it/
-
📬 Labels win preliminary legal round against Internet Archive
#Rechtssachen #BrewsterKahle #CapitolRecords #fairuse #Great78Project #InternetArchive #KahleAustinStiftung https://sc.tarnkappe.info/a216a4
-
📬 Great 78 Project: Internet Archive vs. music labels
#Rechtssachen #BrewsterKahle #Great78Project #InternetArchive #Musiklabels #RIAA #Schallplatten #Schellackplatten #SonyMusic #UniversalMusic #Urheberrecht https://sc.tarnkappe.info/0dce02
-
📬 Internet Archive appeals US federal judge’s ruling
#EBooks #Rechtssachen #AssociationofAmericanPublishers #BrewsterKahle #ControlledDigitalLending #CorynneMcSherry #electronicfrontierfoundation #FairUseDoktrine #InternetArchive #JohnGKoeltl #TerrenceHart https://tarnkappe.info/artikel/e-books/internet-archive-legt-berufung-gegen-us-bundesrichter-urteil-ein-285264.html
-
Internet Archive: new copyright laws for generative AI would “further entrench” market leaders
The current excitement over artificial intelligence (AI), particularly generative AI, has now reached the stage where governments feel they need to do something about it in terms of regulations. The EU’s AI Act was drawn up before generative AI took off, but is now being retrofitted with bad ideas to take account of recent developments. Meanwhile, across the …
#AccessToCulture #AccessToKnowledge #ai #BrewsterKahle #competition #generativeAi #InternetArchive #licensing #privacy #study #usCopyrightOffice
https://walledculture.org/internet-archive-new-copyright-laws-for-generative-ai-would-further-entrench-market-leaders/
-
📬 Internet Archive: digital book lending violates copyright
#EBooks #Rechtssachen #AssociationofAmericanPublishers #BrewsterKahle #ChrisFreeland #ControlledDigitalLending #InternetArchive #JohnGKoeltl #MariaPallante https://tarnkappe.info/artikel/rechtssachen/internet-archive-digitale-buchausleihe-verstoesst-gegen-urheberrecht-272139.html
-
📬 Internet Archive: date set for oral hearing
#EBooks #Rechtssachen #BrewsterKahle #ControlledDigitalLending #FairUseDoktrin #FightfortheFuture #HachetteBookGroup #HarperCollinsPublishers #InternetArchive #JohnWileySonsInc #PenguinRandomHouse #RichterJohnGKoeltl https://tarnkappe.info/artikel/rechtssachen/internet-archive-termin-fuer-muendliche-anhoerung-steht-266015.html
-
"The Internet Archive Documentary"
https://tilvids.com/w/bTc6GagBPsWYT8vEq2adPE
Archive is a documentary focused on the future of long-term digital storage, the history of the Internet and attempts to preserve its contents on a massive scale. Directed by Jonathan Minard.
#InternetArchive #Documentaries #History #ComputerHistory #Libraries #WaybackMachine #BrewsterKahle
-
@BillySmith & btw I was very honored and fortunate to meet and talk to hypertext pioneer Ted Nelson, a few years ago at an event at #InternetArchive headquarters, San Francisco. IA and its founder/director @brewsterkahle are—like Nelson—major inspirations to me; IA a core & constant asset in my life and projects. #BrewsterKahle #InternetArchive c/ @internetarchive #TedNelson #THNelson
-
📬 Internet Archive vs. book publishers: summary judgment sought
#EBooks #Rechtssachen #AssociationofAmericanPublishers #BrewsterKahle #ControlledDigitalLending #InternetArchive #NationalEmergencyLibrary https://tarnkappe.info/artikel/rechtssachen/internet-archive-vs-buchverlage-summarisches-urteil-angestrebt-244328.html
-
#ActuLibre A digital commons turns 25: Internet Archive, read it at https://framablog.org/2021/10/24/un-commun-numerique-fete-ses-25-ans-internet-archive/ #Internetetsociété #Communnumérique #InternetArchive #LibresCultures #LibresServices #Waybackmachine #BrewsterKahle #Anniversaire #Traduction
-
📬Internet Archive: request for publisher data for its defense denied📬 https://tarnkappe.info/internet-archive-anfrage-nach-verlagsdaten-fuer-verteidigung-abgelehnt/ #NationalEmergencyLibrary #PenguinRandomHouse #InternetArchive #BrewsterKahle #HarperCollins #Rechtssachen #Hachette #E-Books #Wiley
-
📬Internet Archive: EFF and Durie Tangri take over defense against publishers’ lawsuit📬 https://tarnkappe.info/internet-archive-eff-und-durie-tangri-uebernehmen-verteidigung-gegen-verlagsklage/ #ElectronicFrontierFoundation(EFF) #ControlledDigitalLending #NationalEmergencyLibrary #CorynneMcSherry #InternetArchive #BrewsterKahle #DurieTangri #JoeGratz #Artikel
-
📬Over copyright infringement: publishers file lawsuit against Internet Archive📬 https://tarnkappe.info/wegen-urheberrechtsverletzung-verlage-reichen-gegen-internet-archive-klage-ein/ #NationalEmergencyLibrary #PenguinRandomHouse #HachetteBookGroup #InternetArchive #MariaA.Pallante #DouglasPreston #JohnWiley&Sons #BrewsterKahle #HarperCollins #Artikel
-
📬National Emergency Library: emergency library offers 1.4 million free books📬 https://tarnkappe.info/national-emergency-library-notstands-bibliothek-bietet-14-mio-gratis-buecher-an/ #NationalEmergencyLibrary #InternetArchive #TheAuthorsGuild #BrewsterKahle #ChrisFreeland #KyleCourtney #ThomasRid #KimKavin #Artikel #fairuse
-
#omisego #maidsafe #timbernerslee #brewsterkahle talk about the prospects for a decentralised web in this blog