© 2025 Marfa Public Radio
A 501(c)3 non-profit organization.


As the Trump administration purges web pages, this group is rushing to save them

The Internet Archive office is housed in a former Christian Science church in San Francisco. Six weeks into the administration, the Internet Archive said it had cataloged some 73,000 web pages that existed on U.S. government websites prior to Trump's inauguration and have since been expunged.
Carolyn Fong for NPR

SAN FRANCISCO — If you've ever clicked on a hyperlink that's taken you to something called the Wayback Machine to view an old web page, you've been introduced to the Internet Archive.

The nonprofit, founded in 1996, is a digital library of internet sites and cultural artifacts. This includes hundreds of billions of copies of government websites, news articles and data. The Wayback Machine is the archive's access point to nearly three decades of web history. But many of the million or so daily visitors that flock to the Internet Archive's online address might not know anything about its physical one: an old Christian Science church in the Bay Area.

The headquarters of the Internet Archive, an impressive white-columned, Greek revival-style temple, rises just south of the Golden Gate Bridge.

Near the entrance of the building's nave, a triptych of towering black computer servers hums loudly.

"That is the Internet Archive," said Mark Graham, the director of the Internet Archive's Wayback Machine, pointing to the server stacks. Graham was leading about a dozen visitors on a weekly public tour of the headquarters on a recent Friday in March. He projected his voice to be heard over the drone of the computers. "Those machines are servers that are being used right now to record and save material. The lights are blinking; that means that something is being written to or read from those hard drives."

Mark Graham stands in front of servers at the Internet Archive.
Carolyn Fong for NPR

The servers are live-recording the World Wide Web. The results are staggering. Every day, about 100 terabytes of material are uploaded to the Internet Archive, or about a billion URLs, with the assistance of automated crawlers. Most of that ends up in the Wayback Machine, while the rest is digitized analog media — books, television, radio, academic papers — scanned and stored on servers.

As one of the few large-scale archivists to back up the web, the Internet Archive finds itself in a particularly unique position right now. After President Trump's inauguration in January, some federal web pages vanished. While some pages were removed entirely, many came back online with changes that the new administration's officials said were made to conform to Trump's executive orders to remove "diversity, equity, inclusion, and accessibility policies." Thousands of datasets were wiped — mostly at agencies focused on science and the environment — in the days following Trump's return to the White House.

Information about climate change, reproductive health, gender identity and sexual orientation has also been on the chopping block. For example, pages referencing the Enola Gay, the B-29 aircraft that dropped an atomic bomb on Hiroshima and has no particular connection to LGBTQ history, were among a leaked list of posts the Pentagon flagged for removal. Some deleted pages, including those related to the Enola Gay, have resurfaced as agencies figure out how to comply with Trump's directives.

The Internet Archive is among the few efforts that exist to catch the stuff that falls through the digital cracks, while also making that information accessible to the public. Six weeks into the new administration, Wayback Machine director Graham said, the Internet Archive had cataloged some 73,000 web pages that had existed on U.S. government websites that were expunged after Trump's inauguration. 

Graham noted that, for example, the Internet Archive is currently the only place the public can find a copy of an interactive timeline detailing the events of Jan. 6. The timeline is a product of the congressional committee that investigated the Capitol attack, and has since been taken down from the committee's website. Graham said it's in the public's interest to save such records.

"How much money did our tax dollars pay to make it?" he said, referring to the timeline and committee proceedings. "It was a non-trivial exercise and it's part of our history — and for that reason alone, worthy of preservation and worthy of exploration, of understanding."

It's typical for new presidential administrations to make changes to federal websites. In 2008, the Internet Archive co-created a tool called the End of Term Web Archive to track and back up such changes. But Graham said that in Trump's second term, the scope and sheer pace of the deletions of government data have been unprecedented.

"A lot of folks are out there trying to say, 'What the heck just happened?'" Graham said. "We're just doing our job, trying to be the best library that we can be, trying to help preserve the cultural heritage of our time — to make this material accessible, useful to people now and into the future."

Since Trump's second inauguration, more people are turning to the nonprofit

According to Graham, based on the big jump in page views he's observed over the past two months, the Internet Archive is drawing many more visitors than usual to its services — journalists, researchers and other inquiring minds. Some want to consult the archive for information lost or changed in the purge, while others aim to contribute to the archival process. 

"There's a groundswell of support for the Internet Archive because of the dramatic shift that's going on in parts of the government web infrastructure that you wouldn't imagine would change," said Brewster Kahle, the founder and current director of the Internet Archive. "People are coming and rallying behind us — by using it, by pointing at things, helping organize things, by submitting content to be archived — data sets that are under threat or have been taken down."

Internet Archive founder Brewster Kahle speaks onstage during Unfinished Live at The Shed in New York City in 2022.
Roy Rochlin / Getty Images for Unfinished Live

Nancy Krieger, a social epidemiologist at Harvard University who likened the purge to "a digital book burning" in a February interview with NPR's Ailsa Chang, is one of them. She's teamed up with other scientists to try to preserve federal health data that has recently disappeared from government websites. She helped develop a list of terms to send to the Internet Archive to aid the search and preservation effort.

"We want to preserve public health data that are crucial for people's well-being," she told NPR.

For example, she noted, there's a web page on the Centers for Disease Control and Prevention's site titled "Ending Gender-Based Violence." It highlights CDC research showing that adolescent girls and young women bear a disproportionate burden of HIV cases worldwide, an issue driven by gender-based violence and poor access to health services. The page, which was accessible on Jan. 16 prior to Trump's inauguration, now reads "page not found."

Graham's team has been working to get ahead of future purges, trying to identify and capture the material that might be at greater risk of removal, he said.

"Certainly this administration in some ways has made our job easier," he said. "Even on the first day, they began sharing terms, words, topic areas that were going to be under examination — terms like 'DEI.' "

The Internet Archive doesn't catch everything. A report about the risks of bird flu to people and pets briefly appeared and disappeared on the Centers for Disease Control and Prevention website. Graham said it appeared that the Wayback Machine wasn't able to record it in time.

"I remember, I immediately went in and I kind of held my breath like, 'Oh, do we have that?' And we didn't have it," he said.

There's a chance it could pop up later, possibly through the stream of material coming from outside contributors and partners. Most of what the Internet Archive slurps into the Wayback Machine becomes available to the public with minimal delay. In some cases, because the organization works with different partners in the archival process, there is a delay between when the material is collected by those partners and when it's made available through the Wayback Machine.

"I'm still keeping my fingers crossed on that one," Graham said. When the Internet Archive's scrapers fail to capture such data, he said, "it's an opportunity for us to learn how we can do our jobs better."

As the organization works to adapt, Graham said the job has him working overtime. "On a personal level, this has been a bit of a sprint," he said. "I've been working seven days a week for the last many weeks. I've been finding myself, quite literally since the inauguration, waking up earlier with a sense of purpose and energy."

Keeping the public front of mind

Despite its pioneering role in the digital realm, the Internet Archive team wants to keep people, not just machines, in full focus. Near the servers, clay sculptures — petite doppelgängers immortalizing people who have worked for the organization — line the walls and spill into the pews.

Mark Graham points at a ceramic sculpture of his likeness at the Internet Archive.
Carolyn Fong for NPR

"We have all those little statues, which I think is a way of celebrating the people working on these collections," Kahle said. "People have agency to build the technologies we think will serve us well. It's [important] to have people understand how they can participate, that it's not something happening to them. It's ours."

Avinash Krishna, a 22-year-old recent college graduate, visited from the Sacramento area to tour the headquarters. He said he's been using the Internet Archive's services for about a decade. The tour had long been on his to-do list, but a recent visit to a Wikipedia page bumped it up higher. To him, it was an example of how he's seen the web become increasingly reliant on the archive's tools.

"I don't remember the page but, you know, a significant percentage of the links that were on the Wikipedia article are Internet Archive links," he said. "That is really sad — that what people view as a primary source is something that doesn't exist anymore."

Mark Graham leads a free tour of the Internet Archive office.
Carolyn Fong for NPR

Krishna is grieving what's known as digital decay or "link rot" — the massive, expanding graveyard of broken links across the web. It's what you see when you encounter "Error 404" or "page not found."

While the Trump administration's scrubbing of federal web pages presents a notable example of the severed links issue, it's long been an epidemic. A Pew Research Center study published last year found that roughly 38% of web pages on the internet that existed in 2013 were no longer accessible as of 2023. According to a Harvard Law Review study published in 2014, about half of all links cited in U.S. Supreme Court opinions no longer led to the original source material.

Kahle, who early on recognized the ephemeral nature of the web, said the rapid deterioration of the living web is a serious threat to historical preservation. "We're building our culture on shifting sands," he said.

An employee at the Internet Archive office digitizes a book.
Carolyn Fong for NPR

A behemoth of link rot repair, the Internet Archive rescues a daily average of 10,000 dead links that appear on Wikipedia pages. In total, it's fixed more than 23 million rotten links on Wikipedia alone, according to the organization.

The rapid decimation of government site data is just the latest challenge facing the nonprofit. Since 2020, the Internet Archive has been slapped with costly copyright lawsuits over its digitization of books and music that are not in the public domain. Record labels and book publishers have sued the nonprofit for hundreds of millions of dollars.

Founder Kahle said the costly lawsuits, which legal experts say are meant to be a deterrent, threaten the future of the archive. With a staff of some 120 people, the organization had a budget of about $28 million last year, less than a fifth of the San Francisco Public Library's budget. It's funded through donations big and small, as well as money that comes from museums, libraries and other institutions that pay the nonprofit to preserve their collections. On top of that, the organization has also been a target in a recent series of cyberattacks on libraries.

Even at a time when the Internet Archive is under threat, its founder Kahle appreciated that, back at the headquarters, the big room of towering servers — the lifeblood of the library — remains unobstructed, in full public view.

"It's like open stacks," he said. "It's not hidden away in some bunker someplace. It's 'this is us.' It comes across as a bit vulnerable, right?"

Kahle said he thinks this vulnerability sends a message: "We have to support our institutions or they will go away."

Copyright 2025 NPR

Members of the tour look at the Internet Archive servers that are on display and actively working.
Carolyn Fong for NPR

Emma Bowman