Tag Archives: democracys library

Meet Sophia Tung, the Creative Force Behind Internet Archive’s Microfiche Scanning Livestream

Setting up a livestream is more complicated than just turning on a camera. That’s why the Internet Archive tapped into the expertise of Sophia Tung, a software engineer and online content creator, to help create the livestream for its microfiche scanning center, which launched May 21.

The 29-year-old garnered international media coverage for her livestream of robotaxis parked in a depot just below her San Francisco apartment as they jostled and honked – sometimes in the middle of the night.

“I put it up just sort of as a meme to get some attention. If I couldn’t do anything about it, then I might as well make the best of it,” Tung said of the livestream she posted on YouTube with Lo-fi music in the background. “People became fans of it and Brewster [Kahle, Internet Archive’s digital librarian] reached out to see if I could do something similar with the Internet Archive.”

An avid user of the Internet Archive for years, Tung said she was eager to visit its Funston Avenue headquarters and work with the staff on the project. As a sign of our tech-connected times, it’s become popular to have a mesmerizing scene with mellow music playing on a second monitor as people work. Tung said she could envision a relaxing, but informative, feed showing the preservation process.

Sophia Tung

Tung met with the team who take microfiche – flat sheets of film that hold miniaturized documents – and turn them into digital images that can be accessed online. The team is now digitizing U.S. Supreme Court case documents and government records from Canada dating back to the 1930s.

After assessing the space with five active microfiche digitization stations, Tung decided on a three-camera setup for the livestream. One is focused on an operator feeding microfiche cards under a high-resolution camera that captures multiple detailed images. Another is an up-close look at what actually happens on the machine. A third wide-angle camera covers the entire room and is blurred for security, but still conveys motion.

All team members are open to being on camera as they work, but Tung said she recognized privacy concerns may arise. She devised a pause button to be installed to stop the feed, momentarily dimming the “on air” sign in the room. Although initially concerned that employees might not like being on camera, Tung said the staff who were hired agreed to the concept and are on board with the livestream as a mixed media project.

Live activity with the scanners occurs Monday–Friday, 7:30am-3:30pm U.S. Pacific Time (GMT-8, or GMT-7 during daylight saving time), except U.S. holidays. Ambient Lo-fi music plays continuously. After hours, other Internet Archive content runs on the video feed including silent films, lost landscape footage from everyday life, and public domain photographs from NASA and other sources.

The project has required a combination of engineering to make the infrastructure work 24/7, plus physical design integrating signage and broadcasting lights, which Tung says she enjoyed. Her goal was two-fold: to recreate the excitement of her last livestream and to shine a light on the individuals working behind the scenes at the Archive.

“I always thought about the Internet Archive as just some mysterious entity, trying to preserve what we as individuals cannot. It’s an invaluable tool for journalists and, basically, everybody,” Tung said. “Now, preservation is more important than ever. I think people just assume that it happens. Actually, it takes money, effort, machinery and people. I think it’s important to highlight all the people-hours that go into it.”

Tung produced an explainer video about the microfiche livestream project on YouTube. “The reception has been great so far,” said Tung, who is working on more features and possible additional channels to add to the stream. “I hope the stream brings awareness to the effort it takes to preserve all this important material. If we don’t preserve it now, we are going to lose it.”

All microfiche materials are added to Democracy’s Library, the global project to collect, digitize, and provide free public access to the world’s government publications.

More details on the livestream project can be found here: https://blog.archive.org/2025/05/21/new-livestream-brings-microfiche-digitization-to-life-for-democracys-library/

New Livestream Brings Microfiche Digitization to Life for Democracy’s Library

Ever wonder how government documents, once locked away on tiny sheets of microfiche, become searchable and accessible online? Now you can see it happen in real time.

Today, the Internet Archive has launched a livestream from our microfiche scanning center (https://www.youtube.com/live/aPg2V5RVh7U), offering a behind-the-scenes look at the meticulous work powering Democracy’s Library—a global initiative to make government publications freely available to the public.

“This livestream shines a light on the unsung work of preserving the public record, and the critical infrastructure that makes democracy searchable,” said Brewster Kahle, founder of the Internet Archive. “Transparency can’t be passive—it must be built, maintained, and seen. That’s what this livestream is all about.”

Watch the livestream now:

What You’ll See

The livestream features five active microfiche digitization stations, with a close-up view of one in action. Operators feed microfiche cards beneath a high-resolution camera, which captures multiple detailed images of each sheet. Software stitches these images together, after which other team members use automated tools to identify and crop up to 100 individual pages per card.

Each page is then processed, made fully text-searchable, and added to the Internet Archive’s public collections, complete with metadata, so that researchers, journalists, and the general public can explore and download them freely through Democracy’s Library.

📅 Live activity occurs Monday–Friday, 7:30am-3:30pm U.S. Pacific Time (GMT-8, or GMT-7 during daylight saving time), except U.S. holidays, with a second shift coming soon.
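For readers who want to know whether the scanners are running before tuning in, the posted schedule can be turned into a quick check. The following is a minimal Python sketch, not anything the Internet Archive publishes: the function name `scanning_is_live` and the `holidays` parameter are hypothetical, and the caller is assumed to supply the holiday dates. It uses the standard-library `zoneinfo` module with the `America/Los_Angeles` zone so that daylight saving time is handled automatically.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

# Posted hours: Monday-Friday, 7:30am-3:30pm U.S. Pacific Time.
PACIFIC = ZoneInfo("America/Los_Angeles")
SHIFT_START = time(7, 30)
SHIFT_END = time(15, 30)


def scanning_is_live(moment: datetime, holidays: frozenset = frozenset()) -> bool:
    """Return True if a timezone-aware `moment` falls within the posted hours.

    `holidays` is a hypothetical, caller-supplied set of `date` objects
    (in Pacific Time) on which no scanning is expected.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(PACIFIC)   # convert to Pacific wall-clock time
    if local.weekday() >= 5:             # 5 = Saturday, 6 = Sunday
        return False
    if local.date() in holidays:
        return False
    return SHIFT_START <= local.time() < SHIFT_END
```

For example, `scanning_is_live(datetime.now(ZoneInfo("UTC")))` reports whether the current moment falls inside the posted hours. It says nothing about pauses in the feed or the planned second shift.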


What Is Microfiche?

Microfiche is a flat sheet of film that holds dozens—sometimes hundreds—of miniaturized document images. It’s been a common format for archiving newspapers, court documents, government records, and more since the 20th century.

Why Is Microfiche Digitization Important?

“Materials on microfiche are an important part of our country’s history, but right now they are often only available online from expensive databases. We are excited that this project will digitize court documents from our collection and make them freely available to everyone,” said Leslie Street, Director of the Wolf Law Library of William and Mary College.

“Thousands of documents and reports from across the federal government were distributed in microfiche to Federal Depository Library Program (FDLP) libraries around the country from 1970 – 2022. While important for space-saving and preservation, microfiche has long been problematic for public access. So this digitization work of Democracy’s Library is incredibly important and will unlock free access to this essential historic public domain corpus to readers and researchers around the world!” noted James R. Jacobs, US government information librarian and co-author of the recently published book, Preserving Government Information: Past, Present, and Future.

To learn more about the importance of microformats like microfiche and microfilm, read Brewster Kahle’s essay, “Microfilm: The Rise, Fall, and New Life of Microfilm Collections.”

About Democracy’s Library

Democracy’s Library is the Internet Archive’s ambitious project to collect, digitize, and provide free public access to the world’s government publications. From environmental impact reports to court decisions, these materials are essential for accountability, scholarship, and civic engagement.

The microfiche collections that will be digitized in this process include US GPO documents, Canadian government documents, US court documents, and UN publications. We are always looking for more collections to be donated.

Meet the People Behind the Work

From left: Internet Archive’s digital librarian, Brewster Kahle, with microfiche scanning operators Dylan, Louis, Elijah, Avery, and Fernando.

This digitization livestream was brought to life by Sophia Tung, appmaker & designer behind the viral robotaxi depot livestream on YouTube.

The digitization is overseen by scanning operators who are trained to handle physical library materials and digitization equipment.

Thanks also to Internet Archive staff who assisted this project, including CR Saikley, Merlijn Wajer, Brewster Kahle, Derek Fukumori, Jude Coelho, Anastasiya Smith, Jonathan Bloom, Bas Kloosterman, Andrea Mills, Richard Greydanus, Louis Brizuela, Carla Igot Bordador, and Ria Gargoles.

Thanks to Our Partners

Thank you to Wolf Law Library at the William & Mary Law School, University of Alberta, and Free Law Project for donating microfiche and helping advise this project.

If your library has microfiche or other materials to donate to the Internet Archive, please learn more about donating materials for preservation and digitization.

Support the Work

Preserving and digitizing these fragile, analog records is resource-intensive—and deeply worthwhile. Donate today to support the Internet Archive and Democracy’s Library.

Enjoy the livestream! Thank you for helping us preserve history and protect access to knowledge.

End of Term Web Archive – Preserving the Transition of a Nation

It’s that time again. The 2024 End of Term crawl has officially begun! The End of Term Web Archive #EOTArchive hosts an initiative named the End of Term crawl to archive U.S. government websites in the .gov and .mil web domains — as well as those harder-to-find government websites hosted on .org, .edu, and other top level domains (TLDs) — as one administrative term ends and a new term begins. 

End of Term crawls have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020. The results of these efforts are preserved in the End of Term Web Archive. In total, over 500 terabytes of government websites and data have been archived through the End of Term Web Archive efforts. These archives can be searched full-text via the Internet Archive’s collections search and also downloaded as bulk data for machine-assisted analysis.

The purpose of the End of Term Web Archive is to preserve a record of government websites for historical and research purposes. It is important to capture these websites because they can provide a snapshot of government messaging before and after the transition of terms. The End of Term Web Archive preserves information that may no longer be available on the live web for open access.

The End of Term Archive is a collaborative effort by the Internet Archive along with the University of North Texas (UNT), Stanford University, Library of Congress (LC), U.S. Government Publishing Office (GPO), and National Archives and Records Administration (NARA). Past partners include the University of California’s California Digital Library (CDL), George Washington University, and the Environmental Data and Governance Initiative (EDGI).

Whitehouse.gov captures from: 2008 Sept. 15; 2013 Mar. 21; 2017 Feb. 3; and 2021 Feb. 25

We are committed to preserving a record of U.S. government websites. But we need your help to complete the 2024 End of Term crawl. 

How can you help?! 

We have a list of top level domains from the General Services Administration (GSA) and from previous End of Term crawls. But we need volunteers to help us out. We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive.

Submit a URL nomination by going to digital2.library.unt.edu/nomination/eth2024/.
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. Nominating URLs deep within .gov/.mil websites helps to make our web crawls as thorough and complete as possible.
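When gathering nominations, it can help to sort candidate URLs by whether they sit under the .gov and .mil domains the crawl already targets or under other TLDs that are harder to find. The sketch below is an illustration only, assuming a simple hostname-suffix check; the function names are hypothetical and are not part of the nomination tool. Note that a suffix check cannot tell federal sites from state or local ones that also use .gov.

```python
from urllib.parse import urlsplit

# Domains named in the post as the core of the crawl.
CRAWL_SUFFIXES = (".gov", ".mil")


def hostname_of(url: str) -> str:
    """Return the lowercased hostname of `url`, or "" if none is found."""
    # urlsplit only recognizes a hostname after a "//" authority marker,
    # so add one for bare inputs such as "www.epa.gov/page".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return (urlsplit(url).hostname or "").rstrip(".")


def is_gov_or_mil(url: str) -> bool:
    """True if the URL's hostname ends in .gov or .mil."""
    return hostname_of(url).endswith(CRAWL_SUFFIXES)


def split_nominations(urls):
    """Split URLs into (.gov/.mil hosts, everything else), keeping order."""
    in_scope, other = [], []
    for url in urls:
        (in_scope if is_gov_or_mil(url) else other).append(url)
    return in_scope, other
```

URLs that land in the second list are not necessarily out of scope: the post specifically asks for government sites hosted on .org, .edu, and other TLDs, so that list is where a human check is most useful.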

Individuals and institutions nominating seed URLs are recognized on the individual contributors leaderboard and the institutions leaderboard!

Explore the End of Term Web Archive with full text search and download the data!

The International Democracy’s Library Team Came Together for Presentations, Discussion, and a Workshop About Gov Docs (3.16.23)

Let’s Build It Together!

Video: https://archive.org/details/full-democracys-library-3.16.23-presentation

On March 16, 2023, the Internet Archive hosted the “Democracy’s Library Workshop: Community Collaboration.” This event marked the first public presentation and discussion of the Democracy’s Library Project since its inauguration at the 2022 Annual Event, following several months of research, supported by the Filecoin Foundation, from November 2022 to February 2023. The project, presented jointly by Internet Archive staff and a visiting government official, aims to preserve government information and make it much more meaningfully accessible to the public. The event was live-streamed and can be viewed at the video link above.

Presentation includes:

  • Brewster Kahle, founder of the Internet Archive, providing an introduction and discussing why we need to “Build Our Collections Together.”
  • Andrea Mills, Executive Director of Internet Archive Canada, discussing the incredible progress made in Canada working with their foundational partner, the University of Toronto, in digitizing government information.
  • Jamie Joyce, leading the Democracy’s Library initiative at Internet Archive in the U.S., reporting on the U.S. landscape analysis and stakeholder interviews.

To librarians and archivists: please know we are still collecting feedback from government information professionals. So if you are a librarian or archivist, we would love to hear about your experience. If you’re interested in sharing, please fill out this survey.

See existing Democracy’s Library here: https://archive.org/details/democracys-library 

Also, In Case You Missed Them…Recommendations and Strategic Plans from the GPO: 

Declaring Democracy’s Library (U.S.)

A video presentation of findings, an executive summary, and more to come from the United States team.

Video: https://archive.org/details/jamie-joyces-democracys-library-presentation

After the declaration of Democracy’s Library at the 2022 Internet Archive Annual Event [video], the U.S. team underwent a 4-month landscape analysis to discover the state of the United States’ collective knowledge management. 

Over the course of this blog series we’ll discuss our findings, including the various ways in which our federated national infrastructure contributes to the immense complexity which inhibits easy and meaningful access to the public’s information. 

But for now, we would like to share our executive summary. This piece is informed by interviews with librarians, archivists, and information professionals; a review of various pieces of legislation and government agency reports; and consultation with government representatives at various departments, technologists working on civic-tech and gov-tech applications, and users of government information.

A huge thanks again to all who were interviewed, involved, and are excited about this program.

EXECUTIVE SUMMARY OF THE DEMOCRACY’S LIBRARY (U.S.) REPORT

    Every year, the United States government spends billions of dollars generating data, including reports, research, records, and statistics. Both governments and corporations know that this data is a highly valuable strategic asset. Yet meaningful access to this critical data is effectively kept out of the public’s hands. Though much of it is intended to be publicly accessible, we do not have a publicly-accessible central repository where we can search for all government artifacts. We do not have a public library of all government data, documents, research, records, and publications. These artifacts are not easy for everyone to get a hold of.

    Instead, this data is organized only to be kept behind paywalls, vended to multinational corporations, guarded by “data cartels,” or left sitting inaccessibly among thousands of disjointed agency websites, with non-standardized archival systems that are stewarded by under-resourced librarians and archivists. This data is siloed within agencies, never before linked together. Although we are entitled to this data by law, by default journalists, activists, democracy technologists, academics, and the public are deprived of meaningful access. Instead, it’s a pay-to-play system in which many are priced out.

    However, if we could reduce the public burden in accessing this knowledge – as the federal government has stated is a priority – then it might be the lynchpin to transforming democratic systems and making them more efficient, actionable, and auditable in the future. This work could potentiate a big data renaissance in political science and public administration. It could equip every local journalist with comprehensive, ‘investigative access’ to policy-making across the country. It could even provide key insights which ensure that democracy survives, thrives, adapts, and evolves in the 21st century, as so many desperately want it to and yet so many fear that it never will. To make our democracy more resilient and prepared for the digital age, we need Democracy’s Library.

Democracy’s Library is a 10-year, multi-pronged, partnership effort to collect, preserve, and link our democracy’s data in a centralized, queryable repository. This repository of data will be sourced from all levels of the U.S. government, for the purpose of informing innovation, enabling transparency, advancing new fields like mass political informatics, and overall, digitizing our democracy. Access to this data is a necessary substrate for that innovation, and to propel our antiquated system into a lightning-fast future, we need to overcome challenges from the artifact-level to the systems-level.

    Fortunately, the Internet Archive is perfectly primed to comprehensively take on these challenges alongside our partners (like the Filecoin Foundation) through this new initiative, supported by a groundswell of legislative and political support. The time is right, the network is primed, and most of the tools are already built and being deployed. So, the only thing that remains is for funding partners to step up to scale the effort to revolutionize the U.S. government once again.

To librarians and archivists: please know we are still collecting feedback from government information professionals. So if you are a librarian or archivist, we would love to hear about your experience. If you’re interested in sharing, please fill out this survey.

See existing Democracy’s Library here: https://archive.org/details/democracys-library

Community Turns Out to Celebrate Promise of Democracy’s Library

Friends and supporters of the Internet Archive gathered October 19 at the organization’s headquarters in San Francisco to celebrate the launch of Democracy’s Library.

Plans to collect government documents from around the world and make them easily accessible online were met with enthusiasm and endorsements. Speakers at the event expressed an urgency to preserve the public record, make valuable research discoverable, and keep the citizenry informed—all potential benefits of Democracy’s Library. 

“If we really succeed — and we have to succeed — then Democracy’s Library might become an inspiration for openness in areas that are becoming more and more closed,” said Internet Archive founder Brewster Kahle. 

The 10-year project aims to make freely available the massive volume of government publications (from the U.S. and other democracies), including books, guides, reports, surveys, laws and academic research results, which are all funded with taxpayer money, but often difficult to find. 

To kick off the project, Kahle announced the Internet Archive’s initial contributions to Democracy’s Library:

  • United States .gov websites collected since 2008; 
  • Crawls of the U.S. state government websites;
  • Digitized microfilm and microfiche from the U.S. Government Publishing Office, NASA and other government entities;
  • Crawls of government domains from 200 other countries;
  • 50 million government PDF documents made into text searchable information.

It will be a collaborative effort, said Kahle, calling upon others to join in the ambitious undertaking to contribute to the online collection.

The need for Democracy’s Library

“We need Democracy’s Library. The Internet Archive’s work leading this project represents a critical step in the evolution of democracy,” said Jamie Joyce, executive director of The Society Library and emcee of the program. “Archives and libraries, as they’ve always done in the past, will continue to change in their scope, scale, and capabilities to be of critical use to society, especially democratic societies. Tonight is about witnessing another transformation.”

Although there is more data available than ever before, Joyce said, society’s knowledge management system is badly broken. Misinformation is rampant, while high quality government data is buried and scattered across different federal, state and local agencies. 

Having public material consolidated, digitized and machine readable will allow journalists, activists, and others to be better informed. It will also make democracy more transparent and accountable, as well as protect the historical documents. “We will not be able to compute in the future what we do not save today,” Joyce said.

At a time when polarized politics can put information at risk, the event highlighted the need to safeguard public data.

Gretchen Gehrke, co-founder of the Environmental Data and Governance Initiative, has been working in partnership with the Internet Archive to track changes in federal environmental websites. 

“People should be able to know about environmental issues and have a say in environmental decisions,” she said. “For the last 20 years, the majority of this information has been delivered through the web, but the right to access that information through the web is not protected.”

Gehrke described how public resources and tools related to the federal Clean Power Plan, a hallmark environmental regulation of the Obama administration, were taken down from the Environmental Protection Agency’s website during President Trump’s tenure.

“There are no policies protecting federal website information from suppression or outright censorship,” Gehrke said. “This case serves as an example of why we need Democracy’s Library to preserve and provide continued access to these critical government documents.”

When statistics are being cited in policy debates, citizens need to be able to have access to sources of claims. For example, Sharon Hammond, chief operating officer of The Society Library, said documents related to the environmental impact of California’s Diablo Canyon power plant should be easily available. There are nearly 5 different government bodies that have some role in monitoring the plant’s ecological impact, but the agencies house the reports on their own websites. 

“Finding governmental records about public policy matters should not be a barrier to becoming an informed participant in these collective decisions,” Hammond said. “When we connect evidence directly to the claims and make that information publicly accessible as a resource, we can improve the public discourse.”

Hammond said a searchable, machine readable repository of government documents, with active links and a register of relevant government agencies, will dramatically increase meaningful access to the public’s information.

An international vision

The effort is an international one, and Canada has stepped forward as an early partner.

Canada has contributed crawls by Library and Archives Canada of all the country’s government websites, as well as digitized microfilm and books from the Canadian Research Knowledge Network, Canadiana, and the University of Toronto.

Leslie Weir, librarian and archivist of Canada, spoke in support of the initiative. 

“We know by making our collection and work of government openly accessible, we will create a more engaged community, a community that participates in elections, school board meetings, in public consultations, and yes, even and especially in protests,” Weir said. “Access is the key to understanding. And understanding is the underpinning of democracy.”

Celebrating heroes

The festivities concluded with a tribute to Carl Malamud, recipient of the 2022 Internet Archive Hero Award. Corynne McSherry, legal director of the Electronic Frontier Foundation, presented the award. “Carl has always seen what the internet could be. He has dedicated his life to building that internet,” she said. “He is a true hero.”

Malamud said government information is more than just a good idea. “It is about the law. It is about our rulebook. It is the manual on how we, as citizens, choose to run our society. We own this manual,” he said. “We cannot honor our obligations to future generations if we cannot freely read and speak and even change that rulebook.”

Malamud urged the audience to get involved to realize the vision of Democracy’s Library and guarantee universal access to human knowledge. 

“This is our moment. We must build a distributed and interoperable internet for our global village. We must make the increase in diffusion of knowledge our mutual and everlasting mission,” Malamud said. “We must seize the means of computation and share their fruits with all the people. Let us all swim together in the ocean of knowledge.”

For more on Malamud’s career and contributions, read his profile here.

Introducing Democracy’s Library

Democracies need an educated citizenry to thrive. In the 21st century, that means easy access to reliable information online for all. 

To meet that need, the Internet Archive is building Democracy’s Library—a free, open, online compendium of government research and publications from around the world.

“Governments have created an abundance of information and put it in the public domain, but it turns out the public can’t easily access it,” said Internet Archive founder Brewster Kahle, who is spearheading the effort to collect materials for the digital library. 

By having a wealth of public documents curated and searchable through a single interface, citizens will be able to leverage useful research, learn about the workings of their government, hold officials accountable, and be more informed voters. 

Too often, the best information on the internet is locked behind paywalls, said Kahle, who has helped create the world’s largest digital library.

“It’s time to turn that scarcity model upside down and build an internet based on abundance,” Kahle said. There is a need for equitable access to objective, historical information to balance the onslaught of misinformation online.  

Libraries have long played a vital role in collecting and preserving materials that can educate the public. This mission continues, but the collections need to include digital items to meet the needs of patrons of the internet generation today.

Over the next decade, the Internet Archive is committing to work with libraries, universities, and agencies everywhere to bring the government’s historical information online. It is inviting citizens, libraries, colleges, companies, and the Wikipedians of the world to unlock good information and weave it back into the Internet.

Democracy’s Library will be celebrated at the October 19 event, Building Democracy’s Library, in San Francisco and online. 

Watch the livestream of Building Democracy’s Library:

The project is part of Kahle’s vision to build a better Internet—one that keeps the public interest above private profit. It is based on an abundance model, in which data can be uncovered, unlocked and reused in new and different ways. 

“We know there’s an information flood, but it’s not necessarily all that good,” Kahle said. “It turns out the information on the Internet is not very deep. If you know a subject well, you find that the best information is buried or not even online.”

Democracy’s Library is a move to make governments’ massive investment in research and publications open to all. 

Kahle added: “Democracy’s Library is a stepping stone toward citizens who are more empowered and more engaged.”

The first steps of Democracy’s Library are available online at https://archive.org/details/democracys-library.

An Update from Hugh Halpern, Director of the U.S. Government Publishing Office

What are some of the new initiatives from the U.S. Government Publishing Office? Director Hugh Halpern offers an update, which has been incorporated into our program for tonight’s Building Democracy’s Library event.

Many thanks to Director Halpern and the U.S. Government Publishing Office for sharing this update!