Government Records and Information: Real Risks and Potential Losses
Center for Research Libraries Global Resources Collections Forum:
Leviathan: Libraries and Government Information in the Age of Big Data
Paper and Presentation, James A. Jacobs
- Born-Digital U.S. Federal Government Information: Preservation and Access
- Presentation [audio and slides]
- Speaker Notes for presentation
Examples
Why is preserving web-published information a complex task? The three items below illustrate different preservation and selection issues.
- Keystone XL Pipeline: Final Supplemental Environmental Impact Statement (SEIS)
To preserve this page intact, one would have to collect a total of 13 URLs: 7 images, 3 JavaScript files, 2 CSS style sheets, and the page itself.
The page itself, however, is only a list of links to the 94 files that make up a single title, the Keystone pipeline EIS. Does any of the PDF files contain a list of all the PDF files? To preserve this title, would one have to preserve this page and all 94 PDFs?
This example raises the questions of what should be preserved, how many files would need to be preserved, and how the files would be linked and described.
- The White House current third-party (social media) pages/accounts
This page lists a total of 92 accounts that the White House uses on 26 non-.gov social media websites. It raises many questions: How much information is duplicated? How much is unique? Do these sites simply point back to .gov websites? Would it be of value to preserve how the White House presents itself differently to different communities?
- Executive Order 13662
The link above points to an official, authenticated version of an Executive Order. The same executive order also appears in other formats on other websites. A total of ten different URLs have (apparently) the same information and metadata about the content. This raises the question of whether all of these really are the same and, if so, how to know that and whether to preserve them all. It also raises the question of whether and how to preserve the links to these different versions: will individuals have linked to the different versions? Should we preserve what was actually linked to?
Other copies: White House, Federal Register, Federal Register printer-friendly, GPO Federal Register PDF, GPO Federal Register HTML, GPO HTML, GPO MODS, GPO PREMIS, GPO ZIP
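The Keystone example turns on a practical question: which URLs does a harvester have to fetch to keep one page intact, and which further files make up the "title" behind it? The Python sketch below lists a page's requisites (images, external scripts, stylesheets) and, separately, its links to PDFs, using only the standard library. The class and function names and the sample markup are hypothetical; real harvesters (for example wget with its `--page-requisites` option, or the Heritrix crawler) also follow images referenced from CSS and files loaded by scripts, which this sketch ignores.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteCollector(HTMLParser):
    """Collect the URLs a harvester would need to capture for one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        # Page requisites (needed to render the page) plus linked PDFs
        # (the documents that actually make up the title).
        self.urls: dict[str, set[str]] = {
            "img": set(), "script": set(), "css": set(), "pdf": set()
        }

    def _add(self, kind: str, ref: str) -> None:
        # Resolve relative references against the page's own URL.
        self.urls[kind].add(urljoin(self.base_url, ref))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "img" and a.get("src"):
            self._add("img", a["src"])
        elif tag == "script" and a.get("src"):  # inline scripts have no src
            self._add("script", a["src"])
        elif tag == "link" and "stylesheet" in rel and a.get("href"):
            self._add("css", a["href"])
        elif tag == "a" and (a.get("href") or "").lower().endswith(".pdf"):
            self._add("pdf", a["href"])


def page_requisites(html: str, base_url: str) -> dict[str, set[str]]:
    """Return requisite and PDF URLs found in `html`, grouped by kind."""
    parser = RequisiteCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Counting `img`, `script`, and `css` and adding one for the page itself gives the "keep this page intact" total (13 for the Keystone page, per the figures above); the `pdf` set is a separate and much larger capture decision.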
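For the Executive Order example, one partial way to test whether copies are "the same" is to compare cryptographic digests. The sketch below groups retrieved copies by SHA-256 digest; the function name and sample data are hypothetical. A shared digest shows that two files are byte-for-byte identical, but a mismatch says little on its own: a PDF and an HTML rendering of the same order will never match, so judging equivalence across formats would require comparing extracted text or metadata instead.

```python
import hashlib
from collections import defaultdict


def group_by_digest(copies: dict[str, bytes]) -> list[list[str]]:
    """Group copies whose bytes are identical, using SHA-256 digests.

    `copies` maps a label (e.g. a URL) to the bytes retrieved from it.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for label, content in copies.items():
        groups[hashlib.sha256(content).hexdigest()].append(label)
    # Sort labels within and across groups for stable, comparable output.
    return sorted(sorted(labels) for labels in groups.values())
```

Each inner list is a set of byte-identical copies; a result with ten singleton groups would mean no two of the ten URLs served identical bytes, not that their content differs.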
The following podcast features a conversation about librarians' experiences in trying to preserve web content through web harvesting.
- humans.txt.mp3: The Web Archivists Are Present, More Podcast Less Process (February 24, 2014) [67-minute MP3 file].
Jefferson Bailey and Joshua Ranger discuss the complexities of web archiving with guests Alex Thurman (Web Resources Collection Coordinator, Columbia University Libraries) and Lily Pregill (Project Coordinator & Systems Manager, New York Art Resources Consortium).
Additional References
- Chesapeake Digital Preservation Group. "Link Rot" and Legal Resources on the Web: A 2013 Analysis.
- Jacobs, James A. Government Link Rot, FreeGovInfo (Jan 3, 2014).
- Center for Research Libraries, and OCLC. 2007. Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist Version 1.0. Chicago, IL: Center for Research Libraries.
- Consultative Committee for Space Data Systems. 2012. Reference Model for an Open Archival Information System (OAIS). Magenta Book, issue 2. CCSDS Publications 650.0-M-2. Washington, D.C.: Consultative Committee for Space Data Systems.
- Chronology of Disappearing Government Information, 1998-2002. [PDF file] Compiled by Barbara Miller for the ALA/GODORT Education Committee, with the special assistance of Karrie Peterson. [This document was originally available as an MS Word file at okstate.edu.]
- Less Access to Less Information By and About the U.S. Government. From 1981 until 1998, Anne Heanue and the fine folks at the Washington Office of the American Library Association (ALA) published an amazing series called Less Access to Less Information by and about the U.S. Government, a chronology of efforts to restrict and privatize government information.
- Public Law 103-40, the "Government Printing Office Electronic Information Access Enhancement Act of 1993." 107 Stat. 112 (June 8, 1993).
- Shaw, Thomas Shuler. "Library Associations and Public Documents." Library Trends (July 1966): 167-177. [Includes information about DocEx.]
- U.S. Congress. House. House Report no. 106-796, Conference Report to Accompany H.R. 4516 (July 27, 2000).
- U.S. Government Printing Office. Keeping America Informed: The U.S. Government Printing Office, 150 Years of Service to the Nation. U.S. GPO, 2011. 149 p., illus. SuDoc# GP 1.2:IN 3/2. [PURL] [Includes information about INS v. Chadha, 462 U.S. 919 (1983).]
- U.S. Government Printing Office. "The Electronic Federal Depository Library Program: Transition Plan, FY 1996 - FY 1998." Administrative Notes, Vol. 16, no. 18 (Dec. 29, 1995).
- U.S. Government Printing Office. Superintendent of Documents Policy Statement (SOD) 301 (Sept. 28, 2006).
- Wagner, Ralph D. A History of the Farmington Plan. Lanham, Md.: Scarecrow Press, 2002.
- Congressional Research Service. 2013. Retaining and Preserving Federal Records in a Digital Environment: Background and Issues for Congress, by Wendy Ginsberg. CRS Report R43165 (July 26, 2013).
- U.S. National Archives and Records Administration. 2010. Guidance on Managing Records in Web 2.0/Social Media Platforms. NARA Bulletin 2011-02 (October 20, 2010).
- U.S. National Archives and Records Administration. 2011. 2010 Records Management Self-Assessment Report. Washington, DC (Feb. 22, 2011).
- U.S. National Archives and Records Administration. 2013. Guidance on Managing Social Media Records. NARA Bulletin 2013-XX. Washington, DC (June 26, 2013).
- Lost Docs Project.
Tracking Fugitives
Selected Technologies and Infrastructures
- Archive-It (Internet Archive)
- archive.today
- CONTENTdm (OCLC)
- Digital Public Library of America
- DPN: The Digital Preservation Network
- DSpace
- DuraSpace
- EPrints
- Fedora
- Greenstone
- Islandora
- LOCKSS-USDOCS (LOCKSS)
- Memento
- MetaArchive
- National Digital Stewardship Alliance (Library of Congress)
- Omeka (Corporation for Digital Scholarship)
- Viewshare (Library of Congress)
- Web Archiving Service (University of California)
Temporal Context
- Ainsworth, Scott G. 2013. "Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing Web Archives." Bulletin of IEEE Technical Committee on Digital Libraries 9 (2).
- The Web's Missing Dimension: Time. Jon Udell interviews Herbert Van de Sompel, digital librarian. Interviews with Innovators (April 14, 2010) [44-minute MP3 file].
Herbert Van de Sompel is a digital librarian who wonders why the web has no memory and wants to do something about that. In this conversation he tells host Jon Udell about the Memento project, a proposed protocol that browsers can use to scroll through historical versions of web resources.
- Memento: Time Travel for the Web. Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, Harihar Shankar.
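The core of the Memento approach (later published as RFC 7089) is datetime negotiation: a client asks a "TimeGate" for a resource as it existed at a given moment by sending an `Accept-Datetime` header in HTTP date format. The Python sketch below only builds such a request; the function name is hypothetical, and the Internet Archive Wayback Machine URL used as the default TimeGate is an assumption to be replaced with whatever archive or aggregator one actually uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumed TimeGate base URL; the original URI is appended to it.
TIMEGATE = "https://web.archive.org/web/"


def build_timegate_request(original_uri: str, when: datetime,
                           timegate: str = TIMEGATE) -> Request:
    """Build a Memento datetime-negotiation request for `original_uri`."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate + original_uri,
                   headers={"Accept-Datetime": http_date})
```

Passing the request to `urllib.request.urlopen` would send it; a TimeGate that supports Memento is expected to redirect to the archived copy closest to the requested date. Building the request separately from sending it keeps the negotiation step easy to inspect.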
Last modified: June 20, 2018