This addresses a report, "REPORT TO THE ARCHIVIST OF THE UNITED STATES", on the FedFlix Program for Access to Government Video, which was submitted by Public.Resource.Org on October 1, 2011, to:
Honorable David Ferriero
Archivist of the United States
National Archives and Records Administration
700 Pennsylvania Avenue, Northwest
Washington, D.C. 20408
I would like to point out that at the end of this report there is a list of videos with Content ID matches. I see a lot of familiar names in that list. This report was filed a year and a half ago. How many thousands more videos have been fraudulently claimed in that time by the copyright fraudsters?
I used to admire YouTube. That admiration and high regard have been seriously damaged now that I am aware of this ever-growing problem. They have been damaged because, even after a report like this (which should be sending up red flags all over the place), an unreasonable amount of time has passed without any indication of anyone in YouTube's organization taking proactive action to eliminate this massive fraud.
Read it and weep. This situation is far from over. This situation is intolerable. As a citizen of the United States of America, I demand that our rights to use, and not be denied use of, public domain material be protected with as much fervor as the rights of those who hold copyrights. In fact, I postulate that the protection of public domain property for the use of the public is of greater importance than the protection of a copyright held by an individual or company.
This document can be found at: https://public.resource.org/ntis.gov/contentid.pdf
===========List of claimants and number of matches===========
Music Publishing Rights Collecting Society: 61
Warner Chappell: 6
The Orchard Music: 5
UMPG Publishing: 7
Lizenz Agentur: 10
Fremantle International: 1
Creation Films: 90
The Orchard: 57
Sony ATV Publishing: 5
NBC Universal: 5
Discovery Communications: 4
UNIPPM (Go Digital): 12
Al Jazeera: 2
Archive Farms, Inc.: 1
MyVideoRights USA (MVR): 38
EMI Publishing: 1
New Video Group: 1
Paul Brownstein Productions: 1
APM (Rumblefish): 8
BFM Digital: 5
GoDigital for a third party: 1
Image Entertainment: 2
Mobile 1 Music: 1
Milestone Film & Video: 1
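As a quick sanity check, the counts in the list above can be totaled. The sketch below is illustrative only: the `claims` dictionary is a hand transcription of the list, and the variable names are made up for this example. The total comes to 325, which agrees with the number of Content ID matches cited in the report.

```python
# Per-claimant match counts, transcribed by hand from the list above.
claims = {
    "Music Publishing Rights Collecting Society": 61,
    "Warner Chappell": 6,
    "The Orchard Music": 5,
    "UMPG Publishing": 7,
    "Lizenz Agentur": 10,
    "Fremantle International": 1,
    "Creation Films": 90,
    "The Orchard": 57,
    "Sony ATV Publishing": 5,
    "NBC Universal": 5,
    "Discovery Communications": 4,
    "UNIPPM (Go Digital)": 12,
    "Al Jazeera": 2,
    "Archive Farms, Inc.": 1,
    "MyVideoRights USA (MVR)": 38,
    "EMI Publishing": 1,
    "New Video Group": 1,
    "Paul Brownstein Productions": 1,
    "APM (Rumblefish)": 8,
    "BFM Digital": 5,
    "GoDigital for a third party": 1,
    "Image Entertainment": 2,
    "Mobile 1 Music": 1,
    "Milestone Film & Video": 1,
}

# Total number of matches across all claimants.
total = sum(claims.values())

# The three claimants with the most matches, largest first.
top_three = sorted(claims.items(), key=lambda kv: kv[1], reverse=True)[:3]

print(f"Total matches: {total}")  # Total matches: 325
for name, count in top_three:
    print(f"{name}: {count} matches ({count / total:.1%} of all matches)")
```

The three largest claimants (Creation Films, the Music Publishing Rights Collecting Society, and The Orchard) together account for 208 of the 325 matches.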
============ The Report ===============================
REPORT TO THE ARCHIVIST OF THE UNITED STATES
The FedFlix Program for Access to Government Video
Submitted by Public.Resource.Org
October 1, 2011
This report is licensed under Creative Commons CC-Zero: No Rights Reserved
The report may be viewed at https://public.resource.org/ntis.gov/contentid.pdf
October 1, 2011
Honorable David Ferriero
Archivist of the United States
National Archives and Records Administration
700 Pennsylvania Avenue, Northwest
Washington, D.C. 20408
Dear Mr. Ferriero:
I am pleased to transmit to you a report on the FedFlix program, which makes government-produced and government-maintained video available to the public. This report briefly summarizes the current status of the FedFlix program, discusses a series of copyright-related issues, and concludes with a set of recommendations. Two appendices provide further details on the issues raised in this report.
Background and Status of the FedFlix Program
The FedFlix program began in 2007 as an ad hoc effort by Public.Resource.Org. Our initial efforts consisted of purchasing or requesting government-produced video from the National Technical Information Service (NTIS), or obtaining it by writing to organizations such as the Mine Safety and Health Administration (MSHA) and the Occupational Safety and Health Administration (OSHA). All of these videos are digitized and uploaded with metadata to YouTube, the Internet Archive, and our own bulk data servers, which support the HTTP, FTP, and RSYNC protocols.
The program became more formal with the signing of Joint Venture Agreement NTIS-1832 on November 2, 2007 in which we agreed to accept periodic shipments of VHS, Betacam, and Umatic tapes from NTIS. Those tapes were all digitized and the tapes and a disk drive were then returned to NTIS, a service we performed at no cost to the government.
Real progress on FedFlix did not occur until after your appointment as Archivist of the United States. After testimony before the U.S. House of Representatives (see Appendix A), you reached out to us and the program was expanded to include the installation of DVD duplicators at NARA’s College Park facility. Volunteers from the International Amateur Scanning League have since copied several thousand DVDs. Your staff in the Motion Picture, Sound, and Video Research Room have been very helpful in assisting the volunteers and collecting and dispatching completed disks to us for further processing.
Your leadership in helping bring in NARA-maintained video has also been instrumental in expanding the program to other agencies. For example, the Department of Defense furnished us with 816 DVDs. More recently, Chairman Darrell Issa and Speaker John Boehner supported a program to create House.Resource.Org to make high-quality video from Congressional hearings available using the same basic methods we developed for the FedFlix program. In addition, we’ve collected numerous videos from other agencies of the Federal Government as well as a comprehensive collection of videos related to the Federal Highway Administration (FHWA) and numerous videos from agricultural extension programs and state occupational safety programs.
PUBLIC.RESOURCE.ORG ~ A Nonprofit Corporation
Public Works for a Better Government
carl@media.org • 1005 GRAVENSTEIN HIGHWAY NORTH, SEBASTOPOL, CALIFORNIA 95472 • PH: (707) 827-7290 • FX: (707) 829-0104
As of the end of September, the YouTube channel has received 8,223,221 video views and the Internet Archive collection has received 9,910,053 views. In addition to these 18,132,274 views, there are numerous groups that are downloading the videos from our bulk server or from the Internet Archive and uploading them to their own channels.
There are approximately 5,900 videos that have been made available from the executive branch of the government. Approximately 80 pounds of media not already in the NARA collections were donated by Public.Resource.Org to the National Archives and you are currently conducting a program in conjunction with the Wikimedia Foundation to catalog these materials.
There are very different viewing patterns on YouTube and the Internet Archive, underscoring the importance of using multiple distribution channels. There is no overlap in the top 10 videos (see Table 1). In addition, you will note that some of the top videos are dated and deal with technical topics such as refrigeration, underscoring the fact that more up-to-date information on these topics is not available on the Internet today and would be very useful. These videos are also a classic “long tail”: 1,710 of the videos on the Internet Archive had more than 100 views.
Additional information about viewing patterns and usage can be gleaned from services such as YouTube’s Insight service. Table 2 shows that most video views on YouTube are from people already on YouTube. Only 2.5% of views are the direct result of a Google search. However, a large number of views come from a direct link or an embedded player on a web site, indicating that a large number of people are making use of these videos in creating their own services. The fact that most of the video views on YouTube come from within their system again underscores the importance of using multiple channels of distribution.
Some perspective on viewer numbers can be gained by comparing video views across YouTube Channels. Compared to FedFlix, far more video views have been accumulated by the White House (61,225,945), NASA (26,153,487) and the 5 military channels (33,906,904). However, the rest of the government lags far behind, particularly our national cultural institutions. The Library of Congress has 2,603,787 views, the Smithsonian has 4,260,381 views spread across a dozen channels, and the National Archives has 810,136 views.
The lack of viewership in our national cultural institutions is not because of a lack of material. The Smithsonian, for example, has assets ranging from pandas at the National Zoo to the amazing assets at the National Air and Space Museum. The potential can be illustrated by looking at channels such as National Geographic (624,869,685 views) or the History Channel (14,221,026 views). The tens of thousands of videos buried in the vaults of the National Archives, coupled with the assets at locations such as the Library of Congress National Audio-Visual Conservation Center and the Smithsonian Institution’s video collections have the potential for creating a top Internet destination.
Copyright Assertions on Government Videos
Making videos from the National Archives available has a potential of reaching tens of millions of people with important historical, vocational, safety, and health information. The importance of public access is expressed in the mission of the National Archives of “ensuring that the people can discover, use, and learn from this documentary heritage.” However, it is important that NARA make these videos and other materials available as another part of the mission: “safeguarding and preserving” our heritage. NARA needs to make videos available so that it can play defense as well as offense.
One of the services that YouTube offers to content producers is to register videos they own in their Content ID system. The system then scans the videos uploaded by other users looking for a match. The system flags audiovisual content as well as musical compositions. The purpose of the system is to identify potential copyright violations. The owners of the material then have various options on actions to take on the offending videos including:
• No action.
• Muting the sound track.
• Monetizing the material by placing an ad before the video plays or a banner ad while the video is playing.
• Blocking the video in certain jurisdictions or worldwide.
• Having the video deleted and a copyright violation registered on the uploader’s account. After 3 such violations are registered, YouTube will suspend the user’s account.
Over the last year, 325 such Content ID matches have been received on our channel. As more content producers register materials and as YouTube scans our content, additional matches continue to be registered at the rate of several per week. These matches are attached as Appendix B to this report.
In two of the 325 matches, we posted videos that had clear copyright violations, both of which were removed. The first, item 295, is “Chang: A Drama of the Wilderness,” a 1927 silent film in the NARA collection. Since the video was available for sale on Amazon as part of the partnership with NARA and is available for copy without restriction or notices at College Park, we had (wrongly) assumed that it would be OK to post the video. As a result of this posting, our account experienced a violation which has the effect of prohibiting our labeling any videos as being available for reuse under a Creative Commons Attribution license for a period of six months.
The second video with definite copyright restrictions is item 296, a Time, Inc. film produced in 1940 and deposited in the National Archives with donor restrictions. The film was available for copy with no restrictions noted. As we conducted the audit of Content ID matches, we noticed this restriction and deleted the video.
The remaining 323 videos listed in Appendix B are much less clear-cut. In some cases, the matches are extremely puzzling. For example, items 4, 5, 7, 10, 11, 12, 155, 156, 267, 306, and 308 are all silent films but we have been notified that a “Music Publishing Rights Collecting Society” owns or administers a “Musical Composition.” At first we thought we had perhaps joined composer Mike Batt in inadvertently violating the rights on John Cage’s 4’ 33”. However, the Cage piece dates from 1952, and the videos we had flagged included a home movie of FDR in Poughkeepsie from 1937 donated by the creator of the film. That video, number 308 in our list, has been flagged for 3 separate violations of musical compositions.
There are other assertions of violations of musical composition rights that are more difficult to debug. For example, item number 227 consists of John F. Kennedy’s address on the Cuba missile crisis. The only music on this video is “Hail to the Chief,” which is an 1812 composition written by James Sanderson and has been played to honor the Chief Executive since the tradition began with Andrew Jackson in 1829. While the John F. Kennedy video has no credits, we presume the music is being played by the U.S. Marine Corps Band. In the case of item number 114, the only music is again “Hail to the Chief” being played at the 1933 inaugural of Franklin Roosevelt.
When it comes to Content ID matches on a full video, there are some items that are also hard to understand. For example, the famous 1942 film “Battle of Midway” has been marked as owned by NBC Universal and is blocked from view worldwide. However, this video is marked by NARA as having no use restrictions and the film was produced and photographed by the United States Navy as part of the war effort. It is widely marked in directories such as IMDB as being in the public domain.
NBC Universal is not the only content producer that appears to be homesteading the public domain. A large number of the Universal Newsreels are marked as being the property of an entity that goes by Creation Films. This group has “monetized” the videos on our channel, imposing ads on each of the videos. However, the Universal Newsreels were gifted into the public domain by Universal City Studios in 1976.
Likewise, there are many other classic films that have received matches. Al Jazeera has claimed rights to “Work Pays America” (item 117), a 1936 film by the Works Progress Administration which we obtained from the FDR Presidential Library. “Memphis Belle” (item 216) is claimed by and monetized by an entity called The Orchard, despite the fact that this film was created by the Office of Information and War Activities in 1946. “Wings for this Man” (item 237), claimed and monetized by Creation Films, is a 1945 film created by the U.S. Army Air Forces First Motion Picture Unit to honor the Tuskegee Airmen and features Ronald Reagan as the narrator. “Duck and Cover” (item 250) is monetized by Image Entertainment in the U.S. and blocked in the rest of the world, despite having been created by the U.S. Civil Defense Branch with the patriotic contribution of the schoolchildren of New York City and Astoria, New York.
The copyright assertions also apply to more modern films, particularly some important films about the space race. “Friendship 7” (item 88) is claimed by Discovery Communications, but was created by NASA in 1962. Discovery Communications also claims rights to “Apollo 13: Houston, We’ve Got a Problem” (item 90) and “The Mission of Apollo” (item 91).
Not all of the copyright assertions are necessarily false. There may be instances where the government licensed use of music or footage from commercial sources. However, as explained in the next section, we have no way of determining where in a video an assertion occurs or what restrictions the government may have agreed to. One more interesting fact is that 214 of the 325 videos have successfully undergone machine transcription, making these films accessible for the first time.
Who Has Title to the Public Domain?
The Content ID system allows us to protest a match under 3 circumstances, as explained by YouTube:
• The content was misidentified. Your original content was misidentified; for example, your family picnic was mistakenly identified as a scene from The Godfather. Mistakes of this type are very rare but possible.
• You have the right to use the content online. You have written permission from the content owner to use the material on YouTube.
• Fair Use / Fair Dealing. If you believe your use meets the legal requirements for exemption from copyright under appropriate law, you can dispute the claim. If you are unsure, you should seek legal counsel before submitting a dispute.
There are several impediments to pursuing the dispute process. Public.Resource.Org is not the owner of the videos in question. For legal purposes, the owner is the National Archives and Records Administration. Unfortunately, the Archives Research Catalog (ARC) has very little useful information in most cases on the topic of ownership. In particular, there are numerous instances where ARC clearly marks use restrictions as “unrestricted.” A query to the ARC staff indicated that we could use that marking as “written” proof that a video was conclusively in the public domain. However, further inquiries with your General Counsel indicated that this information was wrong, and in many instances “unrestricted” was a blanket assertion put on all entries with no research.
Because the ARC catalog is either incorrect or incomplete, there is no written documentation of the copyright status of the works visible to the public. We have requested the transfer records for certain high-visibility items, such as “Battle of Midway,” but our request for this information to the ARC staff was not answered. As a temporary matter, we hope you will treat this report as a request under FOIA for transfer records for the following items, which we previously requested:
• ARC Identifier 38905 / Local Identifier 208-UN: Motion Picture Films from "United News" Newsreels, compiled 1942 - 1945. The whole series has been claimed by an entity called Creation Films and they are "monetizing" the content by placing ads on my copies of the videos.
• Duck and Cover, ca. 1955, ARC Identifier 38174 / Local Identifier 171.66. Image Entertainment has claimed this video and it is BLOCKED worldwide except in the U.S. as a result.
• BATTLE OF MIDWAY, 1942, ARC Identifier 65422 / Local Identifier 342-USAF-16985. This is claimed by NBC and is blocked worldwide.
• NAZI CONCENTRATION CAMPS, ca. 1945, ARC Identifier 43452 / Local Identifier 238.2
• MEMPHIS BELLE; A STORY OF A FLYING FORTRESS, 1944, ARC Identifier 38755 / Local Identifier 208.238
While transfer information might give Public.Resource.Org the “written permission” to attempt to dispute those specific entries, this is only a band-aid. This process needs to be much more systematic. As such, we would like to make some specific recommendations for your consideration:
• Transfer records for a specific item in the ARC catalog appear to be the best chance of determining if any copyright restrictions are readily apparent in a work. But, there is no procedure for requesting those transfer records and email requests are a stochastic process. While we could use the FOIA as above, this is not a satisfactory procedure.
• The ARC catalog uses the word “unrestricted” on entries with no basis in fact. This is misleading to the public and the catalog should be fixed.
• Once a usage restriction has been identified using Content ID, there is no path to update the information in the ARC catalog. Having up-to-date information on actual restrictions would be valuable to the public that uses these materials.
• Videos such as “Chang: A Drama of the Wilderness” have received takedown notices. However, the videos are readily available to be copied in College Park and are available for sale on Amazon with a link to the sale item from the NARA catalogs. NARA receives a portion of the proceeds of such sales. Have you taken appropriate steps to make sure you are not in violation of the law?
• Public.Resource.Org is not the owner of these videos, but NARA is. If NARA maintained an equivalent presence on YouTube, you could register for the “Partner Program” and upload these videos to the Content ID system as a way of preemptively staking out the public domain. Likewise, if you had these videos available for the public to see, you would also be notified of any claims by other parties on this content instead of relying on reports such as this.
More generally, it is important for NARA to take a more active role in making these materials available to the public on multiple venues. While NARA has a minimal YouTube account, your staff has not answered repeated offers to use materials we’ve digitized and made available. The staff has recently added iTunes University to your portfolio of outlets, but you should not ignore the Internet Archive, which would allow you to place full-resolution files on-line so users could download them and make their own DVDs or use them in film productions. The Library of Congress American Memory system is a good example of such high-resolution access to data.
Participating actively using multiple channels is essential to meeting your mission of safeguarding and preserving this important information. In addition to better public access, there are all sorts of incidental benefits, such as automatic machine transcription to make the materials more accessible, better metadata, and better knowledge of the copyright status of government videos.
Public.Resource.Org would be very pleased if you would take the FedFlix program over from us. You can download all the videos from the net, or we’d be happy to send you a disk drive. We can also transfer our YouTube and Internet Archive collections over to you as a way of jump-starting your own efforts. The FedFlix effort has been a demonstration project, but it is time now for the government to take the wheel.
Testimony Before the Committee on Oversight and Government Reform
U.S. House of Representatives
Prepared Statement of Carl Malamud
Public.Resource.Org
U.S. House of Representatives
Committee on Oversight and Government Reform
Subcommittee on Information Policy, Census, and National Archives
Honorable William Lacy Clay, Chairman
“History Museum or Records Access Agency? Defining and Fulfilling the Mission of the National Archives and Records Administration”
Wednesday, December 16, 2009
2154 Rayburn House Office Building
Chairman Clay, Ranking Member McHenry, Members of the Subcommittee:
Thank you for your invitation to testify before you on the National Archives and Records Administration (NARA) and the proper balance between the agency’s core mission of record management, preservation and access, and its creation and management of museum exhibits, educational and public programs, and other outreach efforts.
My name is Carl Malamud and I am the Founder and President of Public.Resource.Org, a 501(c)(3) nonprofit corporation with a charter of making government information more accessible. We are responsible for placing over 90 million pages of government documents on the Internet that were not previously available, including almost all the opinions of the U.S. Courts of Appeals,1 20 million pages of U.S. District Court documents,2 and the building, fire, electrical, plumbing and other public safety codes for most of the country.3 From 1993-1995, when I ran the Internet Multicasting Service, I was responsible for placing the U.S. Securities and Exchange Commission and Patent databases online. As part of running the first radio station on the Internet, I was a member of the U.S. House and Senate Radio-TV Galleries, where we connected the floors of the U.S. House of Representatives and Senate to the Internet as live webcasts.
1 John Markoff, A Quest to Get More Court Rulings Online, and Free, New York Times, August 20, 2007.
2 John Schwartz, An Effort to Upgrade a Court Archive System to Free and Easy, New York Times, February 12, 2009.
3 Noam Cohen, Who Owns the Law? Arguments May Ensue, New York Times, September 28, 2008.
In addition to placing new information online, Public.Resource.Org has been active in finding and redacting Social Security numbers and other Protected Personal Information (PPI), including the removal of approximately 500,000 Social Security numbers of military officers from government and commercial copies of the Congressional Record,4 and an audit of 30 U.S. District Courts that found significant privacy violations and resulted in changes in procedures to better protect privacy recently instituted by the Judicial Conference.5
Public.Resource.Org also runs the FedFlix program, a joint venture with the National Technical Information Service (NTIS). In this program, NTIS and other agencies send us video tapes, which we digitize, then return the tapes and a disk drive to the agency. In addition to giving the agency a digital copy, we upload all these videos to YouTube and the Internet Archive, and also make them available in bulk on our systems where they serve as a public domain stock footage library. No money changes hands in this program and the only cost to the government is to ship the tapes to us.
FedFlix is one of the most popular government channels on YouTube and has received more channel views than the Smithsonian and NARA channels combined. A successful pilot of FedFlix with the House of Representatives was conducted with 4 committees, including the Committee on Oversight and Government Reform,6 and the pilot received the support of Speaker Pelosi, who called it a “wonderful program.”7 We are hopeful that in 2010, the House Broadcast Studio will be able to begin to loan us tapes of committees that choose to make their archives more broadly available.
4 Charlie Reed, Military lags in safeguarding officers’ identities, Stars and Stripes, November 2, 2009.
5 Henry Wigglesworth and Heather Williams, Social Security Numbers in District Court Case Files, Administrative Office of the U.S. Courts, August 24, 2009.
6 Committee on Oversight and Government Reform, Hearings on Political Interference with Climate Science, U.S. House of Representatives, March 19, 2007.
7 Speaker Nancy Pelosi, Letter to Public.Resource.Org, April 11, 2008.
A Mission of Preservation, Administration, and Access
Your invitation to testify asked me to discuss NARA’s mission to preserve and ensure access to records, and asked if I believe the agency’s efforts in exhibits, civic education and public programs influence that performance. When President Herbert Hoover laid the cornerstone for the National Archives Building, he stated:
“The building which is rising here will house the name and record of every patriot who bore arms for our country in the Revolutionary War, as well as those of all later wars. Further, there will be aggregated here the most sacred documents of our history, the originals of the Declaration of Independence and of the Constitution of the United States. Here will be preserved all the other records that bind State to State and the hearts of all our people in an indissoluble union.”8
The display of the Declaration of Independence and of the Constitution is certainly a visible symbol of our National Archives, but it is merely a symbol. It is the preservation of records, and the corollary processes of gathering those records from the agencies and making them available to the public that are the core mission of this unique institution. To the extent that the National Archives has a role to play that is more than incidental in exhibitions, I believe that role is primarily with the Presidential Libraries.
In a world of infinite resources, one cannot object to the National Archives competing with organizations such as the Smithsonian Institution that are “in the business” of being a museum, but the sad truth is that NARA faces significant challenges in the areas it must work in and must focus intently on overcoming those challenges. In this testimony, I will address some of those challenges, including the Electronic Records Archives (ERA), electronic records management, digitization of the archives, public access to the archives, and the role of public-private partnerships.
Electronic Records Archives (ERA)
NARA is in the process of launching a highly ambitious ERA system for the ingestion and preservation of electronic records. Last month, David A. Powner of the Government Accountability Office appeared before this Subcommittee and testified that through FY2008, NARA has spent $237 million on the ERA system including $112 million in disbursements to the contractor, Lockheed Martin. The total life cycle cost for the system is $551.4 million, of which $317 million will go to Lockheed Martin.9 These are breathtaking numbers for a computer system, even by the standards of federal government procurement.
8 Herbert Hoover, Remarks Upon Laying the Cornerstone of the National Archives Building, February 20, 1933.
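To put the figures quoted from the GAO testimony in proportion, the short calculation below works out the contractor's share of spending. It is only a sketch that uses the four dollar amounts cited above; the variable and function names are illustrative and are not drawn from any GAO or NARA document.

```python
# Dollar figures in millions, as quoted above from the GAO testimony.
spent_through_fy2008 = 237.0      # total NARA spending on ERA through FY2008
paid_to_contractor = 112.0        # portion of that disbursed to Lockheed Martin
life_cycle_total = 551.4          # projected total life cycle cost
life_cycle_contractor = 317.0     # portion of life cycle cost going to Lockheed Martin


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


contractor_share_to_date = pct(paid_to_contractor, spent_through_fy2008)
contractor_share_life_cycle = pct(life_cycle_contractor, life_cycle_total)
spent_share_of_life_cycle = pct(spent_through_fy2008, life_cycle_total)

print(f"Contractor share of spending to date: {contractor_share_to_date}%")
print(f"Contractor share of life cycle cost:  {contractor_share_life_cycle}%")
print(f"Life cycle budget spent by FY2008:    {spent_share_of_life_cycle}%")
```

On these figures, a little under half of the money spent through FY2008 went to the contractor, and roughly 43% of the projected life cycle cost had already been spent.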
It was thus very disturbing to read that the GAO cannot figure out what the system does. Mr. Powner stated:
“NARA’s plans for ERA lacked sufficient detail to, for example, clearly show what functions had been delivered to date or were to be included in future increments and at what cost.”
The GAO testimony went on to explain that the system included no backup and restore capability despite $237 million already spent, and that the backup and restore capabilities would only be included in the so-called Increment 4, which is currently in the “early planning, analysis, and design” stages and is not slated to be completed until 2012. Of the 10 mandated activities for any agency doing contingency planning and continuity of operations, all 10 were deficient.10
It is not only the Government Accountability Office that is mystified. NARA’s own Inspector General testified in that same hearing that he has no idea what the system does:
“As engaged as I have been, I do not know what capabilities and capacity will reside in ERA when the contractor throws another party, turns in their badges, shakes hands, and exits the door. Such a statement should be viewed as troubling.”11
Despite a continuing series of incidents, including a Cure Letter sent to the contractor, the Acting Archivist reported in that same hearing that things were back on track, but then went on to state that “the subcommittee should also know that the start of Increment 3 development has not been as smooth as desired ... We believe that this is part of the normal give and take between the agency and its contractor that occurs with any large-scale contract.”12
9 David A. Powner, Progress and Risks in Implementing its Electronic Records Archive Initiative, Government Accountability Office, November 5, 2009 (GAO-10-222T).
10 See National Institute of Standards and Technology, Recommended Security Controls for Federal Information Systems, Special Publication 800-53 Revision 2 (Gaithersburg, MD: December 2007). Note that GAO used Revision 1 (December 2006) for their audit and that there is a Revision 3 about to be finalized.
11 Paul Brachfeld, Prepared Statement of the NARA Inspector General, Information Policy, Census, and National Archives Subcommittee, November 5, 2009.
After a thorough review of all the minutes of the Advisory Committee on ERA (ACERA), interviews with some of its members, examination of all available presentation materials, GAO reports, IG reports, and further research, I must say I share the mystification of GAO and the Inspector General. I have no idea what the system does. In the world of complex computer systems, one expects summary information that gives some indication of scope, such as the number of processors, the amount of disk space, the programming languages being used, or the number of programmers. Likewise, one expects financial metrics such as how much money is being spent on Oracle or Documentum licenses, the brands and models of hardware, and the cost of Internet transit, colocation space, or support services. None of that information seems readily available.
What I did find in the ACERA meeting minutes was particularly disturbing. For example, on November 16, 2008, NARA staff presented the Findings of the Online Public Access (OPA) Integrated Product Team (IPT) but cautioned that even at that late date the findings “had not been fully vetted by NARA.”13 The presentation consisted of a series of mockups of web pages, and Best Current Practices for web site design such as wireframe diagrams or an information architecture were not presented. Most importantly, there was no mention at all of an Application Programming Interface (API) for access to the ERA system. Best Current Practice for design of public access to government data is to start with bulk access, then an API, then finally worrying about issues such as web site design, colors, and fonts.14 Public access seems to be an afterthought and is not being pursued with any degree of rigor.
An examination of the ACERA minutes yields some even more disturbing information. While there is no contingency planning for Continuity of Operations, there is contingency planning by NARA for an alternative architecture that would replace the ERA system in 2011, when NARA will be recompeting the contract.15 In other words, it appears that there is a contingency plan to simply throw away the current system. Even more shocking, it appears that Lockheed Martin has also put some time into thinking about the future of ERA, as they have filed 15 patent applications on the system, and it is unclear whether the government will have full rights in the case of a vendor change.16
12 Adrienne Thomas, Prepared Statement of the NARA Acting Archivist, Information Policy, Census, and National Archives Subcommittee, November 5, 2009.
13 Final Minutes of the Advisory Committee on the Electronic Records Archives, November 6, 2008. See also Pamela Wright, Update on Online Public Access for ACERA, April 30, 2009.
14 See Robinson et al., Government Data and the Invisible Hand, Yale Journal of Law & Technology, Vol. 11, p. 160, 2009.
15 Final Minutes of the Advisory Committee on Electronic Records Archives, November 5, 2008.
Even if the federal government has full rights, it is clear that any state archivist wishing to use this half-billion-dollar computer system developed at taxpayer expense will have to pay dearly to Lockheed Martin for those rights. It seems obvious that if the taxpayers fork over that much money, the people should have the rights to use the resulting code. (I would go so far as to say that any nonprofit corporation or government agency that develops software should make it open source as a precondition of their use of taxpayer dollars.)
What can one do about the ERA system? It is my worry that Lockheed Martin and NARA, in the development of this system over many years, perhaps did not anticipate the recent radical decrease in the cost of disk space or changes in paradigms for enterprise computing, such as large arrays of commodity computers based largely on open source software used in systems such as Amazon or Google.
A good hard look at this system is clearly in order. One option would be to bring in a “tiger team” to scrub this system from top to bottom and make recommendations as to which parts of the system might yield useful results and which might perhaps be thrown away as a lost effort that needs to be restarted. This is perhaps drastic action, but it is clear from testimony to Congress over several years, and a history of GAO and IG bafflement, that a strong and forceful audit is necessary.17
Electronic Records Management
One of the reasons that the ERA system is so complex is the incoming deluge of electronic records. It is useful to remember that in the period 1935-1939 when the National Archives was being created, Archivist Connor faced a similar challenge. At first, the archives were simply unable to keep up. In 1936, 9,178 series of records were submitted by agencies to the special examiners charged with the “Survey of Useless Records,” but they were able to examine only 2,484 series. In 1937, they received 27,873 record series, and were able to examine only 3,237.18
16 Final Minutes of the Advisory Committee on Electronic Records Archives, May 1, 2008. See U.S. Patent Application 11/797,278, Systems and methods for establishing authenticity of electronic records in an archives system (Filed May 2, 2007); U.S. Patent Application 11/797,567, System and method for preservation of digital records (Filed May 4, 2007); U.S. Patent Application 11/797,644, System and method for managing records through establishing semantic coherence of related digital components (Filed May 4, 2007).
17 Government Accountability Office, National Archives and Records Administration’s Acquisition of Major System Faces Risks, GAO-03-880, August, 2003.
Archivist Connor instituted a series of changes, moving the examiners closer to the source and providing better guidance and standardized forms and schedules to the agencies. Not only did these changes reduce the backlog for his agency, but these contributions to archival science also spread to archives in the states and other countries.
For many years, records management has been sorely neglected.19
The Archivist is charged by law to “promulgate standards, procedures, and guidelines with respect to records management and the conduct of records management studies.”20 But guidance has been limited to telling agencies to “print and save” documents, and a recent survey shows no agency-wide policies for important archives such as electronic mail.21
It was heartening to hear Archivist Ferriero list this area as one of his key concerns, stating that he would reinstitute agency inspections and that “NARA should play a leadership role.”22 While NARA should indeed play a leadership role, it will require the active participation of the entire government. Archivist Connor had a similar issue, when he needed government-wide cooperation. He formed a National Archives Council, and the initial meeting was hosted by President Roosevelt in the Cabinet Room. Secretary of State Cordell Hull was named Chairman of the Council. This established the issue of records management as one of great import, and in the second meeting of the Council a resolution was passed that specified how agencies should maintain their records and which should be sent over to the Archives and when.23
18 Donald R. McCoy, The National Archives: America’s Ministry of Documents 1934–1968, University of North Carolina Press (Chapel Hill: 1978), pp. 62–63.
19 Government Accountability Office, Federal Records Management: A History of Neglect, PLRD-81-2, February 24, 1981.
20 General responsibilities for record management, 44 U.S.C. § 2904.
21 Statement of Patrice McDermott, Subcommittee on Federal Financial Management, Government Information, Federal Services, and Internal Security, U.S. Senate Committee on Homeland Security and Governmental Affairs, May 14, 2008. See also Citizens for Responsibility and Ethics in Washington, Record Chaos: The Deplorable State of Electronic Record Keeping in the Federal Government, April 16, 2008.
22 U.S. Senate Committee on Homeland Security and Governmental Affairs, Pre-Hearing Questionnaire for the Nomination of David Ferriero to be Archivist of the United States, September 16, 2009.
23 Donald R. McCoy, National Archives, op. cit., p. 67.
After that second meeting of the Council, agency heads no longer attended and delegates began to attend in their place. Later the National Archives Council was replaced with a Federal Records Council. However, that initial summit established the importance of the area and ensured the cooperation of all agencies in the development of their records schedules. As he examines the area of records management, perhaps the Archivist will consider a similar summit, perhaps calling on the support and assistance of the White House, particularly the Federal CIO, the Federal CTO, and the OIRA Administrator.
One more aspect of records management needs to be raised, and that is the conscious decision of NARA not to crawl and archive web sites on a regular basis.24 NARA has outsourced this important function to the well-respected Internet Archive, but has only provided very limited funds and has snapshots taken every two years for congressional sites and every four years for the executive branch. The results of these crawls are returned to NARA on tape, and NARA does not make these crawls available for public access. While the Executive Office of the President has been aggressively pursuing a goal of archiving not only the web site but also social media such as Facebook and Twitter,25 there is no evidence NARA is considering this. NARA should archive all social media, and should perform regular crawls and operate or contract out to have operated a “Wayback Machine” for government.
Digitization and Public Access
In addition to access to electronic records, one of the key challenges facing NARA today is digitization of older materials. Looking back again at Archivist Connor, we see that NARA dealt with an incoming deluge of paper records by pioneering an important set of technical advances, including the development of microfilm, invention of the airbrush for cleaning paper records, and invention of the laminating machine to protect paper.26
The microfilm effort was such a success that space needs were reduced by 95 percent!
Digitization of paper, audio tapes, video tapes, and other materials should be a key priority for NARA, as well as the Smithsonian Institution, the Library of Congress, and the Government Printing Office. The current state of the art for mass scans of paper is about 10 cents per page, a figure that has been mentioned by players such as the Internet Archive and Google Book Search. However, it is clear that these costs could decrease dramatically at larger scale, and that there would be additional savings in reduced storage space for those items where it is not necessary to keep the original (although it is important to note that the originals should always be kept on the most important items).
24 Paul M. Wester, Jr., Memorandum to Federal Agency Contacts: End-of-Administration web snapshot, NARA Memorandum NWM 13.2008, March 27, 2008.
25 Executive Office of the President, Solicitation for a Web Archive, Federal Business Opportunities Solicitation WHO-S-09-0003, August 21, 2009.
26 Donald R. McCoy, National Archives, op. cit., p. 76.
All the agencies would benefit from a dramatic increase in the pace of scanning older materials, and it is instructive to look once again to the birth of NARA, an institution that was born in the middle of the last depression. One of the startup challenges Archivist Connor faced was a survey of what records actually existed. He went to Harry Hopkins, and with the support of President Roosevelt, was able to secure $1,176,000 for WPA Sponsored Project No. 4, which employed white-collar workers to survey federal archives in the states. This program put 3,171 people to work in 1,057 communities, and the project continued until 1942 when the Work Projects Administration was terminated. This work produced the Historical Records Survey and the Inventory of Federal Archives, reference aids still in use today.27
A search of recovery.gov shows no entries for the National Archives or the Library of Congress, and only a single $25 million grant to the Smithsonian for fixing buildings.28 In the midst of the most severe economic downturn since the Great Depression, there is a tremendous opportunity to advance the state of the art for scanning on a massive scale, while putting people to work.
Instead of viewing digitization of materials as an opportunity, the National Archives has declared the task out of scope and has created as an alternative a series of “public-private partnerships” with organizations such as Footnote.Com and Amazon.Com. These partnerships are very disturbing as they place a lien on the public domain. While the agreements in theory are non-exclusive,29 in practice they give these companies exclusive access to key NARA holdings for periods of 5 years or even longer.
An example of such a partnership is the agreement with Amazon whereby the company is able to sell public domain DVDs on its web site. It is of course wonderful that Amazon is making these DVDs available for sale on their web site. But this deal came at a very high price for NARA. If you look on the government web site and search the Archival Research Catalog for digital copies of motion pictures, almost every item that comes up in search results contains a 2-minute preview of the video, and a government advertisement encouraging users to purchase the item from “our partner, Amazon.Com.”
27 Record Group 69.4.5, Records of the Division of Professional and Service Projects, Records of the Work Projects Administration, National Archives.
28 Smithsonian Institution, Facilities Capital Recovery Plan, Recovery.Gov, last accessed December 12, 2009.
29 NARA, Plan for Digitizing Archival Materials for Public Access: 2007-2016, September 10, 2007. See in particular Appendix A: NARA Principles for Partnerships to Digitize Archival Materials.
An examination of the sole-source contract NARA signed with Amazon shows a large number of restrictions on the government. While NARA gets back a “proof copy” and is able to allow people to view these films in a few NARA locations, it is prohibited from posting the videos for the public to view for at least 5 years. It is unclear if NARA gets back the full-resolution digital copies of the videos, and it may instead just get back the consumer DVDs at
At last count, there were 1,899 of these DVDs listed on the Amazon.Com web site, with retail prices starting at $10.95. There are three potential benefits to NARA from this deal. The first is revenue, where NARA receives 20% of total Amazon.Com revenue, minus “ingestion fees” of $35/tape and $150/film. To date, total NARA revenue on this agreement has been $3,273.66.32 The second benefit to NARA is making inexpensive copies of the videos available to a mass audience; however, neither the prices nor the total revenue seem to indicate that these videos have been broadly distributed. (Even if all ingestion fees are included as offsets, I estimate a maximum possible gross revenue of $110,000 and a more likely gross revenue well under $50,000, indicating total unit sales of 5,000–11,000 units at $10/DVD.) The third benefit is that NARA will get “free” digitization services for its video; however, the contract indicates that NARA gets the right to use these copies only for those videos where Amazon has made a profit.33
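The unit-sales range in the parenthetical above follows from dividing each gross revenue estimate by the $10/DVD price. A minimal Python sketch of that arithmetic follows; the dollar figures and the 1,899-title count come from the text, while the function name, the variable names, and the derived copies-per-title average are illustrative assumptions rather than figures from the contract.

```python
# Rough check of the unit-sales range quoted in the text.
# All names here are hypothetical; only the dollar figures and
# the title count are taken from the report.

def implied_units(gross_revenue, price_per_dvd):
    """Unit sales implied by a gross revenue figure at a flat price."""
    return gross_revenue / price_per_dvd

PRICE_PER_DVD = 10.00    # "$10/DVD" as used in the estimate
TITLES_LISTED = 1899     # DVDs listed on Amazon.Com at last count
MAX_GROSS = 110_000      # estimated maximum possible gross revenue
LIKELY_GROSS = 50_000    # "more likely" gross revenue ceiling

max_units = implied_units(MAX_GROSS, PRICE_PER_DVD)
likely_units = implied_units(LIKELY_GROSS, PRICE_PER_DVD)

print(f"Implied unit sales: {likely_units:,.0f} to {max_units:,.0f}")
# Derived figure (not in the report): average copies sold per title
# if the maximum estimate were reached.
print(f"Copies per title at the ceiling: {max_units / TITLES_LISTED:.1f}")
```

Even at the upper estimate, this works out to fewer than six copies sold per listed title, which is consistent with the observation that the videos have not been broadly distributed.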
To Amazon’s credit, they have not asserted copyright on any of these DVDs. As an experiment, Public.Resource.Org spent $691.49 and posted 47 of these videos to YouTube and to the Internet Archive.34 In less than a week, we reached a greater audience than all the 1,899 DVDs combined, and we are confident that if all 1,899 DVDs had been posted by the government, viewership would be even higher.
30 NARA, Distribution Services Agreement with Amazon.Com, July 1, 2007.
31 Amazon.Com, Films from the Vaults of the National Archives, last accessed December 12, 2009.
32 Electronic mail from the Chief of Staff, National Archives, December 11, 2009.
33 ArchivesNext, Follow up on terms of NARA-Amazon agreement, December 7, 2009.
34 Boing Boing, Watch America's public domain video treasures, rescue the public domain from paywalls, December 4, 2009 and Boing Boing, Watch the 1967 Bob Hope special, save America's public domain videos, December 13, 2009.
After the videos were posted, we received mail within two days about one of the films, “Up In Flames, A History of Fire Fighting in the Forest.” It turns out NARA and Amazon had incorrectly characterized this as a work of the government,35 whereas the film had in fact been created by the Forest History Society and was being used without permission. Needless to say, we promptly removed this video and the Forest History Society has contacted both Amazon and NARA about this situation.
The reason usually given for government not to scan these materials is that it is too difficult and too expensive. The equipment I use for the FedFlix program costs less than $10,000, including a $4,000 video encoder, a $350 Component to SDI converter, a $100 terabyte disk drive, and a $2,000 used Betacam deck. I estimate that two government employees with less than $30,000 in hardware could crank out 2,500 videos in a year and make available a huge stash of Betacam, U-matic, and VHS materials. Even digitizing film has become easier.
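The equipment figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the prices, the $30,000 budget, and the 2,500-video throughput come from the text, while the variable names are hypothetical and the per-video figure assumes the whole hardware budget is charged to a single year of output, with salaries excluded.

```python
# Itemized FedFlix equipment prices quoted in the text (US dollars).
itemized = {
    "video encoder": 4000,
    "Component to SDI converter": 350,
    "terabyte disk drive": 100,
    "used Betacam deck": 2000,
}
itemized_total = sum(itemized.values())

# Scaled-up estimate from the text: two employees, under $30,000 of
# hardware, 2,500 videos in a year. Assumption: the full hardware
# budget is spread over one year of output; labor is not counted.
HARDWARE_BUDGET = 30_000
VIDEOS_PER_YEAR = 2_500
hardware_cost_per_video = HARDWARE_BUDGET / VIDEOS_PER_YEAR

print(f"Itemized equipment total: ${itemized_total:,}")
print(f"Hardware cost per video: ${hardware_cost_per_video:.2f}")
```

The itemized components sum to $6,450, comfortably under the $10,000 figure, and the scaled-up estimate implies a hardware cost of about $12 per video in the first year.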
Even if digitizing video, microfilm, photographs, or other materials is hard, the way NARA has gone about it is quite disturbing. Each deal has been a back-room, sole-source negotiation. No solicitations are conducted, the public is not given a chance to comment on the deals before they are finalized, and there is no indication that NARA has been examining nonprofit partners in addition to the .Coms they have so ardently pursued.
These “no cost to the government” deals are not just at NARA and they are not just for executive branch materials. The Government Accountability Office entered into a similar arrangement with Thomson West for digitizing 60 million pages of federal legislative histories. At great expense to the government, these materials were packed up and sent to Thomson West, which digitized the materials and then returned them to the GAO.36 What Thomson West did not return is a digital copy of the data. GAO employees were given “free” access to the Thomson West product, but that was all they got. If members of Congress wish to consult these materials on-line, they must get a commercial account with Thomson West. Meanwhile, Thomson West boasts that “thanks to an exclusive contract with the U.S. Government Accountability Office (GAO), Westlaw now offers you hundreds of federal legislative histories compiled by GAO law librarians.”37
35 Amazon.Com, Up in Flames: A History of Fire Fighting in the Forest, 1984, ASIN B000XQ1P28, ARC
36 Government Accountability Office, Contract with Thomson West, GAO-70230025, obtained under FOIA Request by Public.Resource.Org, GAO PRI-08-081, February 27, 2008.
It is my understanding from NARA officials that a similar arrangement may be in the works, in which a large number of congressional hearings would be scanned by LexisNexis and made available on that retail information service. It is my hope that this committee would carefully examine any such arrangement, as it is vital that the proceedings of the U.S. Congress be available to all citizens, not just those with a healthy expense account.
In a recent report submitted on the future of the presidential library system, NARA suggested that more rigorous guidelines governing the public-private partnership between the presidential library foundations and the government were in order.38 If more rigor is required in these partnerships with nonprofit corporations formed by former presidents, it goes without saying that even more attention should be paid to the relationship with .Com companies and retail information providers.
At the very least, any such arrangements must ensure that the government receives back a full-resolution, high-quality scan and that there are no limitations on use. Any such partnerships should be available for public comment, and NARA should consider relationships with nonprofits, foundations, and universities as well as commercial providers.
Opportunities, Not Obstacles
In the 1930s and 1940s, the National Archives and Records Administration leapt into uncharted territory, facing daunting challenges and meeting them by creating, defining, and professionalizing records management and the science of archiving. This was all new, and Archivist Connor was quick to say that he and his staff were “amateurs at our jobs.”39
In his opening statement at his confirmation hearing, Archivist Ferriero also quoted Connor and his observation that 45 percent of the records he surveyed were infested with vermin and insects and that records “mingled higgledy-piggledy with empty whiskey bottles.” This was a defining moment for the new institution. Archivist Ferriero said NARA faces a similar defining moment, with “vermin and insects replaced by a variety of software packages, platforms, and old technologies.”40
37 Thomson West, U.S. GAO Federal Legislative Histories on Westlaw® (FED-LH), February 19, 2008.
38 NARA, Report on Alternative Models for Presidential Libraries, Mandated by the Presidential Historical Records Preservation Act of 2008 (PL 110-404), September 25, 2009.
39 Donald R. McCoy, National Archives, op. cit., p. 106.
It is always difﬁcult to reconc