Klaus-Dieter Lehman, "Making the transitory permanent: the intellectual heritage in a digitized world of knowledge," Daedalus, Fall 1996 v125 n4 p307(23). Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Mr. Lehman, National Librarian of Germany, points out that academia is gradually accepting the digital domain as a platform for scholarly publication and that the Internet is an increasingly viable "alternative academic medium." This trend will affect the way we acquire scholarly information, providing a new "quality of access." It will also dramatically change the "quantitative relationship" between print and digital media. While the amount of academic information "doubles every ten to fifteen years," the costs of print publication have increased, in part because demand for these materials has decreased.
Although scholars might welcome the acceptance of digital sources of publication, Lehman points out the inherent dangers of regarding digitization "as a panacea for all of the real and imagined problems libraries now face in connection with the preservation of physical collections: the growing need for storage space, the deterioration of books due to acid paper, and the rising costs of library operation." [Though Lehman doesn't mention this, the digital domain has equivalents of each of these problems: digital storage for uncompressed media; deterioration of the storage vehicle; costs of upgrading hardware and software. -TS] Lehman warns against adopting global solutions, urging institutions to develop "strategies focused upon specific rather than generalized solutions - strategies that ensure books their proper place."
Lehman then focuses on the transitory nature of this digital "new knowledge." He states that digital information is indestructible only in theory. In reality, digital material fails to meet the criteria for long-term availability: digital storage media last less than a human lifetime; at best, CDs might survive 50 years before data is lost. Even if materials were developed to last centuries, changes in coding and formats, writes Lehman, would require format or "structure changes every ten to twenty years."
In addition, changes in software, operating systems, and hardware render "existing collections of information obsolete." Complicating this situation are "marketing strategies...to exclude both forward and retroactive compatibility by means of inaccessible core components of operating systems, hidden objects in software programs, or protective mechanisms in processors."
Lehman urges deeper cooperation between libraries and publishers and recommends specific measures that might help promote the mutual interest of long-term preservation of digital collections:
* storing digital publications with supplementary materials, documentation and manuals
* submitting proprietary hardware and software required to access a given publication
* granting libraries the right to reproduce publications for preservation purposes
[Does he mean the software application, too? - TS]
Lehman recognizes that physically prolonging the life of digital materials is not enough. Reliable accessibility over time is crucial to digital preservation. He recommends implementing a "central server concept using a rapid local network connected via open communications interfaces (Z39.50) to the wide area network."
Digital materials, he writes, should be "transferred [...] to a homogeneous storage system" that would meet the following requirements: 1) ensure long-term preservation in conformity with state-of-the-art technology; 2) provide as much forward compatibility as possible in anticipation of future developments; 3) provide publications for use, regardless of the specific properties of the original data medium; 4) provide quality-control procedures for reproduction, conversion, and migration.
Lehman contrasts the system of migration strategies proposed by the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG) with the emulation alternative discussed by Jeff Rothenberg in his Scientific American article.
The CPA/RLG study recommended "measures [...] to transfer digital materials periodically from one hardware or software configuration to another, or from one computer generation to the next." Working with publishers, the task group hopes to establish recommended practices that begin with the very production of digital materials, including "appropriate prerequisites for [...] technical platforms, compression mechanisms, and coding methods."
Rothenberg recommends using "systems capable of simulating obsolete hardware and software environments in new system environments." One problem that Lehman foresees with the emulation approach involves the growing popularity of Java applets. "If Java becomes the Internet standard," he writes, "the software structure will produce not only distributed documents but distributed programs as well."
Lehman concludes this section of his paper with the warning that the battle for digital preservation is far from won, and that archivists and librarians will continue to take up arms:
Even if we can imagine that standards and migration processes will one day enable us to build a more uniform structure for digital collections - in the form of object-oriented or relational data bases, for instance - we will surely have to deal with nonstandard formats for some time to come. In practical terms this means that, regardless of our collection guidelines, the digital deposit library will be unable to ignore the question of cost when choosing techniques and technologies for long-term preservation.
In his final section, Lehman discusses the responsibility of bibliographic control. Here, he calls upon library science professionals to maintain the traditional practices of "processing and cataloging procedures," without which "collections lack order, identifiability, and a comprehensible system of access." He urges that the standards used in cataloging printed publications, specifically the Anglo-American Cataloging Rules (AACR), be adopted or adapted for digital materials. Unlike the bare-level "auto indexing" features of some popular Web search engines, a full bibliographic index for electronic documents would:
establish a clear correlation between bibliographic information and an authentic publication. [As a result,] Publications are documented in keeping with copyright laws and protected against textual manipulation; comprehensible, standardized modes of access are established, and the status of networked publications as sources of scholarly value is enhanced.
Anticipating the Dublin Core initiatives, he proposes the placement of bibliographic data within a hypertext markup language (HTML) document or in defined bibliographic fields that might be mapped to "appropriate data formats," such as MARC. Additional fields might be designed specifically for the source material's digital qualities, generated through, for example:
the opening screen, metadata from the publication, supplemental material in electronic form, and other supplementary material (information included in the product package).
Specific to archives, information would include access data or location number; a standard structured descriptive information or metadata; archive data, which:
describes the different states in which the document appears as it passes through the migration process, including the document in its original format, the current status of the publication as provided for use, and the form in which the document is archived.
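As a rough illustration of the proposal above (bibliographic data carried inside an HTML document and mapped to a format such as MARC), the following Python sketch reads `<meta>` tags and translates them to MARC tag numbers. It is not taken from Lehman's paper: the `DC.`-prefixed element names follow the later Dublin Core convention, and the `DC_TO_MARC` crosswalk, the `MetaCollector` class, and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical crosswalk from Dublin Core-style element names to MARC tags.
# The pairings are illustrative only, not an official mapping.
DC_TO_MARC = {
    "dc.title": "245",      # title statement
    "dc.creator": "100",    # main entry, personal name
    "dc.publisher": "260",  # publication information
    "dc.date": "260",       # publication information (date)
}


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            self.fields.append((name.lower(), content))


def to_marc_like(html_text):
    """Return (MARC tag, value) pairs for every field the crosswalk knows."""
    parser = MetaCollector()
    parser.feed(html_text)
    return [(DC_TO_MARC[n], v) for n, v in parser.fields if n in DC_TO_MARC]


# Made-up sample page; only the mapped fields survive the translation.
SAMPLE = """<html><head>
<meta name="DC.title" content="Making the transitory permanent">
<meta name="DC.creator" content="Lehman, Klaus-Dieter">
<meta name="DC.date" content="1996">
<meta name="generator" content="ignored">
</head><body></body></html>"""

record = to_marc_like(SAMPLE)
for tag, value in record:
    print(tag, value)
```

A real crosswalk would also need MARC indicators and subfield codes, which this sketch leaves out.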
In conclusion, Lehman stresses that the bibliographic processing of digital publications needs to be a national effort, which would both "document national intellectual production" and provide "a mechanism for linking bibliographic description, including metadata, with the original digital publication."
we simply cannot ignore our obligation to pass on cultural experiences and insights, nor can we be content to plan only one day at a time. Making the transitory permanent and turning publications of the moment into publications for the ages are important and worthy goals for libraries.
Roy Tennant, "The grand challenges," Library Journal, Dec 1997 v122 n20 p31(2). Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Tennant proposes that standards might be used to achieve interoperability among digital libraries, just as machine-readable cataloging (MARC) established the metadata standard for the interchange of bibliographic data. He likens interoperability to a Holy Grail, and insists that this and smooth migration will be unobtainable without standards.
He also emphasizes that metadata as a concept is as old as cataloging:
Metadata is the information required to manage an item, organize it among other items, make it retrievable, and in some cases make it navigable.
Metadata standards are required for recording both more and less information than MARC cataloging was designed to do. Collections might be treated as a single entity, while a single digital object might require information about its bit depth, scanning resolution, and file format, as well as "information that allows us to create navigational systems."
Metadata standards for digital materials, writes Tennant, should also contend with "rights management" and the issue of "fair use."
In terms of preservation, metadata standards are especially needed given the absence of digital format standards and the present focus on migration as a strategy. Information, Tennant tells us, must be "rescued" as it moves from one technology to another, especially when the newer technologies require functionally different software.
Deanna B. Marcum, "The preservation of digital information," Journal of Academic Librarianship, November 1996 v22 n6 p451-454. Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Marcum, in a paper based on a presentation to the Coalition for Networked Information, looks at the challenge of making digital information part of the "information landscape" by way of examining the Draft Report of the Task Force on Archiving of Digital Information. Problems of technology may be resolved in the not too distant future, but the social problems, which include property rights and the responsibility for keeping materials accessible through generations, remain. If solutions are not found, "the legacy of digital information will be enormously more complex than print, and the problems of preserving print materials [...] will seem trivial by comparison." [from Charles B. Lowry's preface]
Marcum begins her essay with some background material on the Commission on Preservation and Access, which was formed in 1986 to develop strategies to save brittle books from oblivion through microfilming. Though the library communities were far from united in recommending microfilm as a rescue method, the "Commission determined that microfilming would be pursued as the immediate method [...], but would continue to investigate other more technologically advanced solutions." One chief objection to microfilm was its inherent problems of accessibility, whereas accessibility is one of the chief strengths of digital technologies.
In more recent years, as the Commission has begun to experiment with digital technologies, a nagging question remains: "Can we be assured that information stored digitally will be accessible in the future?"
Rapid changes in the means of recording information, in the formats for storage, and in the software for use threaten to shorten the life of information in the digital age to a few years.
Digital information, Marcum points out, is dependent upon hardware and software that "may disappear from the market within a few years time," which requires libraries to constantly upgrade:
Libraries and archives cannot save the old machines when spare parts are no longer available, and when the staff no longer knows how to use the old software, it is unceremoniously discarded.
Complicating the lifespan of digital information is the phenomenon called the World Wide Web, where "anyone with a computer [can] become an author," giving rise to "a flood of uncontrolled digital information that may or should be preserved by libraries and archives." Preserving this information "assumes enormous financial, technical, legal, and organizational burdens for our cultural institution of record."
Marcum places the "fundamental responsibility" of preserving this material primarily upon the shoulders of librarians and archivists, and points out that the Commission on Preservation and Access and the Research Libraries Group have been leaders in initiating dialog across extended domains that include curators, technologists, and relevant government and private sector organizations.
The Task Force defines digital archives as "repositories of digital information that are collectively responsible for storing and ensuring, through various migration strategies, the long-term accessibility of the nation's social, economic, cultural, and intellectual heritage in digital form." This definition contrasts with that of digital libraries, which are defined by the Task Force as "repositories that collect and provide access to digital information, but may or may not choose to provide for the long-term storage and access of that information."
Two mechanisms that will hold together a national archival system are certified archival institutions and a critical fail-safe mechanism. Certification would be established through standards and criteria of an independently administered program. Once a repository of digital information earns certification, it would have access to a critical fail-safe mechanism to "exercise an aggressive rescue function to save digital information that it judges to be culturally significant."
Without the operation of a formal certification program and a fail-safe mechanism, [digital] preservation will likely be overly dependent on marketplace forces, which may value information for too short a period and without applying broader, public interest criteria. [Draft Task Force Report, p6.]
Refreshing, or the copying of digital information from medium to medium, is a technique used by archivists to help preserve data. The Task Force, however, points out that this practice succeeds only to the extent that
the information is encoded in a format that is independent of the particular hardware and software needed to use it and as long as there exists some kind of software to manipulate the format in current use. Otherwise, copying depends either upon the compatibility of present and past versions of software and generations of hardware or the ability of competing hardware and software product lines to interoperate. [Preserving Digital Information: Draft Report. August 24, 1995. p3.]
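Refreshing as described above amounts to copying a bit stream to a new carrier and confirming that nothing changed on the way. A minimal Python sketch follows, assuming that two temporary directories stand in for the old and new media and that a SHA-256 digest is an acceptable check; the `refresh` and `sha256_of` helpers are hypothetical names, not part of the Task Force report.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_path, target_path):
    """Copy a file to a new carrier and confirm the bit stream is unchanged."""
    before = sha256_of(source_path)
    shutil.copyfile(source_path, target_path)
    after = sha256_of(target_path)
    if before != after:
        raise IOError("bit stream changed during refresh")
    return after


# Two temporary directories stand in for the old and new storage media.
with tempfile.TemporaryDirectory() as old_medium, \
        tempfile.TemporaryDirectory() as new_medium:
    original = os.path.join(old_medium, "report.dat")
    with open(original, "wb") as handle:
        handle.write(b"\x00\x01 sample bit stream \xff")
    copy = os.path.join(new_medium, "report.dat")
    checksum = refresh(original, copy)
    refreshed_ok = sha256_of(original) == checksum
```

The check only shows that the bits survived the copy. It says nothing about whether software will still exist to interpret them, which is the limitation the Task Force raises.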
A more thorough approach to preserving digital publications is migration. Citing Owen and van de Walle's paper, "Issues Faced by National Libraries in the Field of Deposit Collections of Electronic Publications," Marcum includes the following levels of strategies for migrating digital materials:
Medium refreshing: copy between same physical carriers.
Medium conversion: transfer to more stable or standard medium.
Format conversion: converting data formats, usually to reduce the number of formats that a library would have to handle.
Migration of technical environment: converting publications to operate in a new hardware or software platform.
Emulation of technical environment: simulating a "previous, now obsolete environment" to enable access to digital publications through new systems.
Responsibility for archiving rests fundamentally with the creator or owner of the information, and digital archives may invoke the fail-safe mechanism to protect culturally valuable information.
Marcum outlines the Task Force's recommendations:
1. Solicit proposals from interested archives and provide services to place "information objects" into "trust for use to future generations."
2. Secure funding for proposals to advance digital archives.
3. Foster practical experiments to facilitate the "preservation of the cultural record in digital form."
4. Coordinate the appropriate organizations and individuals in the development of standards, criteria and mechanisms for identifying and certifying repositories of digital information as archives.
5. Work toward establishing national policies for the longevity of digital information.
6. Support development of white papers to assist in establishing an effective fail-safe mechanism.
7. Set up forums for professionals across various disciplines to generate thinking on creating and financing archives for specific domains.
8. Commission case studies to identify best practices and benchmarking costs for storage, metadata, and migration.
Marcum concludes her paper with a quote by Donald Waters, addressed to the Association of Research Libraries, part of which reads as follows:
We are faced with what the Task Force report calls a "grander problem of organizing ourselves over time and as a society to maneuver effectively in a digital landscape." To meet the cultural and economic imperatives of digital preservation requires us to build, almost from scratch, a system of infrastructure for moving the record of knowledge naturally and confidently into the future. [Donald J. Waters, Realizing Benefits from Inter-Institutional Agreements: The Implications of the Draft Report of the Task Force on Archiving of Digital Information. Presentation to the Association of Research Libraries, Oct. 19, 1995.]
Lynne M. Hayman, "Database design for preservation project management: the California newspaper project," Library Resources and Technical Services, July 1997 v41 n3 p236-253. Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Hayman describes the database design for the California Newspaper Project, which gathers materials from various repositories for the purpose of microfilming. The system uses the MARC bibliographic standard and imports records in OCLC format. This article is relevant to UPF because of its discussion of what constitutes a good database design for preservation applications. If we consider the UPF as a file format, we might consider plugging into these features.
The choice of a system best suited to management and manipulation of bibliographic and preservation data was considered essential to the project's success, and the desired functionality was comprehensive.
Requirements included the ability to generate lists of repositories and collections sorted by zip code, city, county, or region; to produce cataloging work forms that could be upgraded; to produce management reports to identify candidate titles for preservation microfilming and trace work throughout the preservation process. In addition, the system had to facilitate the import of MARC and non-MARC records and "allow record overlay in either direction."
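One of the requirements above, listing repositories sorted by zip code, city, or county, can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table layout, the `list_repositories` helper, and the repository names are invented for the example and do not describe the system Hayman reports on. Sorting by region is left out because the summary does not say how regions are defined.

```python
import sqlite3

# In-memory database with a hypothetical repository table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE repository (
           name   TEXT NOT NULL,
           city   TEXT,
           county TEXT,
           zip    TEXT
       )"""
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO repository VALUES (?, ?, ?, ?)",
    [
        ("Example Historical Society", "Fresno", "Fresno", "93721"),
        ("Example Public Library", "Eureka", "Humboldt", "95501"),
        ("Example College Archive", "Chico", "Butte", "95929"),
    ],
)

# Column names cannot be bound as SQL parameters, so check them against
# a fixed whitelist before building the ORDER BY clause.
SORT_COLUMNS = {"zip", "city", "county"}


def list_repositories(sort_by):
    """Return repository names ordered by the requested column."""
    if sort_by not in SORT_COLUMNS:
        raise ValueError(f"unsupported sort key: {sort_by}")
    rows = conn.execute(f"SELECT name FROM repository ORDER BY {sort_by}, name")
    return [name for (name,) in rows]


by_zip = list_repositories("zip")
by_county = list_repositories("county")
```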
Maria Troy, "Video preservation: a report from the trenches," Afterimage, Sep-Oct 1996 v23 n2 p4(2). Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Although there are no official guidelines for the preservation of video, its basic principles are transfers, cleaning, reference copies, two 'masters', storage and cataloging. Following these measures is financially hard on many institutions. There must, therefore, be a movement to put pressure on video tape manufacturers to make their products more durable and to warn users about the inherent problems of video as a storage medium.
For video collectors, according to Troy, preservation means re-mastering or the "physical refurbishing of tapes as well as the ethical and aesthetic decisions about what images will become part of history." Methods employed include the following:
Troy urges "media independents" to consult with "professional conservators, librarians and archivists" to develop better strategies for prolonging the life of their video collections, and cites the "Playback 1996" conference as an example of this kind of activity.
The next section of Troy's essay deals with the "almost schizophrenic" nature of video preservation. Factors affecting video performance over time are almost limitless. These include the quality of the original production or video signal, which may be attributed to how many times the tape was reused or even its chemical composition. The manner or conditions in which a video was stored is another major factor. A third factor is how modern playback equipment processes these older video signals. For example, Troy reports that "older decks have different skew and time-code allowances, and can compensate better for old, slightly warped tape."
Videotape is a material dependent upon technology for access to its contents. Having an archive of tapes that are unplayable is as useful as a library of books that won't open.
The development of preservation standards can help build confidence and awareness among funders, but standards can be used to cut the other way as well, excluding a large number of small to mid-sized organizations and their prized collections.
"Some collections could [...] be converted into working image banks." "Distribution income from tapes that have obvious market appeal can fund the preservation of significant tapes that are not so easily accessible."
To reduce the cost and uncertainty of preservation, groups might try to influence manufacturers to produce longer-lasting videotape and to release details on the composition of their tapes. "If the consumer market was sensitized to the problem of videotape degeneration, it could exert major pressure on the industry."
Jeff Rothenberg, "Ensuring the longevity of digital documents," Scientific American, January 1995, v272 n1 p42-47. Author or Rep: summarized by Thom Shepard 4/14/98
Message type: VIEWS Subject: Annotated Bibliography Web Date:
Rothenberg begins his article with an imagined trip fifty years into the future, in which he envisions his grandchildren stumbling across a CD-ROM. How do they read or interpret the files, even if they have the right equipment to read the disk?
...because of changing hardware and software, only the letter will be immediately intelligible 50 years from now.
Rothenberg's premise is that digital records are more fragile than paper records, a fact that has been illustrated by the near-loss of the data for the 1960 U.S. Census and other government projects.
The content and historical value of thousands of records, databases and personal documents may be irretrievably lost to future generations if we do not take steps to preserve them now.
Only in theory is digital information invulnerable. Besides its vulnerability to physical degradation, digital information is subject to technical obsolescence. Rothenberg then explains some of the fundamentals of digital storage, starting with the concept, bit stream, defined as "an intended, meaningful sequence of bits [binary digits or 0's and 1's], with no intervening spaces, punctuation or formatting."
A file is not a document in its own right -- it merely describes a document that comes into existence when the file is interpreted by the program that produced it.
The meaning of a file is not inherent in the bits themselves, any more than the meaning of this sentence is inherent in its words.
Documents such as multimedia presentations are impossible to read without appropriate software: unlike printed words, they cannot just be "held up to the light."
If I include a copy of the program on the CD, they must still find the operating system software that allows the program to run on some computers.
What kind of digital Rosetta Stone can I leave to provide the key to understanding the contents of my disk?
Rothenberg says that to save digital documents, we must first copy their bit streams onto more readily accessible media. He makes analogies to preserving ancient writings, weighing the need to save these writings in their native language. Rothenberg suggests that translation results in some loss of information, which may not be immediately determined. On the other hand, if one saves the writings in their original language, "this approach assumes that knowledge of the original language is retained."
There are two strategies for preserving digital documents: 1. translate digital documents into standard forms that are independent of any computer operating system; for example, relational databases can in principle be translated into standard tabular form acceptable to any other system. There are two flaws to this method, says Rothenberg: (1) Relational databases are not as standard as they may appear to be, because commercial systems are constantly marketing add-on values. A reliance on "short-term standards" generates a false sense of security, and any attempt to provide translations results in further losses. (2) There are new forms of documents which may be fundamentally different from what went before; for example, relational databases are moving toward object-oriented databases.
2. extend the longevity of computer systems and original software to keep documents readable; Rothenberg says that one might not have to own the exact software that produced the document:
If we could describe its behavior in a way that does not depend on any particular computer system, future generations could re-create the behavior of the software and thereby read the document.
But in the very next passage, Rothenberg argues against this idea, asserting that technology is not yet sophisticated enough to mimic software behavior. The only feasible solution, he says, is to save the actual applications, both the programs and the operating system software, or have software engineers write programs called "emulators," which mimic hardware functions. For emulators to work,
specifications must be saved in a digital form independent of any particular software, to prevent having to emulate one system to read the specifications needed to emulate another.
If digital documents and their programs are to be saved, their migration must not modify their bit streams... If such changes are unavoidable, they must be reversible without loss. Moreover, one must record enough detail about each transformation to allow reconstruction of the original encoding of the bit stream.
Rothenberg anticipates the UPF in several ways, none more so than in the container concept, which Rothenberg calls "virtual envelopes," in which:
contents would be preserved verbatim, and contextual information associated with each envelope would describe those contents and their transformation history. This information must itself be stored digitally (to ensure its survival), but it must be encoded in a form that humans can read more simply than they can the bit stream itself, so that it can serve as a bootstrap. Therefore, we must adopt bootstrap standards for encoding contextual information; a simple, text-only standard would suffice. Whenever a bit stream is copied to new media, its associated context may be translated into an updated bootstrap standard. [...] These standards can also be used to encode the hardware specifications needed to construct emulators.
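The envelope idea can be sketched in a few lines of Python. The sketch is an assumption-laden illustration, not Rothenberg's design or the UPF: the `Envelope` class, the `REVERSIBLE` registry, and the plain `key: value` text layout are invented here. It covers the case where a change to the bit stream is unavoidable, so the change is lossless and is logged in enough detail to reconstruct the original encoding.

```python
import hashlib
import zlib
from dataclasses import dataclass, field

# Hypothetical registry of lossless transformations: name -> (apply, undo).
REVERSIBLE = {
    "zlib": (zlib.compress, zlib.decompress),
}


@dataclass
class Envelope:
    contents: bytes   # the bit stream being preserved
    context: dict     # human-readable description of the contents
    history: list = field(default_factory=list)  # transformations, oldest first

    def transform(self, name):
        """Apply a registered lossless transformation and log it."""
        apply, _ = REVERSIBLE[name]
        self.contents = apply(self.contents)
        self.history.append(name)

    def original(self):
        """Undo the logged transformations, newest first."""
        data = self.contents
        for name in reversed(self.history):
            _, undo = REVERSIBLE[name]
            data = undo(data)
        return data

    def bootstrap_text(self):
        """Render the contextual information as simple, text-only lines."""
        lines = [f"{key}: {value}" for key, value in self.context.items()]
        lines.append("history: " + (", ".join(self.history) or "none"))
        return "\n".join(lines)


payload = b"Dear grandchildren, the letter is on this disk." * 4
envelope = Envelope(
    contents=payload,
    context={
        "description": "letter, plain ASCII text",
        "sha256-of-original": hashlib.sha256(payload).hexdigest(),
    },
)
envelope.transform("zlib")       # an unavoidable but reversible change
recovered = envelope.original()  # reconstructed from the logged history
print(envelope.bootstrap_text())
```

Keeping the context as plain text is what lets it act as a bootstrap: a reader who cannot yet interpret the contents can still read what they are and how they were transformed.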
Archivists Warn: Don't Depend on Digital Tape Author or Rep: By Frank Beacham 8/13/97
Message type: VIEWS Subject: Storage Web Date:
"New York: A group of the world's leading audio preservationists have warned that tape-based digital recording media -- especially DAT -- is not reliable for long term archiving.
"During a panel discussion at AES '95, the archivists issued an urgent plea to recording equipment manufacturers to quickly create reliable long-term storage media for analog and digital audio recordings. There was a consensus that, at least for now, analog tape is the only proven, reliable way to preserve sound recordings for the future."
Review of Metadata Formats Author or Rep: Rachel Heery 5/23/97
Message type: VIEWS Subject: Metadata Web Date: October 1996
"This paper intends to review a number of metadata formats in order to highlight their characteristics. The comparison will be done in the context of the requirements of bibliographic control, with reference to the suitability of the various record formats for this purpose. The author is a researcher working at UKOLN on the ROADS (Resource Organisation and Discovery in Subject-based services) project Ref 1, part of the eLib Electronic Libraries Programme, and a special concern is to establish a comparative context in which to discuss the IAFA template which is being used in that project. The choice of formats for comparison has been limited due to practical considerations. The formats chosen for consideration (MARC, IAFA templates, TEI headers and URCs) were chosen for their particular relevance to those working within the UK eLib projects. Other formats such as GILS (US Government Information Locator Service) and Harvest SOIF (Summary Object Interchange Format) would also merit investigation in the future."
Conversion of Traditional Source Materials into Digital Form Author or Rep: Anne R. Kenney 5/23/97
Message type: VIEWS Subject: Digital Archives Web Date: October 13, 1995
From the Introduction:
"This paper will focus on the electronic conversion of traditional source materials, including books, journals, manuscripts, graphic materials, and photographs that serve as the primary documentation for arts and humanities research. While acknowledging other means for electronic conversion, the main emphasis will be on the use of imaging technology to produce digital surrogates for paper- and film-based sources."
Preserving Digital Objects: Recurrent Needs and Challenges Author or Rep: Michael Lesk 5/23/97
Message type: VIEWS Subject: Digital Archives Web Date: 1995
Abstract
We do not know today what Mozart sounded like on the keyboard, nor how David Garrick performed as an actor, nor what Daniel Webster's oratory sounded like. What will future generations know of our history? We thought that when printing was discovered, and libraries were created, we would no longer have disasters such as the loss of all but 7 plays from the 80 or more that Aeschylus wrote. Then acid process wood pulp paper, used in most books since about 1850, again threatened cultural memory loss. But digital technology seemed to come to the rescue, allowing indefinite storage without loss. Now we find that digital information too, has its dark side, and although it can be kept without loss it can not be kept without cost.
Keeping digital objects means copying, standards, and legal challenges. This is a process, not a single step. Libraries have to think of digital collection maintenance as an ongoing task. It is one that gets steadily easier per bit; last generation's difficult copying problem is now easy. However, the rise of more complex formats and much bulkier information mean that the total amount of work continues to increase. Our hope is that cooperation between libraries can reduce the work that each one has to do.
Message type: VIEWS Subject: IMA Web Date: Last modified on 01/07/97
Scope
This recommended practice specifies a portable, content-neutral, and platform-neutral container format for data exchange based on Apple Computer's Bento container format, also known as the OpenDoc(r) Standard Interchange Format. All data objects for exchange are stored in one or more Bento containers or in external files referenced by Bento containers. Although a Bento container can use many forms of storage, such as blocks of memory and environment clipboards, in this recommended practice, a Bento container is a file for data exchange on a storage medium, such as a disk or a CD-ROM.
Mass Storage based Solutions for Digital Media Archives Author or Rep: Joachim Stark, IBM Germany 5/23/97
Message type: VIEWS Subject: Digital Archives Web Date: September 1994
Presented at the IASA / FIAT Conference,
"Introduction:
During my presentation I want to focus on an overall end-to-end solution view for Digital Media Archives. As illustrated during this technical module, there is a variety of scenarios for audio and video archives to go digital. Hence there is no unique solution, that can be applied to all of these scenarios, but still there are some generic components, that will be found in all such solutions.
In this presentation I will try to give an overview about:
the generic solution structure
a functional view of the solution components
a technical outlook with consideration of scalability and flexibility
an exemplary demo scenario"