The UPF specifies that machine-independent algorithms be encapsulated within
the stored media. This "Rosetta stone," inscribed with maps for reassembling
the digital information archived on the storage media, would serve as a
universal translator.
Should the UPF require that the source code used to read data be
encapsulated in the storage format? Or should a Recommended Practice allow
the option of storing the Rosetta stone as a separate file that might be executable
code but would be required to operate on a set of specified platforms?
Should the UPF recommend a list of specific file formats that would be defined
by the Rosetta stone?
Related to the Rosetta stone is the issue of platform independence. With respect to digital archives, how important is platform independence to the concept of a digital preservation format? Is this a "wait and see" issue, or should the UPF make a definitive statement on the requirement of platform independence?
Several initiatives and digital projects include the concept of a universal identifier, which is analogous to the unique key of a relational database or to an Internet URL. The UPF takes for granted that each object within a digital archive must carry a unique identifier. Key questions remain: how should this identifier be used in archives? Should there be a central registration body for these identifiers? At what level should material be tagged? Or should there be a standard code representing multiple cataloged levels, followed by more specific or individual codes? Should the unique identifiers be allocated to a standardized area of storage, such as a specific part of the storage media header or table of contents (TOC)?
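Purely to make the idea of "a standard code representing multiple cataloged levels, followed by more specific or individual codes" concrete, here is a minimal Python sketch. It assumes a hypothetical identifier built from broad-to-narrow level codes plus an item code, serialized with a dot separator into a hypothetical 64-byte field of a media header. The class and function names, the separator, and the field size are illustrative assumptions, not part of the UPF.

```python
from dataclasses import dataclass
from typing import Tuple

SEPARATOR = "."             # hypothetical delimiter between levels
HEADER_ID_FIELD_BYTES = 64  # hypothetical fixed-size identifier slot in a media header


@dataclass(frozen=True)
class HierarchicalId:
    """Hypothetical identifier: shared cataloged-level codes, then an item code."""
    levels: Tuple[str, ...]  # broad-to-narrow codes, e.g. repository, collection, series
    item: str                # the more specific, individual code

    def __post_init__(self) -> None:
        # Reject empty codes or codes that would make the serialized form ambiguous.
        for code in self.levels + (self.item,):
            if not code or SEPARATOR in code:
                raise ValueError(f"invalid code: {code!r}")

    def __str__(self) -> str:
        return SEPARATOR.join(self.levels + (self.item,))

    def shares_levels(self, other: "HierarchicalId", depth: int) -> bool:
        # True when both identifiers agree on their first `depth` cataloged levels.
        return self.levels[:depth] == other.levels[:depth]


def to_header_field(uid: HierarchicalId) -> bytes:
    """Encode the identifier into a fixed-size, NUL-padded header field."""
    encoded = str(uid).encode("ascii")
    if len(encoded) > HEADER_ID_FIELD_BYTES:
        raise ValueError("identifier too long for header field")
    return encoded.ljust(HEADER_ID_FIELD_BYTES, b"\x00")


def from_header_field(field: bytes) -> HierarchicalId:
    """Decode a header field written by to_header_field."""
    *levels, item = field.rstrip(b"\x00").decode("ascii").split(SEPARATOR)
    return HierarchicalId(tuple(levels), item)


if __name__ == "__main__":
    a = HierarchicalId(("ARCH01", "COLL042", "SER3"), "0001")
    b = HierarchicalId(("ARCH01", "COLL042", "SER7"), "0002")
    print(a)                                           # ARCH01.COLL042.SER3.0001
    print(a.shares_levels(b, 2))                       # True
    print(from_header_field(to_header_field(a)) == a)  # True
```

A prefix scheme like this lets related items be grouped by their shared levels, but it does not settle the registration question: someone would still have to assign the level codes so that they stay unique across archives.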
In our first user survey, respondents specifically named reliance on technical staff, and the expense of maintaining such staff, as major concerns. Tight budgets may require new staff to have skill sets that span several domains. How can organizations with limited human resources maintain a wide range of skills? What does the need for technical expertise imply for a universal preservation format, or for any technical standard adopted for digital materials? What would be the process for "updating" the Rosetta stone? Put another way: should a Recommended Practice advocate technological "Ease-of-Use"?
As we presented our initiative to groups concerned with preserving analog media, we grew aware of a crippling gap between the language used by archivists and the language adopted by digital communities. Words casually chosen by digital initiatives often have deep analog roots. As a result, we wish to preface any Recommended Practice with a glossary of terms that might help bridge the analog-digital domains. Terminology might include the following:
Access, Indexing, Header Information, Storage, Tape Migration, Data Tape, Preservation, Platform, Format, Operating System, File, Algorithm, Data Stream, Program, Metadata, Information vs. Data, Database, Query Language, Source Code
What other terms should we include, and what should be the basis or sourcebook for defining them?
Because this initiative has become a SMPTE study group, some professionals working primarily with electronic records assume that the UPF deals strictly with the archiving of moving images. We try to stress that adopting a UPF framework would benefit all types of digital information. Do you think that electronic records are inherently different from multimedia in their requirements for a storage format or framework?
One of the points we try to emphasize in our presentations to archivist groups is that the UPF would help carry on the traditional practices of archivists and librarians by providing a robust, easily expandable framework for digital materials. For example, by incorporating such concepts as unique identifiers and by "gluing" certain kinds of cataloging information to the stored media, we hope to perpetuate the two principles that are the foundation of standard archival practice: Provenance and Original File Order. We also realize that many variations of these practices have evolved, and that no two archives follow exactly the same guidelines. So the question here concerns the natural migration of practices: are archival practices standardized enough to serve as a model or metaphor for designing digital archiving practices? Or is it permissible to explore entirely new methods?
Storage technology evolves at a dizzying pace. Recently, for example, we have
looked at PaperDisk from Cobblestone Software, a technology that prints digital
information on plain paper through a common
laser or inkjet printer, then reads it back into a computer through a standard
flatbed or hand-held scanner. The potential for this type of technology is
enormous. Paper lasts a long time, but imagine using this technology with
material that has an even longer lifespan. We have also heard about using DVD
as an archival storage medium. Perhaps most exciting is the technology called
"HD-ROM," developed by Los Alamos National Laboratory and licensed and sold by
Norsam Technologies, which holds "650 GB, 47,000 images on a 2-inch disk."
With planned enhancements, the Norsam HD-ROM may permit up to
12 terabytes of storage per 4 3/4-inch disk.
http://www.entmag.com/archive/1997/may07/050705.html-ssi
That said, do you feel that the available technology is adequate to address
the problem of digital storage? Or might we adopt a straw dog strategy to
illustrate how the components of a UPF might be employed over various storage
technologies? Also, can the same standard be applied both to multiple storage
types on a single medium and to simple tape-to-tape storage?
Once the problem of physical storage is resolved, does compression cease to be
an issue for archival storage? In other words, if the capacity of physical
storage media becomes virtually limitless, why not store everything in
uncompressed format? Or are there other issues to consider, such as speed of
retrieval?