6.2 Ingest

6.2.1 Submission Information Package (SIP)

6.2.1.1 The SIP is an Information Package that is delivered to the repository and digital storage system for ingest. It includes the audio data to be stored and all the necessary related metadata about the object and its content. Ingest, in the OAIS model, is the process that accepts the content and all its related metadata (the SIP), verifies the file, extracts the relevant data, prepares the Archival Information Package (AIP) for storage, and ensures that AIPs and their supporting Descriptive Information become established within the OAIS.

6.2.1.2 A digital repository and preservation system should be able to accept and validate an audio file. Validation is the process of ensuring that files accepted into the digital storage system comply with the relevant format standards. Non-standard files may become difficult to use in the future, when current replay systems no longer exist. Tools exist for automated validation of file formats, and some open source solutions, such as JHOVE (JSTOR/Harvard Object Validation Environment), are available and being further developed.
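The kind of structural check that a validation tool performs can be illustrated with a minimal sketch. It is not a substitute for a tool such as JHOVE: it only confirms that the RIFF/WAVE container is well formed and that the "fmt " and "data" chunks are present. The function name validate_wav and the wording of the problem messages are assumptions made for this example.

```python
import struct


def validate_wav(data: bytes) -> list:
    """Return a list of problems found; an empty list means the basic checks passed."""
    if len(data) < 12:
        return ["file too short to hold a RIFF header"]

    problems = []
    riff_id, riff_size, form_type = struct.unpack("<4sI4s", data[:12])
    if riff_id != b"RIFF":
        problems.append("missing RIFF identifier")
    if form_type != b"WAVE":
        problems.append("form type is not WAVE")
    if riff_size != len(data) - 8:
        problems.append("RIFF size field does not match file length")
    if problems:
        return problems

    # Walk the chunk list that follows the 12-byte RIFF header.
    seen = set()
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack("<4sI", data[offset:offset + 8])
        if offset + 8 + chunk_size > len(data):
            problems.append("chunk %r runs past the end of the file" % chunk_id)
            break
        seen.add(chunk_id)
        if chunk_id == b"fmt " and chunk_size >= 16:
            (format_tag,) = struct.unpack("<H", data[offset + 8:offset + 10])
            # 1 = linear PCM, 0xFFFE = WAVE_FORMAT_EXTENSIBLE
            if format_tag not in (1, 0xFFFE):
                problems.append("audio is not stored as linear PCM")
        # Chunks are word-aligned: an odd-sized chunk is followed by a pad byte.
        offset += 8 + chunk_size + (chunk_size & 1)

    for required in (b"fmt ", b"data"):
        if required not in seen:
            problems.append("required chunk %r is missing" % required)
    return problems
```

A file would be accepted for ingest only when the returned list is empty; any reported problem would be logged and the file referred back for correction.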

6.2.2 Format

6.2.2.1 IASA recommends the use of .wav or, preferably, BWF .wav files (EBU Tech 3285). The difference between the two is that a BWF file contains a set of headers which can be used to organise and manage metadata. Though BWF metadata is adequate for many purposes, some sophisticated systems and exchange situations require a more comprehensive package, and in these circumstances the Metadata Encoding and Transmission Standard (METS) is often used. The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using XML (eXtensible Markup Language). A METS package, which consists of metadata and content, is often used as an exchange standard between digital archives.
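As an illustration of how the BWF headers carry metadata inside an otherwise ordinary .wav file, the following sketch looks for the "bext" (Broadcast Audio Extension) chunk and reads its leading fixed-width text fields. The byte offsets follow the field order given in EBU Tech 3285; the function names and dictionary keys are assumptions made for this example, and only the first five fields are read.

```python
import struct


def iter_chunks(data: bytes):
    """Yield (chunk_id, chunk_body) pairs from a RIFF/WAVE byte string."""
    offset = 12  # skip "RIFF", the size field and "WAVE"
    while offset + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[offset:offset + 8])
        yield chunk_id, data[offset + 8:offset + 8 + size]
        offset += 8 + size + (size & 1)  # chunks are padded to an even length


def _text(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace").strip()


def read_bext(data: bytes):
    """Return selected bext fields as a dict, or None for a plain .wav file."""
    for chunk_id, body in iter_chunks(data):
        if chunk_id == b"bext" and len(body) >= 338:
            return {
                "description": _text(body[0:256]),
                "originator": _text(body[256:288]),
                "originator_reference": _text(body[288:320]),
                "origination_date": _text(body[320:330]),  # yyyy-mm-dd
                "origination_time": _text(body[330:338]),  # hh:mm:ss
            }
    return None
```

A return value of None indicates a .wav file without the BWF extension, in which case the equivalent metadata would have to be supplied from another source at ingest.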

6.2.2.2 Material eXchange Format (MXF) is a container format for professional digital video and audio media defined by a set of SMPTE standards. MXF has mostly been taken up by the video archiving community, though it is capable of managing audio. Like METS, it is primarily a set of metadata which “wraps” the content, in this case audio. Both are very useful formats for the exchange and management of content and information between archives and repositories.

6.2.2.3 The format of the SIP will depend on the system and on the size and sophistication of the enterprise. It is quite possible to establish a viable archive using .wav files, entering most of the necessary metadata into the system by hand and acquiring the necessary technical metadata at the ingest stage. This, however, would only be appropriate for the smallest of collections. Large collections with remote and separate digitisation processes and large quantities of material must build sophisticated ingest and data exchange systems to ensure the content is adequately ingested into the data storage systems. Production and verification software generates much of this data as standardised XML files that may be used for preservation purposes. The National Library of New Zealand Metadata Extractor tool, for example, is a Java-based tool that extracts preservation metadata from digital objects and outputs that metadata in a standard format (XML).
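The acquisition of technical metadata at the ingest stage can be sketched with the Python standard library alone. The example below reads the basic audio parameters of a linear PCM .wav file and writes them out as XML. The element names (audioTechnicalMetadata, sampleRateHz and so on) are invented for this illustration and do not follow the output schema of the National Library of New Zealand tool or of any other metadata standard.

```python
import wave
import xml.etree.ElementTree as ET


def technical_metadata_xml(source) -> str:
    """Extract basic technical metadata from a PCM .wav file as an XML string.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as w:
        fields = {
            "channels": w.getnchannels(),
            "sampleRateHz": w.getframerate(),
            "bitDepth": w.getsampwidth() * 8,  # sample width is given in bytes
            "frames": w.getnframes(),
        }
    fields["durationSeconds"] = round(fields["frames"] / fields["sampleRateHz"], 3)

    # Hypothetical element names, chosen only for this sketch.
    root = ET.Element("audioTechnicalMetadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

In a larger system the same values would normally be mapped onto an agreed schema, so that the XML produced at each digitisation site can be exchanged and ingested without manual re-entry.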

6.2.3 Preservation Metadata

6.2.3.1 The metadata needed to manage preservation processes at the ingest stage comprises all the information regarding the creation of the digital audio object and the changes of format that occurred prior to ingest. In this way the technical provenance of the object is preserved, which allows the pathway between the present form of the item and the original from which it was created to be traced.

6.2.3.2 The EBU has published a non-compulsory recommendation for BWF entitled “Format for CodingHistory field in Broadcast Wave Format” (http://www.ebu.ch/CMSimages/en/tec_text_r98-1999_tcm6-4709.pdf), which describes how changes to the file may be recorded. Local usage of the ASCII free text field allows the description of the technical equipment or software that was used in the creation of the digital audio object.
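A CodingHistory entry of the kind described in the recommendation can be assembled as in the sketch below. It assumes the comma-separated layout of the recommendation, in which A= gives the coding algorithm, F= the sampling frequency, W= the word length, M= the mode and T= the free text, with each line terminated by CR/LF. The function name and parameters are assumptions made for this example, and the equipment descriptions in the usage note are placeholders rather than recommended wording.

```python
def coding_history_line(algorithm, mode, sample_rate_hz=None, word_length=None, text=""):
    """Build one CodingHistory line, e.g. for one stage of a transfer chain."""
    parts = ["A=" + algorithm]
    if sample_rate_hz is not None:
        parts.append("F=%d" % sample_rate_hz)
    if word_length is not None:
        parts.append("W=%d" % word_length)
    parts.append("M=" + mode)
    if text:
        # Commas separate the fields, so they are replaced inside the free text.
        parts.append("T=" + text.replace(",", ";"))
    return ",".join(parts) + "\r\n"


def coding_history(*lines):
    """Join the lines for successive processing stages, earliest stage first."""
    return "".join(lines)
```

For example, coding_history(coding_history_line("ANALOGUE", "stereo", text="replay machine; illustrative"), coding_history_line("PCM", "stereo", 96000, 24, "A/D converter; illustrative")) produces a two-line history in which the first line describes the analogue source and the second the resulting digital file.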