3.9 Descriptive Metadata – Application Profiles, Dublin Core (DC)

3.9.1 Much of the effort devoted to metadata in the heritage sector has focussed on descriptive metadata as an offshoot of traditional cataloguing. However, it is clear that too much attention in this area (e.g. localised refinements of descriptive tags and controlled vocabularies) at the expense of other considerations described above will result in system shortcomings overall. Figure 4 demonstrates the various inter-dependencies that need to be in place, descriptive metadata tags being just one sub-set of all the elements in play.


Fig 4: simple descriptive metadata (courtesy Dempsey, CLIR/DLF primer, 2005)

3.9.2 Interoperability must be a key component of any metadata strategy: elaborate systems devised independently for one archival repository by a dedicated team will be a recipe for low productivity, high costs and minimal impact. The result will be a metadata cottage industry incapable of expansion. Descriptive metadata is indeed a classic case of Richard Gabriel’s maxim ‘Worse is better’. Comparing two programming languages, one elegant but complex, the other awkward but simple, Gabriel predicted, correctly, that the simpler language would spread faster and that, as a result, more people would come to care about improving it than about improving the complex one. This is demonstrated by the widespread adoption and success of Dublin Core (DC), initially regarded as an unlikely solution by the professionals on account of its rigorous simplicity.

3.9.3 The mission of DCMI (DC Metadata Initiative) has been to make it easier to find resources using the Internet through developing metadata standards for discovery across domains, defining frameworks for the interoperation of metadata sets and facilitating the development of community- or discipline-specific metadata sets that are consistent with these aims. DC itself is a vocabulary of just fifteen elements for use in resource description and provides economically for all three categories of metadata. None of the elements is mandatory and all are repeatable, although implementers may specify otherwise in application profiles (see section 3.9.10 below). The name “Dublin” is due to its origin at a 1995 invitational workshop in Dublin, Ohio; “core” because its elements are broad and generic, usable for describing a wide range of resources. DC has been in widespread use for more than a decade and the fifteen element descriptions have been formally endorsed in the following standards: ISO Standard 15836-2003 of February 2003 [ISO15836 http://dublincore.org/documents/dces/#ISO15836]; NISO Standard Z39.85-2007 of May 2007 [NISOZ3985 http://dublincore.org/documents/dces/#NISOZ3985]; and IETF RFC 5013 of August 2007 [RFC5013 http://dublincore.org/documents/dces/#RFC5013].

Table 1 (below) lists the fifteen DC elements with their (shortened) official definitions and suggested interpretations for audiovisual contexts.

DC element | DC definition | Audiovisual interpretation
Title | A name given to the resource | The main title associated with the recording
Subject | The topic of the resource | Main topics covered
Description | An account of the resource | Explanatory notes, interview summaries, descriptions of environmental or cultural context, list of contents
Creator | An entity primarily responsible for making the resource | Not authors or composers of the recorded works but the name of the archive
Publisher | An entity responsible for making the resource available | Not the publisher of the original document that has been digitized. Typically the publisher will be the same as the Creator
Contributor | An entity responsible for making contributions to the resource | Any named person or sound source. Will need a suitable qualifier, such as role (e.g. performer, recordist)
Date | A point or period of time associated with an event in the lifecycle of the resource | Not the recording or (P) date of the original but a date relating to the resource itself
Type | The nature or genre of the resource | The domain of the resource, not the genre of the music. So Sound, not Jazz
Format | The file format, physical medium, or dimensions of the resource | The file format, not the original physical carrier
Identifier | An unambiguous reference to the resource within a given context | Likely to be the URI of the audio file
Source | A related resource from which the described resource is derived | A reference to a resource from which the present resource is derived
Language | A language of the resource | A language of the resource
Relation | A related resource | Reference to related objects
Coverage | The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant | What the recording exemplifies, e.g. a cultural feature such as traditional songs or a dialect
Rights | Information about rights held in and over the resource | Information about rights held in and over the resource

Table 1: The DC 15 elements
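
As an illustration of how a description following Table 1 might be serialised, the Python sketch below builds a simple DC record with the standard library. Only the namespace URI and the element names come from DC itself; the `record` wrapper element, the helper name `build_dc_record` and all sample values (including the identifier URI) are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a wrapper element holding one dc: child per (element, value) pair.

    `fields` is a list of pairs rather than a dict so that an element can be
    repeated: every DC element is optional and repeatable.
    """
    record = ET.Element("record")  # hypothetical wrapper, not part of DC
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical values chosen to follow the audiovisual interpretations in Table 1.
sample = build_dc_record([
    ("title", "Interview with a lighthouse keeper"),
    ("creator", "Example Sound Archive"),    # the archive, not the speaker
    ("type", "Sound"),                       # the domain, not a musical genre
    ("format", "audio/wav"),                 # the file format, not the carrier
    ("identifier", "http://archive.example.org/audio/0001.wav"),
    ("language", "en"),
    ("language", "cy"),                      # repeated element
])
xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
```

Because no element is mandatory, a record of this kind can be as sparse or as full as the archive's cataloguing allows; any stricter rules belong in an application profile.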

 

3.9.4 The elements of DC have been expanded to include further properties. These are referred to as DC Terms. A number of these additional elements (‘terms’) will be useful for describing time-based media:

DC Term | DC definition | Audiovisual interpretation
Alternative | Any form of the title used as a substitute or alternative to the formal title of the resource | An alternative title, e.g. a translated title, a pseudonym, an alternative ordering of elements in a generic title
Extent | The size or duration of the resource | File size and duration
extentOriginal | The physical or digital manifestation of the resource | The size or duration of the original source recording(s)
Spatial | Spatial characteristics of the intellectual content of the resource | Recording location, including topographical co-ordinates to support map interfaces
Temporal | Temporal characteristics of the intellectual content of the resource | Occasion on which recording was made
Created | Date of creation of the resource | Recording date and any other significant date in the lifecycle of the recording

Table 2: DC Terms (a selection)

 

3.9.5 Implementers of DC may choose to use the fifteen elements either in their legacy dc: variant (e.g., http://purl.org/dc/elements/1.1/creator) or in the dcterms: variant (e.g., http://purl.org/dc/terms/creator) depending on application requirements. Over time, however, and especially if RDF is part of the metadata strategy, implementers are expected (and encouraged by DCMI) to use the semantically more precise dcterms: properties, as they more fully comply with best practice for machine-processable metadata.
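
The relationship between the two variants can be sketched in a few lines of Python. The two namespace URIs are those quoted above; the function name `to_dcterms` and the decision to leave unrecognised URIs untouched are assumptions made for illustration only.

```python
DC_LEGACY = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# The fifteen elements, each of which exists in both namespaces.
FIFTEEN = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dcterms(uri):
    """Return the dcterms: equivalent of a legacy dc: element URI.

    URIs that are not one of the fifteen legacy elements are returned unchanged.
    """
    if uri.startswith(DC_LEGACY):
        name = uri[len(DC_LEGACY):]
        if name in FIFTEEN:
            return DC_TERMS + name
    return uri


print(to_dcterms("http://purl.org/dc/elements/1.1/creator"))
```

A mapping of this kind only changes the property URI. Any tightening of values that the dcterms: properties imply still has to be checked against the data itself.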

3.9.6 Even in this expanded form, DC may lack the fine granularity required in a specialised audiovisual archive. The Contributor element, for example, will typically need to mention the role of the Contributor in the recording to avoid, for instance, confusing performers with composers or actors with dramatists. The Library of Congress has devised a list of common roles (or ‘relators’) for human agents, known as the MARC relators. Here are two examples of how they can be implemented.

<dcterms:contributor>
  <marcrel:CMP>Beethoven, Ludwig van, 1770-1827</marcrel:CMP>
  <marcrel:PRF>Quatuor Pascal</marcrel:PRF>
</dcterms:contributor>

<dcterms:contributor>
  <marcrel:SPK>Greer, Germaine, 1939- (female)</marcrel:SPK>
  <marcrel:SPK>McCulloch, Joseph, 1908-1990 (male)</marcrel:SPK>
</dcterms:contributor>

The first example tags ‘Beethoven’ as the composer (CMP) and ‘Quatuor Pascal’ as the performer (PRF). The second tags both contributors, Greer and McCulloch, as speakers (SPK) though does not go as far as determining who is the interviewer and who is the interviewee. That information would need to be conveyed elsewhere in the metadata, e.g. in Description or Title.
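
A consuming system might read such role-qualified contributors as in the following Python sketch. The fragments above carry no namespace declarations, so the sketch wraps the first one in a hypothetical `record` element that supplies them; the marcrel namespace URI used here is an assumption, as are the function name and the small label table.

```python
import xml.etree.ElementTree as ET

DCTERMS_NS = "http://purl.org/dc/terms/"
MARCREL_NS = "http://id.loc.gov/vocabulary/relators/"  # assumed namespace URI

# The first example from the text, wrapped so that the prefixes are declared.
SNIPPET = f"""
<record xmlns:dcterms="{DCTERMS_NS}" xmlns:marcrel="{MARCREL_NS}">
  <dcterms:contributor>
    <marcrel:CMP>Beethoven, Ludwig van, 1770-1827</marcrel:CMP>
    <marcrel:PRF>Quatuor Pascal</marcrel:PRF>
  </dcterms:contributor>
</record>
""".strip()

# Only the three relator codes mentioned in the text are labelled here.
ROLE_LABELS = {"CMP": "composer", "PRF": "performer", "SPK": "speaker"}


def contributors_by_role(xml_text):
    """Return (role label, name) pairs for every role-tagged contributor."""
    root = ET.fromstring(xml_text)
    pairs = []
    for contributor in root.iter(f"{{{DCTERMS_NS}}}contributor"):
        for child in contributor:
            # ElementTree tags look like "{namespace}local"; split the two parts.
            namespace, _, code = child.tag[1:].partition("}")
            if namespace == MARCREL_NS:
                pairs.append((ROLE_LABELS.get(code, code), child.text))
    return pairs


roles = contributors_by_role(SNIPPET)
print(roles)
```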

3.9.7 In this respect, other schemas may be preferable, or could be included as additional extension schemas (as illustrated in Fig. 2). MODS (Metadata Object Description Schema http://www.loc.gov/standards/mods/), for instance, allows for more granularity in names and linkage with authority files, a reflection of its derivation from the MARC standard:

name
 Subelements:
   namePart
     Attribute: type (date, family, given, termsOfAddress)
   displayForm
   affiliation
   role
      roleTerm
         Attributes: type (code, text); authority
         (see: http://www.loc.gov/standards/sourcelist/)
      description
 Attributes: ID; xlink; lang; xml:lang; script; transliteration
   type (enumerated: personal, corporate, conference)
   authority (see: http://www.loc.gov/standards/sourcelist/)
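
The outline above can be made concrete with a short Python sketch that assembles a MODS name element. The MODS namespace URI and the `marcrelator` authority value are taken from MODS practice; the helper name `mods_name` and the choice of which sub-elements to populate are assumptions, and the sample name is reused from the speaker example earlier in this section.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _q(tag):
    """Qualify a local tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def mods_name(family, given, dates, role_code, name_type="personal"):
    """Build a MODS <name> with typed nameParts and a coded role term."""
    name = ET.Element(_q("name"), {"type": name_type})
    for part_type, value in (("family", family), ("given", given), ("date", dates)):
        part = ET.SubElement(name, _q("namePart"), {"type": part_type})
        part.text = value
    role = ET.SubElement(name, _q("role"))
    term = ET.SubElement(
        role, _q("roleTerm"), {"type": "code", "authority": "marcrelator"}
    )
    term.text = role_code
    return name


element = mods_name("Greer", "Germaine", "1939-", "spk")
print(ET.tostring(element, encoding="unicode"))
```

Splitting the name into typed parts is what allows a system to match the entry against an authority file, which a single DC Contributor string does not support.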

3.9.8 Using METS it would be admissible to include more than one set of descriptive metadata suited to different purposes, for example a Dublin Core set (for OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) compliance) and a more sophisticated MODS set for compliance with other initiatives, particularly exchange of records with MARC encoded systems. This ability to incorporate other standard approaches is one of the advantages of METS.
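
The arrangement can be sketched in Python as a METS document carrying two descriptive metadata sections, one wrapping DC and one wrapping MODS. The namespace URIs and the `dmdSec`, `mdWrap`, `xmlData` and `MDTYPE` names come from the METS schema; the section IDs, the helper name and the sample title are hypothetical. The result is deliberately incomplete, since a valid METS document also needs at least a structural map.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
MODS_NS = "http://www.loc.gov/mods/v3"
for prefix, uri in (("mets", METS_NS), ("dc", DC_NS), ("mods", MODS_NS)):
    ET.register_namespace(prefix, uri)


def add_dmd_section(mets, section_id, mdtype, payload):
    """Append a dmdSec that wraps `payload` as inline XML of type `mdtype`."""
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": mdtype})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(payload)
    return dmd


mets = ET.Element(f"{{{METS_NS}}}mets")

# Simple DC description, e.g. for exposure through OAI-PMH.
dc_title = ET.Element(f"{{{DC_NS}}}title")
dc_title.text = "Interview with a lighthouse keeper"  # hypothetical value

# Richer MODS description of the same resource.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
mods_title = ET.SubElement(title_info, f"{{{MODS_NS}}}title")
mods_title.text = "Interview with a lighthouse keeper"

add_dmd_section(mets, "DMD_DC", "DC", dc_title)
add_dmd_section(mets, "DMD_MODS", "MODS", mods)
print(ET.tostring(mets, encoding="unicode"))
```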

3.9.9 DC, under the governance of the Dublin Core Metadata Initiative (DCMI), continues to develop. On the one hand, its value for networking resources is strengthened through closer association with semantic web tools such as RDF (see Nilsson et al., DCMI 2008), while on the other it aims to increase its relevance to the heritage sector through a formal association with RDA (Resource Description & Access http://www.collectionscanada.gc.ca/jsc/rda.html), due to be released in 2009. As RDA is seen as a timely successor to the Anglo-American Cataloguing Rules, this particular development may have major strategic implications for audiovisual archives that are part of national and university libraries. For broadcasting archives, other developments based on DCMI are noteworthy. At the time of writing the EBU (European Broadcasting Union) is completing the development of the EBU Core Metadata Set, which is based on and compatible with Dublin Core.

3.9.10 The archive may wish to modify (expand, adapt) the core element set. Such modified sets, drawing on one or more existing namespace schemas (e.g. MODS and/or IEEE LOM as well as DC) are known as application profiles. All elements in an application profile are drawn from elsewhere, from distinct namespace schemas. If implementers wish to create ‘new’ elements that are not schematized elsewhere, for instance contributor roles unavailable in the MARC relators set (e.g. non-human agents such as species, machines, environments), then they must create their own namespace schema, and take responsibility for ‘declaring’ and maintaining that schema.

3.9.11 Application profiles include a list of the governing namespaces together with their current URL (preferably a PURL, or persistent URL). These are replicated in each metadata instance. There then follows a list of each data element together with permitted values and style of content. This may refer to in-house or additional rules and controlled vocabularies, e.g. thesauri of instrument names and genres, authority files of personal names and subjects. The profile will also specify mandatory schemes for particular elements such as dates (YYYY-MM-DD) and geographical co-ordinates. Such standardised representations of location and time will be able to support map and timeline displays as non-textual retrieval devices.

 

Name of Term | Title
Term URI | http://purl.org/dc/elements/1.1/title
Label | Title
Defined By | http://dublincore.org/documents/dcmi-terms/
Source Definition | A name given to a resource
BLAP-S Definition | The title of the work or work component
Source Comments | Typically, a Title will be a name by which the source is formally known
BLAP-S Comments | If no title is available construct one that is derived from the resource or supply [no title]. Follow normal cataloguing practice for recording title in other languages using the ‘Alternative’ refinement. Where data are derived from the Sound Archive catalogue, this will equate to one of the following title fields in the following hierarchical order: Work title (1), Item title (2), Collection title (3), Product title (4), Original species (5), Broadcast title (6), Short title (7), Published series (8), Unpublished series (9)
Type of term | Element
Refines |
Refined by | Alternative
Has encoding scheme |
Obligation | Mandatory
Occurrence | Not repeatable

Fig 5: Part of the British Library’s application profile of DC for sound (BLAP-S):

Namespaces used in this Application Profile
DCMI Metadata Terms | http://dublincore.org/documents/dcmi-terms/
RDF | http://www.w3.org/RDF/
MODS elements | http://www.loc.gov/standards/mods/
TEL terms | http://www.theeuropeanlibrary.org/metadatahandbook/telterms.html
BL Terms | http://labs.bl.uk/metadata/blap/terms.html
MARCREL | http://id.loc.gov/vocabulary/relators.html
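
The kind of rule an application profile declares (obligation, occurrence and a mandatory scheme for values) lends itself to automated checking. The Python sketch below is one possible way to express this, not the British Library's implementation. The Title rule mirrors the Obligation and Occurrence values shown in Fig 5; the `created` rule, the class and function names, and the sample records are hypothetical. The date pattern checks only the YYYY-MM-DD shape, not whether the date exists in the calendar.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermRule:
    """Constraints that a profile places on one term."""
    mandatory: bool = False
    repeatable: bool = True
    pattern: Optional[str] = None  # regular expression each value must match


# Hypothetical fragment of a profile.
PROFILE = {
    "title": TermRule(mandatory=True, repeatable=False),  # as in Fig 5
    "created": TermRule(pattern=r"\d{4}-\d{2}-\d{2}"),    # YYYY-MM-DD shape
    "contributor": TermRule(),
}


def validate(record, profile):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for term, rule in profile.items():
        values = record.get(term, [])
        if rule.mandatory and not values:
            problems.append(f"{term}: mandatory but missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{term}: not repeatable but has {len(values)} values")
        if rule.pattern:
            for value in values:
                if not re.fullmatch(rule.pattern, value):
                    problems.append(f"{term}: '{value}' does not match the required scheme")
    for term in record:
        if term not in profile:
            problems.append(f"{term}: not defined in the profile")
    return problems


good = {"title": ["Dawn chorus"], "created": ["1987-05-03"]}
bad = {"title": ["Dawn chorus", "Morning birds"], "created": ["3 May 1987"]}
print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```

Running such a check at ingest keeps dates and co-ordinates in the standardised form that map and timeline displays depend on.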

3.9.12 The application profile therefore incorporates or draws on a data dictionary (a file defining the basic organisation of a database down to its individual fields and field types), or several data dictionaries, which may be maintained by an individual archive or shared with a community of archives. The PREMIS data dictionary (currently version 2, http://www.loc.gov/standards/premis/v2/premis-2-0.pdf), which relates exclusively to preservation, is expected to be drawn on substantially. Its numerous elements are known as ‘semantic units’. Preservation metadata provides intelligence about provenance, preservation activity and technical features, and aids in verifying the authenticity of a digital object. The PREMIS Working Group released its Data Dictionary for Preservation Metadata in June 2005 and recommends its use in all preservation repositories regardless of the type of materials archived and the preservation strategies employed.
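
One way preservation metadata supports authenticity checks is by recording a checksum for each file. The Python sketch below stores a digest under the names of two PREMIS fixity semantic units, `messageDigestAlgorithm` and `messageDigest`, and later re-verifies it. The flat dictionary is a simplification made for illustration, not the PREMIS XML schema, and the function names and sample bytes are hypothetical.

```python
import hashlib


def fixity_record(data, algorithm="sha256"):
    """Compute a digest of `data` and return it keyed by PREMIS unit names."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"messageDigestAlgorithm": algorithm, "messageDigest": digest}


def verify(data, record):
    """Recompute the digest and compare it with the stored value."""
    current = hashlib.new(record["messageDigestAlgorithm"], data).hexdigest()
    return current == record["messageDigest"]


original = b"RIFF....WAVEfmt "          # stand-in for the bytes of an audio file
stored = fixity_record(original)

print(verify(original, stored))          # unchanged content
print(verify(original + b"\x00", stored))  # content altered after ingest
```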

3.9.13 By defining application profiles and, most importantly, by declaring them, implementers can share information about their schemas in order to collaborate widely on universal tasks such as long-term preservation.