3.10 Sources of Metadata

3.10.1     Archives should not expect to create all descriptive metadata by themselves from scratch (the old way). Indeed, given the in-built lifecycle relationship between resources and metadata, such a notion will be unworkable. There are several sources of metadata, especially in the descriptive category, that should be exploited to reduce costs and to enrich records by extending the means of input. There are three main sources: professional, contributed and intentional (Dempsey:2007). They may be deployed alongside each other.

3.10.2     Professional sources means drawing on the locked-in value of legacy databases, authority files and controlled vocabularies, which are valuable for published or replicated materials. It includes industry databases as well as archive catalogues. Such sources, especially archive catalogues, are notoriously incomplete and incapable of interoperation without sophisticated conversion programmes and complex protocols. There are almost as many data standards in operation in the recording and broadcasting industries and the audiovisual heritage sector as there are separate databases. The lack of a universal resolver for AV, comparable to ISBN for print, is a continuing hindrance, and after decades of discographical endeavour there is still disagreement about what constitutes a catalogue record. Is it an individual track, or is it a sequence of tracks that make up an intellectual unit such as a multi-sectioned musical or literary work? Is it the sum total of tracks on a single carrier or set of carriers, in other words, is the physical carrier the catalogue unit? Evidently, an agency that has chosen one of the more granular definitions will find it much easier to export its legacy data successfully into a metadata infrastructure. Belt-and-braces approaches to data export based on Z39.50 (a protocol for information retrieval, see http://www.loc.gov/z3950/agency/) and SRW/SRU (a protocol for search and retrieve via standardized URLs with a standardized XML response) will continue to provide a degree of success, as will the ability of computers to harvest metadata from a central resource. However, more effective investment should be made in the shared production of resources which identify and describe names, subjects, places, time periods, and works.
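As an illustration of the "standardized URL" idea behind SRU, the following minimal Python sketch assembles a searchRetrieve request. The parameter names (operation, version, query, maximumRecords, recordSchema) come from the SRU specification; the base address https://sru.example.org/catalogue, the helper name build_sru_url, the CQL query and the "dc" schema short name are assumptions made for the example, and a real server may support different indexes and schemas.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Return an SRU searchRetrieve URL for the given CQL query.

    base_url and record_schema are placeholders: consult the target
    server's explain response for the values it actually supports.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,                  # CQL query string
        "maximumRecords": str(max_records),  # cap on records returned
        "recordSchema": record_schema,       # requested XML record format
    }
    # urlencode percent-encodes spaces, quotes and '=' inside the query
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and query, for illustration only
    print(build_sru_url("https://sru.example.org/catalogue",
                        'dc.title = "symphony"'))
```

The server's reply would be a standardized XML document containing the matching records, which the requesting system can then map into its own metadata infrastructure.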

3.10.3     Contributed sources means user-generated content. A major phenomenon of recent years has been the emergence of many sites which invite, aggregate and mine data contributed by users, and mobilize that data to rank, recommend and relate resources. These include, for example, YouTube and LastFM. Such sites have value in that they reveal relations between people, and between people and resources, as well as information about the resources themselves. Libraries have begun to experiment with these approaches, and there are real advantages to be gained by allowing users to augment professionally sourced metadata. So-called Web 2.0 features that support user contribution and syndication are becoming commonplace in available content management systems.

3.10.4     Intentional sources means data collected about use and usage that can enhance resource discovery. The concept is borrowed from the commercial sector: Amazon recommendations, for instance, are based on aggregate purchase choices. Similar algorithms could be used to rank objects in a resource. This type of data has emerged as a central factor in successful websites, providing useful paths through intimidating amounts of complex information.
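The use of aggregate usage data can be sketched in a few lines of Python. This is a simplified illustration, not a description of any commercial recommendation system: it assumes a hypothetical access log of (user, item) pairs, ranks items by how often they were accessed, and suggests related items by counting what else the same users accessed. The function names and the sample data are invented for the example.

```python
from collections import Counter


def rank_by_usage(access_log):
    """Order item identifiers by access count, most used first."""
    counts = Counter(item for _user, item in access_log)
    # Sort by descending count, then by identifier so ties are stable
    return [item for item, _n in
            sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def also_used(access_log, item_id, top_n=3):
    """Items most often accessed by the users who accessed item_id."""
    users = {user for user, item in access_log if item == item_id}
    co_counts = Counter(item for user, item in access_log
                        if user in users and item != item_id)
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _n in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical log: (user identifier, recording identifier)
    sample_log = [("u1", "rec-a"), ("u1", "rec-b"), ("u2", "rec-a"),
                  ("u2", "rec-c"), ("u3", "rec-a"), ("u3", "rec-b")]
    print(rank_by_usage(sample_log))
    print(also_used(sample_log, "rec-a"))
```

In practice an archive would need to consider privacy and anonymization before retaining per-user logs; aggregate counts alone are sufficient for the simple ranking shown in rank_by_usage.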