Organization is what you do before you do something, so that when you do it, it’s not all mixed up. — A. A. Milne
When the creator of “Winnie the Pooh” wrote that sentence, he probably was not thinking about metadata. But the phrase rings true for WGBH, Boston’s public radio and TV outlet, as well as for the Library of Congress.
Those entities were chosen in 2013 by the Corporation for Public Broadcasting as stewards of the American Archive of Public Broadcasting, or AAPB, and joined forces to preserve thousands of hours of public radio and television archival broadcasts.
Recently, the National Endowment for the Humanities awarded WGBH a $345,000 grant to pursue PBCore, a metadata “schema” for the management of public media collections. WGBH’s Media Library and Archives has led the development of PBCore and now is using the grant to develop methods and workshops to make its standard more accessible to archivists and public media organizations.
A schema is a standard for organizing information. Think of PBCore as a systematic way to document and manage all available talk and music programs so that people can access them more easily.
“We’re not going to make a lot of changes to the schema,” said Rebecca Fraimow, archivist at WGBH. “We are using the grant to make it more useable for librarians and archivists as well as media managers at stations. Data has to be structured in a certain way so it’ll be easier for people to search by keyword or other criteria to find what they are looking for.”
Karen Cariani is project director for the AAPB at WGBH.
“So far more than 40,000 hours of content from over 100 public radio and TV stations have been digitized,” she said. “The PBCore metadata schema provides a format and structure for entering details about these older analog recordings: host names, the station on which the material aired, program title, in some cases musical genre, performance rights ownership, date of broadcast and other information that is useful to describe the digital files.”
The public can search historic public broadcasting content at americanarchive.org. PBCore users can go to pbcore.org for schema documentation, how-to and tutorial materials, and the user community’s GitHub repository.
The entire AAPB is available “on-site,” meaning at WGBH Boston and at the Library of Congress reading room in Washington. However, much of the collection (more than 17,000 digitized historic programs from 70+ stations) is available online at www.americanarchive.org, with more being added every week.
“One of our biggest challenges is the messy data, or in some cases a lack of data, that we have on some of the audio and video from our contributing stations,” said Fraimow. “We spend a lot of time and resources identifying what content is significant and unique, and what content is a duplication of something we already have. We are playing catch-up with 60+ years of material.”
A visualization of the PBCore schema and data model. You can see enlarged versions of these images on radioworld.com; search for “PBCore.”
This is the AAPB metadata management system, based on PBCore, where the team ingests, manages, stores and catalogs the 2.5 million metadata records submitted by stations.
“And finding funds to support the work is always a challenge,” added Cariani. “It costs money to digitize, catalog and process audio and video for the archive, and to do outreach and marketing. We are willing to work with our partner stations and help them in any way we can, including assisting them in writing local grants to sort out their unique material. Funding from NEH for the PBCore Development and Training Project will be used to develop a suite of tools to improve use of PBCore, as well as trainings and outreach to spread the word about using PBCore to describe audiovisual collections.”
The team knows that not everyone is a metadata expert.
“We will develop and provide tools so stations can use PBCore seamlessly,” said Casey Davis Kaufman, senior project manager at WGBH. “PBCore schema development is a community effort led by archivists and public media people who are committed to continued collaboration on PBCore’s improvements.”
Visit pbcore.org for FAQs, attributes, elements and other specifics.
Ken Deutsch is a former broadcaster who in his youth directed live music shows and children’s programs for public station WGTE(TV) in Toledo, Ohio. He says he is sure the station has recovered by now.
Asset: In PBCore, any piece of content — such as a program, clip or episode — can be defined as an asset. One asset may exist in many different forms (for example, on DVD, on a U-matic tape in English and on a VHS tape in French). If the content is the same, those would all be considered instantiations of the same asset.
Attribute: In XML, an attribute is a structure used to describe or provide more information about the data contained in an element. Attributes are written inside the element’s opening tag as name-and-value pairs. For example, on a pbcoreTitle element, the attribute titleType provides more information about the title.
Class: A high-level group of related elements in XML.
Container: A container element in XML is a way to group other elements together. Container elements usually do not hold data themselves, but act as a bucket for sub-elements that do hold data.
Element: An XML element is a way to store data in a self-explanatory manner, according to a structured and specific vocabulary. For example, putting the information “Lassie” within a “pbcoreTitle” element tells anyone (or any machine) looking at the data that “Lassie” is the title of the asset. Attributes may be associated with any element: these provide even further detail about the data.
Instantiation: An instantiation is a manifestation of an asset that is embodied in physical or digital form, such as a tape, DVD or digital file. One asset can have many instantiations, but generally each instantiation holds the same intellectual content.
Metadata: Metadata is a set of data that describes and gives information about other data. Metadata can include a wide variety of information, and different communities have different uses for it. Descriptive information (title, subject), technical information and rights information are all types of metadata. Different purposes often call for different types: structural metadata, technical metadata, preservation metadata. PBCore covers descriptive, structural and technical metadata (and, for some uses, can be considered preservation metadata as well).
Schema: An XML schema lays out rules for structuring an XML document in a specific way. The PBCore schema specifies how PBCore information should be written in XML so that people and machines can consistently understand the information contained in PBCore documents by referencing the schema.
XML: Extensible Markup Language is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined by the W3C’s XML 1.0 Specification and by several other related specifications.
XSD: An XSD is the document that defines an XML schema. It can be used to validate other XML documents to make sure that they comply with the rules of the schema.
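For readers who want to see how several of these terms fit together, here is a minimal sketch in Python. The sample record is invented for illustration: it borrows the “Lassie” title and the DVD, U-matic and VHS copies from the definitions above, but the titleType value “Program,” the language codes and the summarize helper are assumptions, the PBCore namespace is left out, and the record has not been checked against the PBCore XSD. It uses only Python’s standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, invented for illustration. Element names follow
# PBCore conventions, but the namespace is omitted and the record has not been
# validated against the PBCore XSD. The titleType value is an assumption.
SAMPLE = """
<pbcoreDescriptionDocument>
  <pbcoreTitle titleType="Program">Lassie</pbcoreTitle>
  <pbcoreInstantiation>
    <instantiationPhysical>DVD</instantiationPhysical>
  </pbcoreInstantiation>
  <pbcoreInstantiation>
    <instantiationPhysical>U-matic</instantiationPhysical>
    <instantiationLanguage>eng</instantiationLanguage>
  </pbcoreInstantiation>
  <pbcoreInstantiation>
    <instantiationPhysical>VHS</instantiationPhysical>
    <instantiationLanguage>fre</instantiationLanguage>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
"""


def summarize(xml_text):
    """Return the title, its titleType attribute, and one (format, language) pair per copy."""
    root = ET.fromstring(xml_text)    # the root element describes one asset
    title = root.find("pbcoreTitle")  # an element that holds data (the title)
    copies = [
        (
            inst.findtext("instantiationPhysical"),
            # Fall back to a placeholder when no language is recorded.
            inst.findtext("instantiationLanguage", default="unspecified"),
        )
        # Each pbcoreInstantiation is a container: it groups sub-elements.
        for inst in root.findall("pbcoreInstantiation")
    ]
    return {
        "title": title.text,
        "title_type": title.get("titleType"),  # read the attribute value
        "instantiations": copies,
    }


summary = summarize(SAMPLE)
print(summary["title"], summary["title_type"])
for physical, language in summary["instantiations"]:
    print(physical, language)
```

In this sketch the whole record stands for one asset, pbcoreTitle is an element carrying an attribute, and each pbcoreInstantiation is a container describing one physical copy of the same content. Real PBCore records carry more required information than this, so the schema documentation at pbcore.org remains the reference.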