Augmented Metadata in XHTML
Neocortext.Net Working Draft 10 May 2002
Copyright 2001-2002 Murray Altheim. All Rights Reserved.
This specification describes several minor syntax modifications to XHTML (the XML reformulation of HTML) which provide much of the essential functionality required to augment Web pages with metadata as found in published descriptions of the Semantic Web. This augmentation allows Dublin Core metadata, a highly popular standard developed by the library community, to be incorporated in Web pages in a way that is compatible with today's Web browsers. The specification also describes a generalized mechanism by which other popular schemas can be used in similar fashion. The metadata can be associated with any XHTML or XML document or document fragment (indeed, any addressable resource), internal or external to the document.
This document is intended for review and comment by interested parties. It is a “work in progress,” currently has no formal status, and its publication should not be construed as endorsement by any corporate or academic body. This document may be updated, replaced, rendered obsolete by other documents, or removed from circulation at any time. It is inappropriate to use this document as reference material, or cite it as anything other than a “work in progress.” Distribution of this document is unlimited.
From HTML 2.0 in 1995 [HTML2] and HTML 4.0 in 1999 [HTML4] through XHTML 1.1 in 2001 [XHTML11], there has been only one method of including metadata within a Web document: the <meta> element. Notably, the <meta> element contains metadata regarding the entire document; it does not allow for metadata annotation of document components, nor does it provide a robust mechanism for referencing existing classification schemes, taxonomies or ontologies.
This specification describes three minor modifications to XHTML 1.1 which provide much of the essential functionality required to augment Web pages with schema-characterized metadata, as called for in published descriptions of the Semantic Web [W3CSW] [SCIAM]. Using the extensibility provided by the W3C Recommendation Modularization of XHTML [XHTMLMOD], this specification includes an “XHTML Augmented Metadata 1.0 DTD” that implements these features.
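The first two modifications are relatively trivial, in terms of implementation: allowing the existing <meta> element as inline content anywhere in a Web document, and extending its ability to reference content by adding an href attribute.
The third modification is to allow elements from a popular metadata standard (Dublin Core) as content of the <meta> element.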
The Dublin Core Metadata Initiative (DCMI) has provided a specification describing a method of including Dublin Core metadata as attribute values in HTML <meta> elements [DCQ-HTML]. This method becomes even more valuable with the modifications this specification provides. Because of the isomorphism between Dublin Core embedded in <meta> and its expression in RDF, harvesting of and transformation between these formats (e.g., via an XSLT stylesheet) is possible. While not normative to this specification, such a stylesheet is planned/included.
This specification provides a brief introduction to the Dublin Core Metadata Element Set, as well as examples and suggestions for use. It also describes the minor changes required for compatibility with XLink (e.g., for use in XHTML 2.0). In addition, by revealing Dublin Core's lineage in ISO 11179 Specification and Standardization of Data Elements [ISO11179], it demonstrates how this specification may be extended for use with any ISO 11179-based metadata set, i.e., generalized for use with any XML markup language.
Finally, Dublin Core content may be qualified using a set of standardized values [DCMES-Qualifiers], such as stating from which classification system (e.g., Library of Congress Subject Heading or Dewey Decimal Classification) its "subject" is derived. This specification shows how these qualifiers may be extended to allow for other classification systems, such as references into RDF or XML Topic Map [XTM] based ontologies, and provides examples of such use.
This document is intended to serve as both specification and tutorial. Rather than separating the pedagogical material and examples, these are interspersed among the sections they explain. Notes appearing in this specification (which are offset left and displayed in a smaller font, as in the note following this paragraph) are informative, and are not considered essential to the understanding or use of the features described herein.
NOTE: In this specification, Dublin Core may sometimes be abbreviated to “DC”.
ED. NOTE: "ISSUE:" or "ED.NOTE:" statements such as this will not appear in the final document. NOTEs will likely be eliminated, brought into the main text, or at least edited for length. Also, there are more examples provided currently than planned for the final draft, to assist in discussion of various issues.
In the library community, a trip to the library catalog is an exercise in browsing metadata. And to a great degree, isn't this what's missing on the Web? Keyword-based searches either turn up nothing or a thousand “hits” (with its meaning often more in line with a punch in the nose than a hit parade).
In the card catalog, an individual card is a record that references a book, serial, or some other item that can be retrieved (by you, or by the librarian for you). Each record includes metadata about an item in the library. Either an item without metadata (e.g., a book with no record) or metadata without an item (e.g., a record but no book) is less than interesting, perhaps frustrating. On the Web, this is the page you didn't find.
The term metadata is a compound that differs from “data” by the addition of the Greek word meta, meaning “alongside, with, after, next.” So metadata differs from data in that it never stands alone; it is always data associated with whatever it describes. In the computer world, this means that whatever is being described must in some way itself be addressable (i.e., retrievable by some means, such as by identifier or location). But in another sense, metadata is just data. “Going meta” another level, what is metadata at one level may be simply data at another. After all, the cards and the card catalog itself each have a location, whether physical or electronic.
The Dublin Core metadata standard, arising from a cross-disciplinary group of librarianship, computer science, and other professions organized by the Online Computer Library Center (OCLC) in Dublin, Ohio, is a simple set of fifteen elements used to describe a wide range of networked resources. The Dublin Core Metadata Element Set (DCMES) was designed to allow non-specialists to create simple descriptive records for information resources.
Why is Dublin Core interesting as a metadata standard? While certainly popular within the library community, it has wider application, and for good reason. The community developing the Dublin Core did not invent it from scratch; they based it on an existing ISO standard for defining metadata specifications. Each Dublin Core element is defined using a set of ten attributes from ISO/IEC 11179 Specification and Standardization of Data Elements [ISO11179], a standard for the description of data elements. This helps to improve consistency with other metadata standards based on ISO 11179, such as the OASIS Registry and Repository project.
Note that the ISO 11179 attributes do not appear in documents, but are part of the formal definition of Dublin Core elements. Of the ten attributes, six are common to all Dublin Core elements, and provide such information as the name and version of the Dublin Core standard. The remaining four ISO 11179 attributes for each Dublin Core element are provided in the next section.
It would be pointless to try to surpass the documentation included on the DCMI site. The DCMI Recommendation Using Dublin Core [Using DC] provides an excellent introduction to the Dublin Core, and this document is highly recommended reading. This specification does provide an appendix designed as a short introduction and reference for those creating Dublin Core metadata for use in XHTML documents. See Appendix C: The Dublin Core Metadata Element Set.
[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]
Error: A violation of the rules of this specification; results are undefined.
This specification uses the same definition found in [RDF]:
Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.
ED. NOTE: If there are any specific terms anyone thinks need calling out into this Terminology section, let me know.
The requirements that this specification fulfills are divided into two parts, hard and soft: the must-haves and the should-haves. Each requirement will eventually be followed by a link to its solution within this document.
The design must:
Additionally,
The design should:
Link processing normatively depends on [RFC 2396] (as updated by [RFC 2732]) processing, including character escaping as defined in these RFCs.
It is an error if a document does not adhere to the conformance requirements described in this specification.
It is also an error when metadata content violates the semantics of its schema. Because this is often very difficult to validate, errors of this type may not be discernible by machine processes. Such “semantic” validation is outside the scope of this specification.
In all cases,
- … <meta> element …
- a <link> element for each schema used must be included in the document <head> (see Section 5.5.2 for details).
When the Dublin Core Metadata Element Set syntax is used:
- … rdf:resource attribute occurs on a Dublin Core element, the RDF Namespace must also be declared (see Section 5.5.2 for details).
- … “dc:”) should be used.
Although such usage is not conformant with the above declarations, this specification recommends that when RDF fragments are included in well-formed XHTML documents (as defined in Section 3.1.2 of [XHTML1]), they be wrapped within XHTML <meta> elements.
This section describes the changes to XHTML 1.1 necessary to augment its metadata features in support of a semantically richer Web, amounting to three changes: allowing the existing <meta> element as inline content anywhere in a Web document; extending its ability to reference content by adding an href attribute; and finally, allowing elements from a popular metadata standard (Dublin Core) as its content.
Previous specifications for HTML and XHTML have defined one means of including metadata within a document, the <meta> element.
<meta> is an empty element, meaning that its content model is declared EMPTY and it contains no child elements or character data.
The attributes on the <meta> element have evolved slightly over the years. To its three original attributes in HTML 2.0, HTML 4 added one and XHTML 1.0 added three more, for a total of seven attributes. The last three listed below are the ones that most interest us:
Attribute Name | Description |
---|---|
xmlns | default XML Namespace declaration |
lang | language code |
dir | language direction |
http-equiv | HTTP response header name (see note) |
name | property name (i.e., metainformation name) |
content | property value (i.e., associated metainformation content) |
scheme | refinement of property name or schema of property value |
This specification does not alter these definitions. The essence of the <meta> element is its ability to identify document properties as name/value pairs, using its name and content attributes. The scheme attribute is an optional value that associates a specific scheme with a property value, such as identifying the value “0-8047-3723-1” as an ISBN:
<meta scheme="ISBN" name="identifier" content="0-8047-3723-1" />
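As a minimal sketch of the name/value model above (not part of this specification), the following Java code reads the name, content and scheme attributes of a single <meta> element with the standard JAXP DOM API. The MetaProperty record and MetaAttributes class are hypothetical names; an absent scheme is represented as null. Records require Java 16 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical holder for one <meta> property: a name/value pair plus an optional scheme. */
record MetaProperty(String name, String content, String scheme) {}

/** Hypothetical helper that reads the name, content and scheme attributes of a <meta> element. */
final class MetaAttributes {
    private MetaAttributes() {}

    static MetaProperty read(String metaXml) {
        Element meta;
        try {
            meta = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(metaXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed <meta> element", e);
        }
        // getAttribute returns "" for an absent attribute; scheme is optional, so map that to null.
        String scheme = meta.hasAttribute("scheme") ? meta.getAttribute("scheme") : null;
        return new MetaProperty(meta.getAttribute("name"), meta.getAttribute("content"), scheme);
    }
}
```

Applied to the ISBN example, read returns a record whose name is "identifier", content is "0-8047-3723-1" and scheme is "ISBN".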
The ability to associate metadata with existing Web markup may have wide-ranging effects, as such metadata may be used to enrich content for harvesting tools and search engines, and to supply meta-information about existing markup practices (such as annotating client-side imagemaps, typing links, linking specific content into an ontology, or adding descriptions to images).
NOTE on 'profile':
The HTML 4 Specification indicates a use for the 'profile' attribute on the <head> element to identify “metadata profiles,” though after over four years little if any software pays attention to this design. It was perhaps a feature ahead of its time: a recent DCMI Recommendation indicates use of the 'profile' attribute following a method similar to that first outlined in HTML 4. But for the purposes of this specification, the functionality provided by 'profile' is provided by an explicit <link> element. More on this later.
NOTE on 'http-equiv':
<meta> elements may contain an 'http-equiv' attribute in lieu of a 'name' attribute, to be used to generate information for HTTP response headers. Given the nature of this functionality, the use of <meta> elements containing the 'http-equiv' attribute is undefined by this specification when they are located outside of the document <head> element. Such usage is not recommended, as it is likely to have unpredictable results.
This specification alters the document model of XHTML to include the <meta> element as content wherever inline content has previously been allowed. Any <meta> element thus appearing in the document body is considered metadata about its parent element, with two exceptions. First, the existing use of <meta> within <head> is unchanged: this is still metadata associated with the entire document. The second exception, linked metadata, is described in the next section.
Because one of the primary reasons for including metadata in XHTML is to allow for its harvesting and use, when metadata is associated via this parent-child relationship, authors should assist the process of addressing the parent by specifying a unique value for its id attribute. Absent some means of easily addressing the parent resource, processors would have to resort to XPath or other querying methods, which may not be supported in all applications. Due to the many possible applications of this technology, this is a recommendation but not a requirement.
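The parent-child association can be sketched in Java as follows, assuming a hypothetical ParentSubject helper and a caller-supplied document URI. A <meta> in <head> describes the whole document; elsewhere the parent is addressed as documentURI#id, and a parent without an id yields null, reflecting the point above that such a parent is hard to address without XPath. The sketch deliberately ignores the href attribute, which is described in the next section, and uses Java 16 pattern matching.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: which resource does an inline <meta> (one without href) describe? */
final class ParentSubject {
    private ParentSubject() {}

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /**
     * Returns the document URI for a <meta> inside <head>, documentUri#id for a <meta>
     * whose parent element carries an id, and null when the parent has no id.
     */
    static String subjectOf(Element meta, String documentUri) {
        if (meta.getParentNode() instanceof Element parent
                && !"head".equals(parent.getLocalName())) {
            String id = parent.getAttribute("id");
            return id.isEmpty() ? null : documentUri + "#" + id;
        }
        return documentUri; // metadata about the entire document
    }
}
```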
This example shows how <meta> elements may be associated with a paragraph by their inclusion as child elements. It also shows how to incorporate Dublin Core Qualifiers to reference well-established classification schemes such as the Library of Congress Subject Headings (LCSH) or Dewey Decimal Classification (DDC). This is described fully in [DCMES-Qualifiers] and [DCQ-HTML].
<p id="ants">
  <meta name="DC.Title" content="Ants (Hymenoptera:Formicidae)" />
  <meta name="DC.Subject" content="1. Ants. 2. Arizona Desert." />
  <meta name="DC.Subject" scheme="LCSH" content="QL568" />
  <meta name="DC.Subject" scheme="DDC" content="595" />
  There are more than 250 species of ants in Arizona alone. Ants, like bees, are social creatures who live in large colonies, all working together for the benefit of the group. Colonies may last for 10 to 20 years, though the individual worker ant may only live for two months to one year.
</p>
This both expresses a suitable title for the paragraph and unambiguously indicates its subject. Note the presence of the id attribute on <p>, to assist in externally addressing the paragraph once the metadata has been extracted.
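A harvester may want to keep the scheme qualifier when collecting subjects from the example above. The Java sketch below groups the DC.Subject values of an element's <meta> children by their scheme attribute, using the empty string for unqualified values. SubjectsByScheme is a hypothetical name, and matching the name attribute case-insensitively is an assumption made because the examples in this draft mix "DC.Subject" and "DC.type".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper grouping the DC.Subject values of an element's <meta> children by scheme. */
final class SubjectsByScheme {
    private SubjectsByScheme() {}

    static Element parseElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Maps each scheme ("" when unqualified) to its DC.Subject values, in document order. */
    static Map<String, List<String>> collect(Element parent) {
        Map<String, List<String>> byScheme = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Only direct <meta> children count; the name comparison ignores case (an assumption).
            if (n instanceof Element meta && "meta".equals(meta.getTagName())
                    && "DC.Subject".equalsIgnoreCase(meta.getAttribute("name"))) {
                byScheme.computeIfAbsent(meta.getAttribute("scheme"), k -> new ArrayList<>())
                        .add(meta.getAttribute("content"));
            }
        }
        return byScheme;
    }
}
```

For the ants paragraph this yields three groups: "" with "1. Ants. 2. Arizona Desert.", "LCSH" with "QL568" and "DDC" with "595"; the DC.Title value is not a subject and is skipped.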
This example shows a method of typing XHTML links using the Dublin Core type element and the link type scheme defined in Section 6.12 Link types of [HTML4]. Note the presence of the <link> element, required to connect the use of the “DC” attribute namespace with the Dublin Core XML Namespace. This is a practice established by both [HTML4] and [DCMES-HTML].
<html>
  <head>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  </head>
  <body>
    ...
    <p>
      <a href="inverts/grasshoppers.html">
        <meta name="DC.type" scheme="HTML4" content="Prev" />
        <meta name="DC.title" content="Previous Chapter" />
        <meta name="DC.language" content="EN" />
        <meta name="DC.description" content="Grasshoppers, Walkingsticks, Termites, Bugs and Beetles" />
        <img src="images/prev-arrow.gif" alt="Previous Chapter" />
      </a>
      <a href="inverts/scorpions.html">
        <meta name="DC.type" scheme="HTML4" content="Next" />
        <meta name="DC.title" content="Next Chapter" />
        <meta name="DC.language" content="EN" />
        <meta name="DC.description" content="Scorpions, Spiders, Centipedes and Millipedes" />
        <img src="images/next-arrow.gif" alt="Next Chapter" />
      </a>
    </p>
  </body>
</html>
The above method is more expressive than HTML's rel and rev attributes, as it provides access to the entire semantics of the Dublin Core (including such things as media type, multiple languages, access rights, etc.) in providing link metadata.
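To illustrate how this Dublin Core link typing relates to HTML 4 link types, the following hypothetical Java helper recovers, for each <a>, the value of a DC.type child whose scheme is "HTML4", i.e., the value an HTML 4 rel attribute would otherwise carry. TypedLinks and linkTypes are invented names for this sketch, which requires Java 16 or later.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper recovering HTML 4 link types from DC.type <meta> children of <a>. */
final class TypedLinks {
    private TypedLinks() {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Maps each anchor's href to the content of a DC.type child whose scheme is "HTML4". */
    static Map<String, String> linkTypes(Document doc) {
        Map<String, String> types = new LinkedHashMap<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            for (Node n = a.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element meta && "meta".equals(meta.getTagName())
                        && "DC.type".equalsIgnoreCase(meta.getAttribute("name"))
                        && "HTML4".equals(meta.getAttribute("scheme"))) {
                    types.put(a.getAttribute("href"), meta.getAttribute("content"));
                }
            }
        }
        return types;
    }
}
```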
This is similar to the previous example, but types XHTML links using XLink's role and title attributes. Note again the presence of the <link> element, required to connect the use of the “XLINK” attribute namespace with XLink's XML Namespace.
<html>
  <head>
    <link rel="schema.XLINK" href="http://www.w3.org/1999/xlink" />
  </head>
  <body>
    ...
    <p>
      <a href="inverts/grasshoppers.html">
        <meta name="XLINK.role" scheme="HTML4" content="Prev" />
        <meta name="XLINK.title" content="Previous Chapter" />
        <img src="images/prev-arrow.gif" alt="Previous Chapter" />
      </a>
      <a href="inverts/scorpions.html">
        <meta name="XLINK.role" scheme="HTML4" content="Next" />
        <meta name="XLINK.title" content="Next Chapter" />
        <img src="images/next-arrow.gif" alt="Next Chapter" />
      </a>
    </p>
  </body>
</html>
Because XLink provides fewer metadata features than Dublin Core, this method is not as expressive as the previous example.
ED. NOTE:
Consider dropping this example as perhaps too complex and controversial.
This example shows how a <meta> element may be associated with an anchor element by including it as a child element. By adding a <link> element to the document <head> that associates the XLink semantics (this requirement is discussed more fully in Section 5.5.2), XHTML links may be given the full range of XLink properties. While this specification has no ambition to replace XLink, this ability is an interesting byproduct, and could perhaps be part of a transitional strategy. The following example mimics the functionality of the last example in Section 5.1.2 of [XLINK]:
<person xlink:href="students/patjones62.xml"
        xlink:label="student62"
        xlink:role="http://www.example.com/linkprops/student"
        xlink:title="Pat Jones" />
<person xlink:href="profs/jaysmith7.xml"
        xlink:label="prof7"
        xlink:role="http://www.example.com/linkprops/professor"
        xlink:title="Dr. Jay Smith" />
<course xlink:href="courses/cs101.xml"
        xlink:label="CS-101"
        xlink:title="Computer Science 101" />
with its XHTML Augmented Metadata rendition (note the <link> element):
<html>
  <head>
    <link rel="schema.XLINK" href="http://www.w3.org/1999/xlink" />
  </head>
  <body>
    ...
    <a href="students/patjones62.xml">
      <meta name="XLINK.label" content="student62" />
      <meta name="XLINK.role" content="http://www.example.com/linkprops/student" />
      <meta name="XLINK.title" content="Pat Jones" />
    </a>
    <a href="profs/jaysmith7.xml">
      <meta name="XLINK.label" content="prof7" />
      <meta name="XLINK.role" content="http://www.example.com/linkprops/professor" />
      <meta name="XLINK.title" content="Dr. Jay Smith" />
    </a>
    <a href="courses/cs101.xml">
      <meta name="XLINK.label" content="CS-101" />
      <meta name="XLINK.title" content="Computer Science 101" />
    </a>
  </body>
</html>
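The correspondence between the two renditions can be checked mechanically. This hypothetical Java sketch (XLinkRendition is an invented name) takes an <a> from the XHTML rendition and builds a new element carrying xlink:href plus one xlink:* attribute for each XLINK.* <meta> child, in the XLink namespace http://www.w3.org/1999/xlink. It only rebuilds attributes; it does not perform any XLink processing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical converter from the XHTML Augmented Metadata rendition back to XLink attributes. */
final class XLinkRendition {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    private static final String PREFIX = "XLINK.";

    private XLinkRendition() {}

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Builds an element named elementName with xlink:href and one xlink:* attribute per XLINK.* <meta>. */
    static Element toXLink(Element anchor, String elementName) {
        Document doc = anchor.getOwnerDocument();
        Element out = doc.createElementNS(null, elementName);
        out.setAttributeNS(XLINK_NS, "xlink:href", anchor.getAttribute("href"));
        for (Node n = anchor.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element meta && "meta".equals(meta.getLocalName())
                    && meta.getAttribute("name").startsWith(PREFIX)) {
                // "XLINK.label" becomes the xlink:label attribute, and so on.
                String local = meta.getAttribute("name").substring(PREFIX.length());
                out.setAttributeNS(XLINK_NS, "xlink:" + local, meta.getAttribute("content"));
            }
        }
        return out;
    }
}
```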
For information on how to associate metadata with empty XHTML elements, see Example 5.3A in the following section.
There are certainly times when it is impractical or impossible to include a <meta> child on a specific element, such as when the element is declared "EMPTY" (e.g., <img>), when direct nesting is problematic for processing or display, or when the document cannot be modified (e.g., it's on a CD-ROM or somebody else's Web site). In such cases it becomes necessary to associate this metadata by reference.
This is done by adding an href attribute which, when present, supersedes the child element association described in the previous section. The content of the href attribute is a URI reference specifying the location of a Web resource, thus defining a link between the <meta> element and the identified resource. Using this method, it is also possible to include an entire document's metadata within its <head> element.
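Resolving an href on <meta> is ordinary URI reference resolution. java.net.URI implements RFC 2396 as amended by RFC 2732, the same RFCs this specification names for link processing, so a minimal Java sketch might look like this. MetaHref and linkedResource are hypothetical names, and the URI of the containing document is assumed to be known to the caller.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper resolving the resource named by a <meta> element's href attribute. */
final class MetaHref {
    private MetaHref() {}

    static Element parseElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /**
     * Returns the absolute URI referenced by href, resolved against the containing document's
     * URI, or an empty Optional when there is no href (the parent-child association applies).
     */
    static Optional<URI> linkedResource(Element meta, URI documentUri) {
        if (!meta.hasAttribute("href")) {
            return Optional.empty();
        }
        return Optional.of(documentUri.resolve(meta.getAttribute("href")));
    }
}
```

With a hypothetical document URI of http://www.desertmuseum.org/natural_history/inverts/ants.html, href="#ant" resolves to that URI plus "#ant", and a relative reference such as "../reptiles/terrapene.html#box" resolves to http://www.desertmuseum.org/natural_history/reptiles/terrapene.html#box.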
This example shows how to associate metadata with an empty element.
<img id="ant" alt="Harvester Ant"
     src="http://www.desertmuseum.org/natural_history/inverts/images/harvest.gif" />
<meta href="#ant" name="DC.Title" content="Harvester Ant (Pogonomyrmex spp.)" />
<meta href="#ant" name="DC.Format" content="image/gif" />
While this use of the <meta> element may be more interoperable with older browsers, there is a tradeoff: the linking feature requires more syntactic redundancy, since the link must be repeated for each DC component. A solution to this is found in the next section.
Similarly, the <img> element's required alt attribute is perhaps redundant with the Dublin Core's dc:description, but the latter is more generally identifiable as a metadata description of the resources that a document author wishes to make explicit as resources (as opposed to all of the assorted images that litter Web pages for presentational or layout purposes).
Likewise, as in Example 5.2A, the advantage of using a Dublin Core title rather than the existing title attribute on an XHTML element is that:
- … title attribute are not identical to Dublin Core's dc:title, and should not be considered equivalent.
The downside of using the Dublin Core title over XHTML's title attribute is that the latter has been provided as an accessibility solution and advocated by the Web Accessibility Initiative (WAI). But given that it's unlikely that all XML document types will adopt an xhtml:title attribute, the Dublin Core solution is more general.
Had XHTML provided a title attribute in a WAI namespace (e.g., <p wai:title="Grasshopper Mouse">), this might have been a better accessibility solution for XML.
This example shows how to associate metadata with an external resource.
<meta href="http://www.desertmuseum.org/natural_history/reptiles/terrapene.html#box" name="DC.Title" content="Western Box Turtle (Terrapene ornata)" />
The fifteen elements of the Dublin Core Metadata Element Set (DCMES) (see Appendix C) have been published as an XML DTD [DCMES-XML]. This same set of elements has been implemented as an XHTML module and included in a document type conforming to the definition of XHTML Host Language Document Type (see Section 3.1 of [XHTMLMOD]).
This specification extends the content model of the XHTML <meta> element (previously declared "EMPTY") to allow for inclusion of the fifteen DCMES elements. To assist in the display of DCMES content (when this is desired), the content model also includes character data (i.e., PCDATA) and the XHTML <br> (forced line break) element.
Just as described previously, the content of a <meta> element is associated with its parent element. When an href attribute is present, the contents of the <meta> element are associated with whatever addressable resource is referenced by the href.
Note that by altering the content model of the <meta> element to allow Dublin Core content, its previously required content attribute must be made optional.
ISSUE:
Early prototypes of the augmented DTD did not include mixed content, which has
its downsides in any markup design. Do we really want mixed content here?
How much do we expect it will be abused? Harvesting of either metadata attributes
or namespaced content is not impacted by its presence, but some people may
expect raw text descriptions to be processed.
NOTE on <meta> content:
The character data and line break element content of a <meta> element
should under no circumstances be considered part of the metadata content,
and should be limited to only punctuation and other requirements for display
integrity. Processors designed to harvest such metadata are instructed to ignore
all non-DCMES element content.
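A minimal sketch of the harvesting rule in this note: with a namespace-aware JAXP parser, only element children in the Dublin Core namespace (http://purl.org/dc/elements/1.1/) are kept, so punctuation, other character data and <br/> fall away. DcmesContent and dcElements are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical harvester step: keep only DCMES element content of a <meta>. */
final class DcmesContent {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    private DcmesContent() {}

    static Element parseElement(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getNamespaceURI() is populated
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Returns (local name, text) pairs for each Dublin Core element child, in document order. */
    static List<Map.Entry<String, String>> dcElements(Element meta) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        for (Node n = meta.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Text nodes (e.g. punctuation) and XHTML elements such as <br/> are skipped.
            if (n.getNodeType() == Node.ELEMENT_NODE && DC_NS.equals(n.getNamespaceURI())) {
                result.add(Map.entry(n.getLocalName(), n.getTextContent().trim()));
            }
        }
        return result;
    }
}
```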
NOTE on hidden metadata:
In such cases, CSS style definitions should be applied to wrapper elements, since in testing with current browsers, style definitions applied directly to <meta> elements or to non-XHTML content do not function in most cases. It is hoped that CSS implementations will improve to support style definitions that hide this content. Although the CSS 1 Recommendation stated as early as 1996 that “all HTML element types are possible selectors” (Section 1 Basic Concepts of [CSS1]), a transitional strategy is needed until better CSS support is widespread. The greatest interoperability with older browsers is gained by using attributes on <meta>, as described in Section 5.2, or by combining the linking method (Section 5.3) with a wrapper <div> element whose style definition is "display : none".
NOTE on RDF:
Several questions arise:
Q1: Why is the Dublin Core content not wrapped by an <rdf:RDF> element?
A1: The RDF Model and Syntax Specification states that this is “optional if the content can be known to be RDF from the application context.” It is expected that harvesting Dublin Core content from a <meta> element would result in it being expressed in DCMES (regardless of whether it was originally DCMES or <meta> attribute content), wrapped in an <rdf:RDF> element whose 'rdf:about' attribute references the same resource as the <meta> element from which it was harvested.
Q2: Why isn't an <rdf:Description> element necessary?
A2: The <rdf:Description> element (and its 'about' attribute) describes, from the RDF perspective, the resource to which the metadata refers. Since, according to this specification, the Dublin Core metadata included in a <meta> element refers either to the parent of the <meta> element or to the resource its 'href' attribute references (when present), the <rdf:Description> element is unnecessary: this relationship is explicitly described from the XHTML perspective.
This example shows how to include Dublin Core content in XML as element content of <meta>.
<p>
  <meta>
    <dc:title>Seed Harvester Ant (Pogonomyrmex spp.)</dc:title>
    <dc:subject>1. Ants. 2. Seed Harvester Ant</dc:subject>
    <dc:description>harvester ant nests</dc:description>
  </meta>
  Harvester Ants don't like rocky soil, preferring creosote flats and bottomland, but they may also be found in urban areas. These large, aggressive ants usually make a clearing about 3 feet in diameter around the entrance hole to the nest, which is flat. The area is kept clear by the ants, who bite off the stems and leaves of plants that try to grow there.
</p>
Without CSS implementation support for hiding <meta> element content, the above metadata will be displayed. When display is actually desired, it may be helpful to intersperse punctuation and/or <br /> elements among the DCMES elements. Note that this character data is not considered part of the metadata.
This example shows how to hide DCMES metadata by wrapping it in a <div> element whose styling is hidden. By necessity, this must use the linking feature described in Section 5.3, as the parent of the <meta> element is now the <div> element.
<p id="mousegr">
  <div class="hidden">
    <meta href="#mousegr">
      <dc:title>Southern Grasshopper Mouse, Onychomys torridus</dc:title>
      <dc:format>text/html</dc:format>
      <dc:description>A description of the Southern Grasshopper Mouse</dc:description>
      <dc:subject>1. Southern Grasshopper Mouse. 2. Carnivorous Mouse.</dc:subject>
    </meta>
  </div>
  The grasshopper mouse is an efficient predator, killing other mice with a bite to the back of the neck, and biting the stingers off scorpions before consuming them. Pinacate beetles emit a toxic spray from their rear ends, deterring most predators, but grasshopper mice catch them and shove the defensive ends of the beetles into the sand, then bite off the good parts, leaving beetle bottoms embedded in the sand.
</p>
To hide the metadata content, the CSS stylesheet for the above <div> element would be:
.hidden { display : none }
This same method can be employed to style such content for display.
The changes to the XHTML 1.1 DTD described in the sections below are implemented in four files:
PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" "xhtml-augmeta10.dtd"
PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN" "dcmes-qname-1.mod"
PUBLIC "-//neocortext.net//ELEMENTS XHTML Dublin Core Elements 1.0//EN" "xhtml-dcmes-1.mod"
Several changes are necessary to XHTML documents in order to correctly validate and process DC-augmented content when using Dublin Core elements.
- … <meta> element, no XML Namespace declarations other than XHTML are required. However, if it contains Dublin Core elements as metadata (e.g., <dc:title>), then the <html> element must declare the XML Namespaces for Dublin Core and RDF. See the example below.
- A <link> element must be added to the document's <head> to provide a link to the Dublin Core schema (i.e., a description of the syntax and associated semantics, not an XML Schema), plus any others used in the document.
These changes are shown in the following example.
This example shows the addition of XML namespace declarations for DC and RDF, as well as the link to the DC schema.
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN"
    "xhtml-augmeta10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xml:lang="en">
  <head>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <title>Natural History of the Sonoran Desert</title>
  </head>
  <body>
    ...
  </body>
</html>
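A processor might check these two requirements before harvesting. The hypothetical Java checker below (DcDeclarations is an invented name) confirms that the dc: and rdf: prefixes are bound to the expected namespaces on <html>, and that <head> contains a <link rel="schema.DC"> pointing at the Dublin Core namespace. The input is assumed to omit the DOCTYPE (or the DTD must be resolvable), because the default JAXP parser otherwise tries to load the external DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical checker for the declarations required when DCMES elements appear in <meta>. */
final class DcDeclarations {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    private DcDeclarations() {}

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** True when <html> binds dc: and rdf: correctly and <head> links the schema.DC schema. */
    static boolean declaresDublinCore(Document doc) {
        Element html = doc.getDocumentElement();
        boolean namespaces = DC_NS.equals(html.lookupNamespaceURI("dc"))
                && RDF_NS.equals(html.lookupNamespaceURI("rdf"));
        NodeList links = doc.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            if ("head".equals(link.getParentNode().getLocalName())
                    && "schema.DC".equals(link.getAttribute("rel"))
                    && DC_NS.equals(link.getAttribute("href"))) {
                return namespaces;
            }
        }
        return false; // no schema.DC link in <head>
    }
}
```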
ED. NOTE:
describe harvesting; isomorphism between meta and DC elements; show example.
The word “harvest” is chosen deliberately over “mining,” as the common metaphor of “data mining” implies digging deep (perhaps in a database) for information, whereas the Web's content can be harvested by tools that skim its surface. This specification describes various methods for associating metadata with XHTML documents and document components. This section provides an example of how such metadata might be harvested for processing using relatively simple software tools.
...
This example shows one possible method for harvesting metadata. The document fragment below is from a document whose assumed URI is "http://www.desertmuseum.org/natural_history/inverts/scorpions.html", and contains Dublin Core content both as attributes on <meta> and as DCMES elements.
<p id="whip">
<meta>
<dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>
<dc:subject>1. Scorpions. 2. Tailless Whipscorpions</dc:subject>
</meta>
<meta name="DC.Subject" scheme="DDC" content="595" />
Tailless whipscorpions look at first glance like spiders. The first
appendages (pedipalps) are modified for grasping prey, with hook-like
projections. The first true pair of legs is modified to serve as
"feelers," and are long, delicate, and whip-like, with many fine hairs.
</p>
In the Xerces XML parser, the DOM method getElementsByTagName("meta") will extract all <meta> elements from the supplied element. If applied to the document root, all metadata contained as either attribute or element content will be returned as a NodeList, from which it can be further processed. Since, according to this specification, instances of such metadata as either attributes on or element content within <meta> elements are isomorphic, what remains is simply a conversion to an output expression suitable for its intended application. In the example below, the content from the above example has been extracted, processed into DCMES content, and then wrapped in an RDF element whose rdf:about attribute references the original parent element in the source document.
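<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
  rdf:about="http://www.desertmuseum.org/natural_history/inverts/scorpions.html#whip">
<dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>
<dc:subject>1. Scorpions. 2. Tailless Whipscorpions</dc:subject>
<dc:subject>595</dc:subject>
</rdf:Description>
</rdf:RDF>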
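The harvesting step can be sketched in Java with the standard JAXP DOM API (javax.xml.parsers and org.w3c.dom), which Xerces implements. This is a minimal sketch, not a reference implementation. The class and method names (MetaHarvester, harvest, toRdf) are hypothetical. The input is assumed to be well-formed XHTML with the dc prefix declared, and only unrefined "DC.*" identifiers are handled. The output places rdf:about on an rdf:Description element inside rdf:RDF, which is where RDF/XML expects it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical harvester: collects Dublin Core statements from <meta> elements,
// whether expressed as DC.* attributes or as dc:* child elements.
class MetaHarvester {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns (DC element local name, value) pairs in document order. */
    static List<Map.Entry<String, String>> harvest(Element root) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        // Matches the tag name "meta"; the XHTML elements here are unprefixed.
        NodeList metas = root.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // Attribute form: name="DC.Subject" content="595" -> ("subject", "595")
            String name = meta.getAttribute("name");
            if (name.startsWith("DC.")) {
                out.add(Map.entry(name.substring(3).toLowerCase(Locale.ROOT),
                                  meta.getAttribute("content")));
            }
            // Element form: <dc:title>...</dc:title> -> ("title", "...")
            NodeList kids = meta.getChildNodes();
            for (int j = 0; j < kids.getLength(); j++) {
                Node k = kids.item(j);
                if (k.getNodeType() == Node.ELEMENT_NODE && DC_NS.equals(k.getNamespaceURI())) {
                    out.add(Map.entry(k.getLocalName(), k.getTextContent().trim()));
                }
            }
        }
        return out;
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Wraps the statements in an rdf:Description about the given URI. */
    static String toRdf(String about, List<Map.Entry<String, String>> stmts) {
        StringBuilder sb = new StringBuilder()
            .append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
            .append("         xmlns:dc=\"").append(DC_NS).append("\">\n")
            .append("  <rdf:Description rdf:about=\"").append(escape(about)).append("\">\n");
        for (Map.Entry<String, String> s : stmts) {
            sb.append("    <dc:").append(s.getKey()).append('>').append(escape(s.getValue()))
              .append("</dc:").append(s.getKey()).append(">\n");
        }
        return sb.append("  </rdf:Description>\n</rdf:RDF>\n").toString();
    }

    public static void main(String[] args) {
        String xhtml = "<p xmlns='http://www.w3.org/1999/xhtml' xmlns:dc='" + DC_NS + "' id='whip'>"
            + "<meta><dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>"
            + "<dc:subject>1. Scorpions. 2. Tailless Whipscorpions</dc:subject></meta>"
            + "<meta name='DC.Subject' scheme='DDC' content='595' />"
            + "Tailless whipscorpions look at first glance like spiders.</p>";
        Element p = parse(xhtml).getDocumentElement();
        String base = "http://www.desertmuseum.org/natural_history/inverts/scorpions.html";
        System.out.print(toRdf(base + "#" + p.getAttribute("id"), harvest(p)));
    }
}
```

Like the RDF example above, this sketch discards the scheme="DDC" attribute. A fuller harvester would preserve it, for example as a typed or qualified value.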
ED. NOTE:
This section describes how the design may be extended for other
schemas, as well as how references may be made into other subject
classification systems (e.g., the Cycorp ontology) and technologies
(e.g., XML Topic Maps). It also shows evolution to XHTML 2.0 by
providing an XLink rendition, as well as generalization to any XML
vocabulary. [The section titles are placeholders.]
...
ED. NOTE:
Show how to extend the DC subject qualifier (using an appropriate <link>
element) to hook into a different taxonomy than those listed by DCMI...
...
An XML Topic Map [XTM] is an XML document that can be used to represent the structure of, and associations (relationships) between, information resources used to define topics. An XML Topic Map can represent a set of relationships between subjects and point to occurrences of those subjects on the Web. In a sense, this is precisely what “traditional” maps do.
Topic Maps introduce a concept called a Published Subject
Indicator (PSI), essentially a URI “published” (i.e.,
given some measure of public notice and stability) for the purpose of establishing
subject identity. Within a topic map, the proxy for an abstract
subject is called a topic, and in an XTM
document is represented by a <topic>
element.
PSIs are similar in concept to the Uniform Resource Name [URN], except that they are intended to identify not only online resources but also those that cannot be referenced directly on a computer system. These include physical objects, events, or locations (e.g., “Part No. 03-876023-51”, “Flight 831”, or “Monument Valley, Utah”), and properties, classifications, or concepts (e.g., “Rainy”, “A Biological Species”, or “Business Relationship”). In Topic Map parlance, these are called non-addressable subjects.
While much of the discussion surrounding topic maps has been on how to reference
(i.e., “map”) Web content in an XTM document, the reverse has interesting
possibilities. One application using XHTML and XTM together might be to associate
a document or document component with metadata that references a topic map in order
to indicate its subject. This would be very similar to using the Dublin Core
<dc:subject>
element described in Section FIXME, except
that a software system could auto-generate a topic map from a set of Web pages,
given the explicit subjects supplied by their authors. Every Web page incorporating
this methodology is potentially a participant in a Web-wide, implicit topic map.
FIXME: this is only true of XTM.subjectIndicatorRef!
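The auto-generation idea can be illustrated by inverting harvested references. Every page fragment is grouped under the subject indicator it points to, and each distinct indicator becomes a candidate topic. This is a hypothetical sketch. The class SubjectIndex and its methods are illustrative, and the input is assumed to be already-harvested (fragment URI, <meta> name, <meta> content) triples. Following the editorial note above, only XTM.subjectIndicatorRef is treated as identifying a subject. Emitting XTM syntax is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index for building an implicit topic map from harvested pages:
// subject indicator URI -> URIs of the page fragments that declare it.
class SubjectIndex {
    private final Map<String, List<String>> occurrences = new TreeMap<>();

    /** Records one harvested <meta> element attached to the fragment at fragmentUri. */
    void add(String fragmentUri, String metaName, String metaContent) {
        // Assumption (per the editorial note): only XTM.subjectIndicatorRef
        // identifies a shared subject; other <meta> names are ignored here.
        if (!"XTM.subjectIndicatorRef".equals(metaName)) return;
        occurrences.computeIfAbsent(metaContent, k -> new ArrayList<>()).add(fragmentUri);
    }

    /** Fragments indicating the given subject, in the order they were added. */
    List<String> occurrencesOf(String subjectIndicator) {
        return occurrences.getOrDefault(subjectIndicator, List.of());
    }

    /** Number of distinct subjects seen so far (i.e., candidate topics). */
    int subjectCount() {
        return occurrences.size();
    }

    public static void main(String[] args) {
        SubjectIndex idx = new SubjectIndex();
        String psi = "http://www.doctypes.org/sonoran/toads.xtm#toad-spadefoot";
        // The page URIs below are hypothetical.
        idx.add("http://example.org/monsoon.html#couch", "XTM.subjectIndicatorRef", psi);
        idx.add("http://example.org/toads.html#spadefoot", "XTM.subjectIndicatorRef", psi);
        idx.add("http://example.org/ants.html#harvester", "DC.Subject", "595");
        System.out.println(idx.subjectCount() + " " + idx.occurrencesOf(psi));
    }
}
```

Each key of the index corresponds to one <topic>, and each listed fragment URI to one of its occurrences.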
There are three distinct types of references in XTM that are applicable here.
The value of the <meta>
element's name
attribute for each reference type is given below.
Subject Indicator Reference | |
---|---|
Identifier: | "XTM.subjectIndicatorRef" |
Element Type: | <subjectIndicatorRef> |
Definition: | Provides a URI reference to a resource indicating the subject of whatever resource the <meta> element is associated with. |
Comment: | The referenced resource does not have to be in an XTM document (though it certainly could be), but it should unambiguously identify a subject. |
Topic Reference | |
Identifier: | "XTM.topicRef" |
Element Type: | <topicRef> |
Definition: | Provides a reference to a <topic> element in an XTM document indicating the subject of whatever resource the <meta> element is associated with. |
Comment: | This is the same as <subjectIndicatorRef> except for the additional constraint that the reference must point to a <topic> element. |
Resource Reference | |
Identifier: | "XTM.resourceRef" |
Element Type: | <resourceRef> |
Definition: | Provides a URI reference to a resource as a topic. |
Comment: | This should be carefully distinguished from the other reference types, in that the resource itself is the subject, not what the resource might contain or indicate by its contents. |
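Because the three identifiers map one-to-one onto XTM element types, a processor can turn a <meta> element directly into the corresponding XTM reference. This is a minimal sketch, assuming XTM 1.0's convention of carrying the URI in an xlink:href attribute. The class XtmRefs and the method toXtm are hypothetical. The fragment-identifier check for topicRef is a simple heuristic, not a full validation.

```java
import java.util.Map;

// Hypothetical converter from <meta name="XTM.*" content="URI"> to an XTM 1.0
// reference element; XTM 1.0 carries the URI in an xlink:href attribute.
class XtmRefs {
    // <meta> name identifier -> XTM element type, as listed in the table above.
    static final Map<String, String> ELEMENT_FOR = Map.of(
        "XTM.subjectIndicatorRef", "subjectIndicatorRef",
        "XTM.topicRef", "topicRef",
        "XTM.resourceRef", "resourceRef");

    /** Returns e.g. <topicRef xlink:href="..." />, or null for a non-XTM name. */
    static String toXtm(String metaName, String uri) {
        String element = ELEMENT_FOR.get(metaName);
        if (element == null) return null;
        // Heuristic: a topicRef must address a <topic> element, so it needs a fragment.
        if (element.equals("topicRef") && uri.indexOf('#') < 0) {
            throw new IllegalArgumentException("XTM.topicRef needs a #fragment: " + uri);
        }
        String href = uri.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        return "<" + element + " xlink:href=\"" + href + "\" />";
    }

    public static void main(String[] args) {
        System.out.println(toXtm("XTM.topicRef",
            "http://www.doctypes.org/sonoran/toads.xtm#toad-spadefoot"));
    }
}
```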
This example shows how to incorporate XML Topic Map semantics to provide a subject
classification. The markup below indicates the subject of the defining instance of
a term (wrapped by XHTML's <dfn>
element) by including a
reference to a <topic>
element in a topic map about toads
(“toads.xtm”) using a topic reference ("XTM.topicRef"):
<html>
<head>
<title>A Natural History of the Sonoran Desert</title>
<link rel="schema.XTM" href="http://www.topicmaps.org/xtm/1.0/" />
</head>
...
<body>
<p>During summer monsoons, the <dfn id="couch">
<meta name="XTM.topicRef"
  content="http://www.doctypes.org/sonoran/toads.xtm#toad-spadefoot" />
spadefoot toad</dfn> is well-known for emerging from its subterranean
estivation to breed in the temporary ponds created by the heavy runoff.
Preying primarily upon beetles, grasshoppers, katydids, ants, spiders,
and termites, a spadefoot can consume enough food in one meal to last
an entire year!
</p>
The topic referenced above might appear in an XTM document as something akin to the
fragment below. Note that the <topic>
element below includes
an <occurrence>
element that references the location in our example
above, as well as a <subjectIdentity>
reference to the same location.
Hence, the topic map and the Web page are co-reflexive, the latter serving as both an
indicator and an occurrence of the subject “Couch's Spadefoot Toad”.
<topic id="toad-spadefoot">
  <instanceOf><topicRef xlink:href="#species" /></instanceOf>
  <subjectIdentity>
    <subjectIndicatorRef
      xlink:href="http://www.desertmuseum.org/natural_history/reptiles/bufo.html#couch" />
    <subjectIndicatorRef xlink:href="#toad-spadefoot-description" />
  </subjectIdentity>
  <baseName>
    <scope><subjectIndicatorRef xlink:href="language.xtm#en" /></scope>
    <baseNameString>Couch's Spadefoot</baseNameString>
  </baseName>
  <baseName>
    <scope><subjectIndicatorRef xlink:href="language.xtm#en" /></scope>
    <baseNameString>Spadefoot Toad</baseNameString>
  </baseName>
  <baseName>
    <scope><subjectIndicatorRef xlink:href="language.xtm#es" /></scope>
    <baseNameString>sapo con espuelas</baseNameString>
  </baseName>
  <occurrence>
    <resourceData id="toad-spadefoot-description">Couch's Spadefoot, a small toad that ranges from southeastern California through southern Arizona and southern New Mexico.</resourceData>
  </occurrence>
  <occurrence>
    <resourceRef
      xlink:href="http://www.desertmuseum.org/natural_history/reptiles/bufo.html#couch" />
  </occurrence>
</topic>
There are three files included as part of the XHTML Augmented Metadata 1.0 DTD: the DTD driver, the DCMES Module, and the DCMES Qualified Names Module. The SGML Open catalog file changes necessary to support these new components are described in Section 5.4.1, XHTML DTD Changes.
Also available is a distribution that includes all DTD files plus the specification itself, at xhtml-augmeta.tar.gz.
This is the DTD driver file, available as xhtml-augmeta10.dtd. Note that this DTD is also available in normalized form, i.e., with all entities instantiated (DTD modules included), as xhtml-augmeta10-f.dtd (~165K).
<!-- ....................................................................... --> <!-- XHTML Augmented Metadata 1.0 DTD ..................................... --> <!-- file: xhtml-augmeta10.dtd --> <!-- XHTML Augmented Metadata 1.0 DTD This is an extension of XHTML, a reformulation of HTML as a modular XML application. This XHTML 1.1-based DTD augments the metadata features of XHTML, including the addition of Dublin Core content in the <meta> element, in support of a semantically-rich World Wide Web. It also includes declarations for the 'name' attribute on anchors for better legacy browser support. XHTML Augmented Metadata 1.0 DTD, Copyright 2002, Murray Altheim. With the added requirement that this paragraph remain intact, the license for distribution and use of this DTD and its accompanying documentation is identical to XHTML, as described below. The Extensible HyperText Markup Language (XHTML) Copyright 1998-2001 World Wide Web Consortium (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University). All Rights Reserved. Permission to use, copy, modify and distribute the XHTML DTD and its accompanying documentation for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies. The copyright holders make no representation about the suitability of the DTD for any purpose. It is provided "as is" without expressed or implied warranty. Author: Murray M. Altheim <m.altheim@open.ac.uk> Revision: $Id: xhtml-augmeta10.dtd,v 4.1 2001/06/05 09:22:01 altheim Exp $ --> <!-- This is the driver file for version 1.0 of the XHTML Augmented Metadata DTD. 
Please use this formal public identifier to identify it: "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" --> <!ENTITY % XHTML.version "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" > <!-- Use this URI to identify the default namespace: "http://www.w3.org/1999/xhtml" See the XHTML Qualified Names module ("xhtml-qname-1.mod") for more information on the use of namespace prefixes in the DTD. --> <!ENTITY % DC.prefixed "INCLUDE" > <!ENTITY % NS.prefixed "IGNORE" > <!-- In addition to the XHTML namespace, use of this DTD requires the addition of two namespaces, RDF and Dublin Core. For example, if you are using XHTML Augmented Metadata 1.0 directly, use the FPI in the DOCTYPE declaration, with the xmlns attributes on the document element to identify the default and added namespaces. Also required is a <link> element which ties the prefix "DC" to the XML Dublin Core schema (gaining its element definitions and their semantics). <?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" "xhtml-augmeta10.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en"> <head> <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" /> <title>Document Title</title> </head> ... </html> Revisions: (none) --> <!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: --> <!-- declare for inclusion of Dublin Core Qualified Names Module --> <!ENTITY % xhtml-qname-extra.mod PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN" "dcmes-qname-1.mod" > <!-- Dublin Core for XHTML Module ............................... --> <!ENTITY % xhtml-dcmes.module "INCLUDE" > <![%xhtml-dcmes.module;[ <!ENTITY % xhtml-model.redecl PUBLIC "-//neocortext.net//ELEMENTS XHTML Dublin Core Elements 1.0//EN" "xhtml-dcmes-1.mod" > ]]> <!-- instantiate XHTML 1.1 DTD .................................. 
--> <!ENTITY % xhtml11.dtd PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd" > %xhtml11.dtd; <!-- Name Identifier Module ..................................... --> <!ENTITY % xhtml-nameident.module "INCLUDE" > <![%xhtml-nameident.module;[ <!ENTITY % xhtml-nameident.mod PUBLIC "-//W3C//ELEMENTS XHTML Name Identifier 1.0//EN" "xhtml-nameident-1.mod" > %xhtml-nameident.mod;]]> <?doc type="doctype" role="title" { XHTML Augmented Metadata 1.0 } ?> <!-- end of XHTML Augmented Metadata 1.0 DTD .............................. --> <!-- ....................................................................... -->
This is the Dublin Core 1.1 module for use with XHTML, available as xhtml-dcmes-1.mod.
<!-- ...................................................................... --> <!-- Dublin Core 1.1 Module for XHTML .................................... --> <!-- file: xhtml-dcmes-1.mod This is an extension of XHTML, a reformulation of HTML as a modular XML application. Copyright 2002 Murray Altheim. All Rights Reserved. Revision: $Id: xhtml-dcmes-1.mod,v 4.4 2001/06/12 16:26:28 altheim Exp $ This DTD module is identified by the PUBLIC and SYSTEM identifiers: PUBLIC "-//neocortext.net//ELEMENTS Dublin Core 1.1 for XHTML 1.0//EN" SYSTEM "http://www.doctypes.org/specs/meta/xhtml-dcmes-1.mod" (temporary) Based on: XML DTD 2001-04-11 for Dublin Core Metadata Element Set version 1.1 Authors: Dave Beckett <dave.beckett@bristol.ac.uk> Eric Miller <emiller@oclc.org> Dan Brickley <daniel.brickley@bristol.ac.uk> Date Issued: 2001-04-11 See: http://dublincore.org/documents/2001/04/11/dcmes-xml/dcmes-xml-dtd.dtd --> <!-- Dublin Core 1.1 Elements for XHTML dc:title dc:creator dc:subject dc:description dc:publisher dc:contributor dc:date dc:type dc:format dc:identifier dc:source dc:language dc:relation dc:coverage dc:rights This module declares the Dublin Core Metadata Element Set (DCMES) based on an XML version of DCMES 1.1 published by the Dublin Core Metadata Initiative. While the XML Namespace prefix for these elements can be redeclared, no provision for its optionality has been provided, as these elements are expected to be prefixed when used with XHTML (see "dcmes-qname-1.mod"). Note that the XML Namespace declarations are slightly different than some XHTML-based document types in that while the 'xmlns:dc' and 'xmlns:rdf' attributes are allowed on each Dublin Core element, this DTD relies on the #FIXED values on <html> to supply validation of their values. This is to avoid default values being added by XML processors, which may substantially increase the size of a document. 
This module is included as part of: "Augmented Metadata for XHTML" http://www.neocortext.net/specs/meta/NOTE-augmeta.html --> <!-- a parameter entity class containing the DCMES 1.1 elements --> <!ENTITY % DC.class "%DC.title.qname; | %DC.creator.qname; | %DC.subject.qname; | %DC.description.qname; | %DC.publisher.qname; | %DC.contributor.qname; | %DC.date.qname; | %DC.type.qname; | %DC.format.qname; | %DC.identifier.qname; | %DC.source.qname; | %DC.language.qname; | %DC.relation.qname; | %DC.coverage.qname; | %DC.rights.qname;" > <!-- modify content model for inclusion of <meta> as an inline element --> <!ENTITY % Inline.extra '| %meta.qname;' > <!-- changes to <meta> as part of 'Augmented Metadata for XHTML' --> <!ENTITY % meta.content "( #PCDATA | %br.qname; | %DC.class; )*" > <!ATTLIST %meta.qname; %id.attrib; content CDATA #IMPLIED href %URI.datatype; #IMPLIED > <!-- add XML Namespace declarations to <html> to allow for defaulting --> <!ATTLIST %html.qname; %DC.xmlns.attrib; > <!-- This section declares the elements and attributes for the Dublin Core Metadata Element Set 1.1. --> <!ENTITY % lang.attrib "xml:lang %LanguageCode.datatype; #IMPLIED" > <!ENTITY % RDF.resource.attrib "xmlns:%RDF.prefix; %URI.datatype; #IMPLIED %RDF.pfx;resource %URI.datatype; #IMPLIED" > <!ENTITY % DC.xmlns-optional.attrib "xmlns:%DC.prefix; %URI.datatype; #IMPLIED" > <!-- The name given to the resource. --> <!ELEMENT %DC.title.qname; ( #PCDATA ) > <!ATTLIST %DC.title.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- An entity primarily responsible for making the content of the resource. --> <!ELEMENT %DC.creator.qname; ( #PCDATA ) > <!ATTLIST %DC.creator.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- The topic of the content of the resource. --> <!ELEMENT %DC.subject.qname; ( #PCDATA ) > <!ATTLIST %DC.subject.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- An account of the content of the resource. 
--> <!ELEMENT %DC.description.qname; ( #PCDATA ) > <!ATTLIST %DC.description.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- The entity responsible for making the resource available. --> <!ELEMENT %DC.publisher.qname; ( #PCDATA ) > <!ATTLIST %DC.publisher.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- An entity responsible for making contributions to the content of the resource. --> <!ELEMENT %DC.contributor.qname; ( #PCDATA ) > <!ATTLIST %DC.contributor.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- A date associated with an event in the life cycle of the resource. --> <!ELEMENT %DC.date.qname; ( #PCDATA ) > <!ATTLIST %DC.date.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- The nature or genre of the content of the resource. --> <!ELEMENT %DC.type.qname; ( #PCDATA ) > <!ATTLIST %DC.type.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- The physical or digital manifestation of the resource. --> <!ELEMENT %DC.format.qname; ( #PCDATA ) > <!ATTLIST %DC.format.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- An unambiguous reference to the resource within a given context. --> <!ELEMENT %DC.identifier.qname; ( #PCDATA ) > <!ATTLIST %DC.identifier.qname; %DC.xmlns-optional.attrib; %RDF.resource.attrib; %lang.attrib; > <!-- A reference to a resource from which the present resource is derived. --> <!ELEMENT %DC.source.qname; ( #PCDATA ) > <!ATTLIST %DC.source.qname; %DC.xmlns-optional.attrib; %RDF.resource.attrib; %lang.attrib; > <!-- A language of the intellectual content of the resource. --> <!ELEMENT %DC.language.qname; ( #PCDATA ) > <!ATTLIST %DC.language.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- A reference to a related resource. --> <!ELEMENT %DC.relation.qname; ( #PCDATA ) > <!ATTLIST %DC.relation.qname; %DC.xmlns-optional.attrib; %RDF.resource.attrib; %lang.attrib; > <!-- The extent or scope of the content of the resource. 
--> <!ELEMENT %DC.coverage.qname; ( #PCDATA ) > <!ATTLIST %DC.coverage.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- Information about rights held in and over the resource. --> <!ELEMENT %DC.rights.qname; ( #PCDATA ) > <!ATTLIST %DC.rights.qname; %DC.xmlns-optional.attrib; %lang.attrib; > <!-- end of xhtml-dcmes-1.mod -->
This DTD module provides XML Namespace support for the Dublin Core elements in XHTML, available as dcmes-qname-1.mod.
<!-- ...................................................................... --> <!-- Dublin Core 1.1 Qualified Names Module ............................... --> <!-- file: dcmes-qname-1.mod This is an extension of XHTML, a reformulation of HTML as a modular XML application. Copyright 2002 Murray Altheim. All Rights Reserved. Revision: $Id: dcmes-qname-1.mod,v 4.1 2001/06/05 09:22:01 altheim Exp $ This DTD module is identified by the PUBLIC and SYSTEM identifiers: PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN" SYSTEM "http://www.neocortext.net/specs/meta/dcmes-qname-1.mod" (temporary) --> <!-- Dublin Core Metadata Element Set 1.1 (DCMES) Qualified Names Module This module is contained in two parts, labeled Section 'A' and 'B': Section A declares parameter entities to support namespace- qualified names, namespace declarations, and name prefixing for DCMES 1.1. Section B declares parameter entities used to provide namespace-qualified names for all DCMES 1.1 element types: --> <!-- Section A: Dublin Core XML Namespace Framework :::::::::::::: --> <!-- XML Namespaces for DCMES 1.1 and RDF --> <!ENTITY % DC.xmlns "http://purl.org/dc/elements/1.1/" > <!ENTITY % RDF.xmlns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" > <!-- NOTE: As specified in [XMLNAMES], the namespace prefix serves as a proxy for the URI reference, and is not in itself significant. The following may be redeclared in a document's internal subset. --> <!ENTITY % DC.prefix "dc" > <!ENTITY % RDF.prefix "rdf" > <!ENTITY % DC.pfx "%DC.prefix;:" > <!ENTITY % RDF.pfx "%RDF.prefix;:" > <!ENTITY % DC.xmlns.attrib "xmlns:%DC.prefix; %URI.datatype; #FIXED '%DC.xmlns;' xmlns:%RDF.prefix; %URI.datatype; #FIXED '%RDF.xmlns;'" > <!ENTITY % XHTML.xmlns.extra.attrib "%DC.xmlns.attrib;" > <!-- Section B: Dublin Core Qualified Names :::::::::::::::::::::: --> <!-- This section declares parameter entities used to provide namespace-qualified names for all Dublin Core element types. 
--> <!-- module: xhtml-dcmes-1.mod --> <!ENTITY % DC.title.qname "%DC.pfx;title" > <!ENTITY % DC.creator.qname "%DC.pfx;creator" > <!ENTITY % DC.subject.qname "%DC.pfx;subject" > <!ENTITY % DC.description.qname "%DC.pfx;description" > <!ENTITY % DC.publisher.qname "%DC.pfx;publisher" > <!ENTITY % DC.contributor.qname "%DC.pfx;contributor" > <!ENTITY % DC.date.qname "%DC.pfx;date" > <!ENTITY % DC.type.qname "%DC.pfx;type" > <!ENTITY % DC.format.qname "%DC.pfx;format" > <!ENTITY % DC.identifier.qname "%DC.pfx;identifier" > <!ENTITY % DC.source.qname "%DC.pfx;source" > <!ENTITY % DC.language.qname "%DC.pfx;language" > <!ENTITY % DC.relation.qname "%DC.pfx;relation" > <!ENTITY % DC.coverage.qname "%DC.pfx;coverage" > <!ENTITY % DC.rights.qname "%DC.pfx;rights" > <!-- end of dcmes-qname-1.mod -->
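The NOTE in Section A above says the prefix is only a proxy for the namespace URI. A consumer should therefore match Dublin Core elements by namespace URI and local name, not by the literal string "dc:title". Below is a minimal Java sketch using the standard DOM Level 2 method getElementsByTagNameNS. The class name DcLookup and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical lookup that finds DCMES elements by namespace, whatever the prefix.
class DcLookup {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getElementsByTagNameNS to see namespaces
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Text of the first DCMES element with this local name, or null if absent. */
    static String first(Document doc, String localName) {
        NodeList nl = doc.getElementsByTagNameNS(DC_NS, localName);
        return nl.getLength() == 0 ? null : nl.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Same metadata, two different prefixes bound to the same namespace URI.
        String a = "<meta xmlns:dc='" + DC_NS + "'><dc:title>Toads</dc:title></meta>";
        String b = "<meta xmlns:dcmes='" + DC_NS + "'><dcmes:title>Toads</dcmes:title></meta>";
        System.out.println(first(parse(a), "title") + " / " + first(parse(b), "title"));
    }
}
```

This matches how the DTD lets %DC.prefix; be redeclared in a document's internal subset. Both documents yield the same title, while an element in a different namespace that happens to use the "dc" prefix is not matched.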
The Dublin Core Metadata Element Set (DCMES) is a relatively simple schema consisting of fifteen elements. These elements include the kind of content one typically finds on the copyright page of a book: title, author, subject, publication data, etc. The DCMES is designed to contain such data, though this certainly shouldn't be considered its only application. The Dublin Core provides for the association of title, author, date (e.g., creation date, issue date, revision date), publisher (e.g., the webmaster or their employer), language, format (e.g., "text/html", "image/gif", "video/mpeg"), and, perhaps most importantly for those interested in a “Semantic Web”, subject. Given this, the Dublin Core may, even without extension, provide a suitable “80/20” point for introducing a well-designed and proven metadata standard onto the Web.
The definitions provided below include both the conceptual and representational
form of each Dublin Core Element Type and its Identifier (the
latter is used as the value of the name
attribute when embedded
within an XHTML <meta>
element). These definitions are derived from the
DCMES 1.1 Recommendation [DCMES],
the DCMI Recommendation for encoding Dublin Core in XML [DCMES-XML], and the DCMI Recommendation for encoding qualified
Dublin Core metadata in HTML (and XHTML) [DCQ-HTML]. The Definition captures the semantic concept, while
the Identifier and Element Type capture the data representation; the Comment gives usage guidance.
Each Dublin Core definition refers to the resource being described. A resource is defined in [RFC2396] as "anything that has identity". For the purposes of Dublin Core metadata, a resource will typically be an information or service resource, but the term may be applied more broadly. In this specification, the means by which metadata refers to a specific resource is described in Section 5.2.
Title | |
---|---|
Identifier: | "DC.Title" |
Element Type: | <dc:title> |
Definition: | A name given to the resource. |
Comment: | Typically, a Title will be a name by which the resource is formally known. |
Creator | |
Identifier: | "DC.Creator" |
Element Type: | <dc:creator> |
Definition: | An entity primarily responsible for making the content of the resource. |
Comment: | Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity. |
Subject and Keywords | |
Identifier: | "DC.Subject" |
Element Type: | <dc:subject> |
Definition: | The topic of the content of the resource. |
Comment: | Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. |
Description | |
Identifier: | "DC.Description" |
Element Type: | <dc:description> |
Definition: | An account of the content of the resource. |
Comment: | Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. |
Publisher | |
Identifier: | "DC.Publisher" |
Element Type: | <dc:publisher> |
Definition: | An entity responsible for making the resource available. |
Comment: | Examples of a Publisher include a person, an organisation, or a service. Typically, the name of a Publisher should be used to indicate the entity. |
Contributor | |
Identifier: | "DC.Contributor" |
Element Type: | <dc:contributor> |
Definition: | An entity responsible for making contributions to the content of the resource. |
Comment: | Examples of a Contributor include a person, an organisation, or a service. Typically, the name of a Contributor should be used to indicate the entity. |
Date | |
Identifier: | "DC.Date" |
Element Type: | <dc:date> |
Definition: | A date associated with an event in the life cycle of the resource. |
Comment: | Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. |
Resource Type | |
Identifier: | "DC.Type" |
Element Type: | <dc:type> |
Definition: | The nature or genre of the content of the resource. |
Comment: | Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core Types [DCT1]). To describe the physical or digital manifestation of the resource, use the <dc:format> element. |
Format | |
Identifier: | "DC.Format" |
Element Type: | <dc:format> |
Definition: | The physical or digital manifestation of the resource. |
Comment: | Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats). |
Resource Identifier | |
Identifier: | "DC.Identifier" |
Element Type: | <dc:identifier> |
Definition: | An unambiguous reference to the resource within a given context. |
Comment: | Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN). |
Source | |
Identifier: | "DC.Source" |
Element Type: | <dc:source> |
Definition: | A reference to a resource from which the present resource is derived. |
Comment: | The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. |
Language | |
Identifier: | "DC.Language" |
Element Type: | <dc:language> |
Definition: | A language of the intellectual content of the resource. |
Comment: | Recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766], which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), optionally followed by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, 'en' for English, 'fr' for French, or 'en-GB' for English used in the United Kingdom. |
Relation | |
Identifier: | "DC.Relation" |
Element Type: | <dc:relation> |
Definition: | A reference to a related resource. |
Comment: | Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. |
Coverage | |
Identifier: | "DC.Coverage" |
Element Type: | <dc:coverage> |
Definition: | The extent or scope of the content of the resource. |
Comment: | Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges. |
Rights Management | |
Identifier: | "DC.Rights" |
Element Type: | <dc:rights> |
Definition: | Information about rights held in and over the resource. |
Comment: | Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource. |
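Each Identifier corresponds to exactly one element type, so converting between the attribute and element forms is a simple lookup. The sketch below encodes the fifteen pairs from the table above. It adds two deliberately loose checks drawn from the Comments: a YYYY-MM-DD date for DC.Date and an RFC 1766-style tag (letter subtags of 1 to 8 characters) for DC.Language. DcTerms is a hypothetical helper, not part of the DTD. Its date check is stricter than W3CDTF, which also permits other granularities (e.g., YYYY, or full timestamps) that this sketch rejects.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper relating <meta name="DC.*"> identifiers to <dc:*> element types.
class DcTerms {
    static final String[] NAMES = {
        "Title", "Creator", "Subject", "Description", "Publisher",
        "Contributor", "Date", "Type", "Format", "Identifier",
        "Source", "Language", "Relation", "Coverage", "Rights" };

    // "DC.Title" -> "dc:title", built from the fifteen DCMES 1.1 names.
    static final Map<String, String> ELEMENT_FOR = new LinkedHashMap<>();
    static {
        for (String n : NAMES) ELEMENT_FOR.put("DC." + n, "dc:" + n.toLowerCase(Locale.ROOT));
    }

    // RFC 1766: primary tag of 1-8 letters, optional letter subtags ("en", "en-GB").
    static final Pattern LANG = Pattern.compile("[A-Za-z]{1,8}(-[A-Za-z]{1,8})*");

    /** Returns null if the identifier/value pair looks acceptable, else a reason. */
    static String check(String identifier, String value) {
        if (!ELEMENT_FOR.containsKey(identifier)) return "unknown identifier " + identifier;
        switch (identifier) {
            case "DC.Date":
                try {
                    LocalDate.parse(value); // ISO 8601 YYYY-MM-DD, strictly resolved
                } catch (DateTimeParseException e) {
                    return "not YYYY-MM-DD: " + value;
                }
                break;
            case "DC.Language":
                if (!LANG.matcher(value).matches()) return "not an RFC 1766 tag: " + value;
                break;
            default:
                break;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(ELEMENT_FOR.get("DC.Subject"));       // dc:subject
        System.out.println(check("DC.Date", "2002-05-10"));        // null
        System.out.println(check("DC.Language", "en-GB"));         // null
        System.out.println(check("DC.Author", "Murray Altheim"));  // unknown identifier DC.Author
    }
}
```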
Dublin Core content may be further qualified by refinement or
encoding scheme. The former makes the meaning of the element more
specific; the latter identifies a scheme to assist in interpreting
the metadata content. DCMI tasked element-specific working groups with
identifying qualifiers for each DC element, as described in
Dublin Core Qualifiers
[DCMES-Qualifiers]. Refinements
typically maintain the meaning but narrow the scope (e.g., "Created" on
<dc:date>), while encoding schemes (e.g., "Dewey Decimal Classification" on
<dc:subject>) typically indicate that the element content comes
from a controlled vocabulary (i.e., a schema). Content without an explicit
qualifier is considered "unqualified." According to the Dumb-Down Principle,
a client should be able to ignore any qualifier and still be able to use the
description.
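In practice, the Dumb-Down Principle means a client discards any qualifier it does not understand and keeps the rest. This is a minimal sketch with two assumptions not stated above. First, an element refinement is written as a third dotted component of the name attribute, as in name="DC.Date.Created". Second, an encoding scheme appears in the scheme attribute, as in the DC.Subject/DDC example earlier. The class DumbDown and its sets of "known" qualifiers are hypothetical and illustrative.

```java
import java.util.Set;

// Hypothetical client that applies the Dumb-Down Principle to <meta> attribute content.
class DumbDown {
    // Qualifiers this particular client happens to understand (illustrative only).
    static final Set<String> KNOWN_REFINEMENTS = Set.of("DC.Date.Created");
    static final Set<String> KNOWN_SCHEMES = Set.of("DDC");

    /** Drops an unknown refinement: "DC.Date.Modified" -> "DC.Date". */
    static String dumbDownName(String name) {
        if (KNOWN_REFINEMENTS.contains(name)) return name;
        String[] parts = name.split("\\.");
        return parts.length > 2 ? parts[0] + "." + parts[1] : name;
    }

    /** Renders a statement, keeping the scheme only if this client can interpret it. */
    static String describe(String name, String scheme, String content) {
        String s = dumbDownName(name) + " = " + content;
        if (scheme != null && KNOWN_SCHEMES.contains(scheme)) s += " [" + scheme + "]";
        return s;
    }

    public static void main(String[] args) {
        System.out.println(describe("DC.Date.Modified", "W3CDTF", "2002-05-10")); // DC.Date = 2002-05-10
        System.out.println(describe("DC.Subject", "DDC", "595"));                 // DC.Subject = 595 [DDC]
    }
}
```

The dumbed-down statement remains a valid unqualified Dublin Core description, which is what makes locally defined qualifiers tolerable for cross-domain use.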
DCMI does not consider its qualifier list closed. It expects that both DCMI and implementors will develop additional qualifiers for specific domains. Such locally-used qualifiers may not be as interoperable as widely-understood ones, but a description using them is still likely to be usable in cross-domain resource discovery.
Following are some examples of how to select a subject classification for your metadata using three of the common DC subject qualifiers: Library of Congress Subject Headings (LCSH), Library of Congress Classification (LCC), and Dewey Decimal Classification (DDC). This example is meant to demonstrate that subject classification is not always a simple process, though it is hoped that more widespread use by non-librarians might spur development of improved (and free) online classification services for lay people.
You can search the U.S. Library of Congress Catalog at the Gateway to Library Catalogs (choose Simple Search, Advanced Search, or Left-Anchored Phrase Search, or try a different Z39.50-based catalog from around the world).
I searched on the phrase "Harvester Ant" (using the Advanced or Simple Search) and located quite a few library records. I honed my search by checking “More on this record” for subject details, and located several records that seemed to match my subject. I was able to determine that the subject of "Harvester Ant" could be indicated either by the LC Call No. QL568 or by Dewey 595.
You can browse the U.S. Library of Congress Classification Outline, which lists the 21 main classes of the LCC. Unfortunately, the next level of browsing forces you to download a PDF file. Since "Harvester Ants" seem to fall within the realm of Science, I downloaded the "Q -- Science" file. Reading through the classifications, I easily located the range of my subject:
Class Q: Science
  Subclass QL: Zoology
    QL461-599.82: Insects
Unfortunately, the finest grain available here indicates a range, not a specific value. The previous method (using LCSH) determined that the LC Call Number is actually QL568, so this method at least confirmed that value, since QL568 falls within the range QL461-599.82.
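Once the class letters and class number are separated, checking that a call number falls inside an outline range is a simple comparison. The sketch handles only the simple form used here: class letters followed by a decimal class number, e.g. QL568 or QL599.82. Real LC call numbers carry further parts (Cutter numbers, dates) that this hypothetical LccRange class does not parse.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check that a simple LC call number lies within an outline range.
class LccRange {
    // Class letters followed by a class number with an optional decimal part.
    private static final Pattern FORM = Pattern.compile("([A-Z]{1,3})(\\d+(?:\\.\\d+)?)");

    /** inRange("QL568", "QL461-599.82"): same subclass letters, inclusive bounds. */
    static boolean inRange(String callNo, String range) {
        Matcher c = FORM.matcher(callNo);
        String[] ends = range.split("-");
        Matcher lo = FORM.matcher(ends[0]);
        if (!c.matches() || ends.length != 2 || !lo.matches()) {
            throw new IllegalArgumentException("unsupported form: " + callNo + ", " + range);
        }
        if (!c.group(1).equals(lo.group(1))) return false; // different subclass
        // BigDecimal compares 599.82 and 568 exactly, without floating-point error.
        BigDecimal n = new BigDecimal(c.group(2));
        return n.compareTo(new BigDecimal(lo.group(2))) >= 0
            && n.compareTo(new BigDecimal(ends[1])) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("QL568", "QL461-599.82")); // true
    }
}
```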
You can browse the About Dewey Web page from the Online Computer Library Center (OCLC), navigating to the latest DDC Summaries page. By browsing the "First Summary", "Second Summary" and "Third Summary" (each with finer resolution), I was able to locate the subject of "Harvester Ants" as:
500 Science
  590 Zoology
    595 Arthropoda
There are a number of online DDC Web pages that list the complete classification system. Note that DDC is a versioned scheme, and that OCLC periodically updates the finer-resolution numbers.
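The three DDC Summaries correspond to successively longer prefixes of the three-digit class number. The First Summary lists the ten main classes (hundreds), the Second the hundred divisions (tens), and the Third the thousand sections. Below is a sketch that computes those summary-level ancestors for a number such as 595. DdcLevels is a hypothetical name, and the captions ("Science", "Zoology", and so on) are not looked up.

```java
import java.util.List;

// Hypothetical helper: summary-level ancestors of a DDC class number.
class DdcLevels {
    /** "595" or "595.7" -> [500, 590, 595]: main class, division, section. */
    static List<String> summaries(String ddc) {
        String base = ddc.split("\\.")[0]; // ignore digits after the decimal point
        if (!base.matches("\\d{3}")) {
            throw new IllegalArgumentException("need a 3-digit class number: " + ddc);
        }
        return List.of(
            base.charAt(0) + "00",      // First Summary: main class
            base.substring(0, 2) + "0", // Second Summary: division
            base);                      // Third Summary: section
    }

    public static void main(String[] args) {
        System.out.println(summaries("595")); // [500, 590, 595]
    }
}
```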
The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as the content of the <dc:type> element, or as values of the DC.Type property (when expressed as <meta> attribute content), to identify the genre of a resource.
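A processor might check DC.Type values against the vocabulary before accepting them. The following Python sketch is illustrative: the term set is the DCMI Type Vocabulary as currently published by DCMI, which may differ from the version available when this draft was written, and the function name `is_dcmi_type` is hypothetical. Terms are compared exactly after trimming whitespace; a more lenient processor might also normalize case.

```python
# DCMI Type Vocabulary terms as currently published by DCMI (assumption:
# adjust this set to the vocabulary version your application targets).
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def is_dcmi_type(value: str) -> bool:
    """Return True if value (e.g. a DC.Type content attribute) is a DCMI Type term."""
    return value.strip() in DCMI_TYPES


print(is_dcmi_type("Text"))     # True
print(is_dcmi_type("Webpage"))  # False
```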
The following are links to definitions from the "DCMI Type Vocabulary" [DCMI-Types]:
The editors would like to thank those who have provided valuable feedback on this document, including [...]
Sincere thanks to the Arizona-Sonora Desert Museum for permission to use content from both their excellent web site and one of their print publications (which is invaluable in identifying holes in the desert):
This document includes metadata as described herein, and is a valid instance of the XHTML Augmented Metadata 1.0 DTD.
Following are some samples of embedded metadata, each using a different method of hiding or displaying its content. In the first four tests (the hiding tests), success means that the metadata content is not displayed. These are followed by three tests that display the metadata content. For syntax details, check the XHTML source of this specification.
Hiding Test 1:
Following this paragraph is a table element that contains a <div> element containing a DC metadata block. The <div> element's "display" style property (associated by the class attribute in this document's internal style sheet) has been set to "none":
:
Hiding Test 2:
Following this paragraph is a DC metadata block inside a single-cell table. After a colon character, the <meta> element's "display" style property (associated by element type in the document's internal style sheet) has been set to "none":
:
Hiding Test 3:
Following this paragraph is a single-cell table containing a <div> element that contains a DC metadata block. After a colon character, the <meta> element has a style attribute setting the style property "display:none":
:
Hiding Test 4:
Following this paragraph is a single-cell table. After an initial colon character there are four <meta> elements. Since these contain only attribute content, only the colon should be displayed:
:
Display Test 5:
Following this paragraph is a table whose single cell contains a DC metadata block. This test does not attempt to hide the metadata content; instead, it uses <br /> elements as line breaks and styles the entire table cell via the document style sheet.
:
Display Test 6:
Following this paragraph is a single-cell table containing a <div> element that contains a DC metadata block. This test does not attempt to hide the metadata content; instead, it styles the entire block as "white-space: pre" via the document style sheet.
:
Display Test 7:
This test is essentially the same as Test 5, except that it attempts to attach styling via an id attribute on the <meta> element itself. If this works, the text should be displayed in red.
:
Note that current versions of several browsers hide the metadata content in Tests 1 and 4 (Test 1 only when CSS is enabled). The safest encoding method is therefore to use attribute content on <meta> elements, since hiding it does not depend on CSS support.
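Attribute content is also easy for software to read back. The following Python sketch extracts the `name`, `scheme` and `content` attributes from the `<meta>` elements in an XHTML fragment. The fragment and its values are hypothetical, the function name `read_meta` is hypothetical, and the sketch assumes the `<meta>` elements are in the XHTML namespace inside a `<div>`, as in the tests above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical fragment: a DC metadata block encoded purely as attribute content.
SAMPLE = """<div xmlns="http://www.w3.org/1999/xhtml" class="metadata">
  <meta name="DC.Title" content="Harvester Ants" />
  <meta name="DC.Subject" scheme="LCC" content="QL568" />
  <meta name="DC.Subject" scheme="DDC" content="595" />
  <meta name="DC.Type" content="Text" />
</div>"""


def read_meta(fragment: str) -> list:
    """Return (name, scheme, content) tuples for each <meta> element, in document order."""
    root = ET.fromstring(fragment)
    return [
        (meta.get("name"), meta.get("scheme"), meta.get("content"))
        for meta in root.iter(f"{{{XHTML_NS}}}meta")
    ]


for name, scheme, content in read_meta(SAMPLE):
    print(name, scheme or "-", content)
```

Because the values live only in attributes, the `<meta>` elements have no character content for a browser to render, which is the property Test 4 relies on.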