Does Any Of That Stuff Mean Anything?

Before we begin, stick your nose into some 20th century, Post-Modern, Post-Structuralist, French literary criticism...

Jacques Derrida, 1930- French philosopher. He argues that language only refers to other language, therefore negating the idea of a single, valid “meaning” of a text as intended by the author. Rather, the author’s intentions are subverted by the free play of language, giving rise to many meanings the author never intended.
The Columbia Encyclopedia, 6th ed., 2001.

Did you survive that? Feeling ok?


What Are We Trying To Do Here?   Ask a Librarian!

What we are trying to do is name things (and thereby give them a meaning). You do that when you construct XML documents or build databases: you name the XML element, the attribute of a relation, and so on. It's what librarians are famous for. They've been doing it for over 100 years with the Dewey Decimal system and the Library of Congress Subject Heading list.

 

New Library of Congress Subject Headings and their corresponding Dewey Decimal Numbers
Date: July 18, 2003

 


 

So Let Allyce Do Her Thing And Give It A Name!

See, the problem is, which name. Many words can mean the same thing, and a single word can mean a lot of things.

A short visit with a key text of information science...

An important psychological problem is in understanding the relationship between what people say and what they want. This understanding is the key to designing systems that can infer what services or information users need from the input they provide.

We have asked people to give descriptions of various information objects, and analyzed their responses to determine how well the objects to which they refer can be inferred from what they say.

Obviously, one of the main difficulties in predicting an intended object from a provided word is synonymy. There are many different words that can refer to the same object. Even though the receiver may know several of them, the communicant (or user) may choose another.

Another part of the problem is polysemy; each word means many different things and can refer to many different objects. In our observed data, words that were frequently used tended to be applied to several objects.

Statistical semantics: Analysis of the potential performance of key-word information systems by G.W. Furnas et al. The Bell System Technical Journal, v.62(6), July-August 1983
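
Both problems show up even in a toy keyword index. Here is a minimal sketch in Python; the words, the objects, and the function name `build_indexes` are made up for illustration and do not come from the Furnas paper.

```python
from collections import defaultdict

# Hypothetical data: each pair is (word a person used, object they meant).
DESCRIPTIONS = [
    ("holiday", "trip to Greece"),
    ("vacation", "trip to Greece"),
    ("holiday", "Christmas Day"),
    ("break", "trip to Greece"),
    ("break", "coffee pause"),
]


def build_indexes(pairs):
    """Return two lookup tables: word -> objects and object -> words."""
    objects_by_word = defaultdict(set)
    words_by_object = defaultdict(set)
    for word, obj in pairs:
        objects_by_word[word].add(obj)
        words_by_object[obj].add(word)
    return objects_by_word, words_by_object


objects_by_word, words_by_object = build_indexes(DESCRIPTIONS)

# Synonymy: one object reached by many words.
synonyms = sorted(words_by_object["trip to Greece"])
# Polysemy: one word pointing at many objects.
senses = sorted(objects_by_word["holiday"])

print(synonyms)  # ['break', 'holiday', 'vacation']
print(senses)    # ['Christmas Day', 'trip to Greece']
```

A system that only knows "vacation" misses the people who typed "holiday", and a system that gets "holiday" cannot tell which object was meant.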

 

Predictably, Allyce And Her Librarian Friends Will Probably Disagree About The Meaning

"About fifteen years ago, some evidence was brought to the attention of the field which indicated that, if several different indexers are all asked to index the same document, a great deal of inconsistency is likely to be apparent in the results. That is to say, different indexers are apt to assign quite different sets of index terms (i.e., descriptors, subject headings) to the same document. This evidence must have been received with considerable skepticism by those who believed that there is only one 'right' way to index a document and that any properly trained indexer has a pretty good idea of what that 'right' way is. Since then, however, the issue has received a great deal of attention, and the original findings have been amply corroborated by a number of independent tests. It seems that a substantial amount of interindexer inconsistency, as the phenomenon of conflicting indexer decisions has come to be called, is the rule rather than the exception."

William S. Cooper, "Is Interindexer Consistency A Hobgoblin?" American Documentation, July 1969, 268-278
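
One simple way to put a number on that disagreement is an overlap ratio: the terms two indexers share, divided by all the terms either one used. The sketch below is only an illustration. The term lists are invented, and the formula (the Jaccard index) is an assumption on our part, not something quoted from Cooper.

```python
def consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms used by either indexer."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty indexings agree trivially
    return len(a & b) / len(a | b)


# Hypothetical index terms assigned to the same document by two people.
allyce = ["metadata", "cataloging", "subject headings"]
friend = ["metadata", "indexing", "classification", "subject headings"]

score = consistency(allyce, friend)
print(round(score, 2))  # 0.4
```

Two careful indexers, one document, and they agree on only two of five terms.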

 

Even though the basic problems of language and meaning observed in paper technologies and the work of librarians were never solved, information technology sped forward.

If you suppose, then, that we shall see these basic problems re-appear, but now manifested as "IT problems" or "Web problems" and maybe given new, whiz-bang names like "metadata," you win the prize for being prescient, far-sighted, forward-looking, insightful. (Wait! Those are synonyms! Synonymy is one of the problems!)


Feeling better after our historical interlude?

 

What Is Metadata?

Jargon? A subject heading used on the web.
Jargon? Information about information?
Jargon? The key building block of the Semantic Web.
Jargon? The foundation to interoperability.
Jargon? How a web author indicates the contents of a web page.
Jargon? What a librarian calls a subject heading, a webster calls metadata.
Jargon? What a relational databaser calls attributes, a webster calls metadata.
Jargon? A name for something.
Jargon? A description of something.
Jargon? Just about anything you want.
Jargon? A piece of inflated rhetoric to intimidate the uninitiated. Oh, that's about right!


 

Where Does Metadata Live?

The HTML 4.01 Specification (W3C Recommendation 24 December 1999) notes that

"HTML lets authors specify meta data -- information about a document rather than document content -- in a variety of ways"

Example <HEAD> section with properties

<HEAD profile="http://www.acme.com/profiles/core">
  <TITLE>How to complete Memorandum cover sheets</TITLE>
  <META name="author" content="John Doe">
  <META name="copyright" content="© 1997 Acme Corp.">
  <META name="keywords" content="corporate,guidelines,cataloging">
  <META name="date" content="1994-11-06T08:49:37+00:00">
</HEAD>

"A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the lang attribute to display search results using the language preferences of the user. For example:"

<!-- For speakers of US English -->
<META name="keywords" lang="en-us" 
         content="vacation, Greece, sunshine">
<!-- For speakers of British English -->
<META name="keywords" lang="en" 
         content="holiday, Greece, sunshine">
<!-- For speakers of French -->
<META name="keywords" lang="fr" 
         content="vacances, Grèce, soleil">
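
To see how a search engine might actually read these tags, here is a minimal sketch using Python's standard `html.parser` module. The class name `MetaKeywordParser` and the idea of grouping keywords by `lang` are our own illustration, not a description of how any real crawler works.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect <meta name="keywords"> values, grouped by lang attribute."""

    def __init__(self):
        super().__init__()
        self.keywords_by_lang = {}

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        lang = attributes.get("lang") or ""
        content = attributes.get("content") or ""
        words = [w.strip() for w in content.split(",") if w.strip()]
        self.keywords_by_lang.setdefault(lang, []).extend(words)


PAGE = """
<META name="keywords" lang="en-us" content="vacation, Greece, sunshine">
<META name="keywords" lang="en" content="holiday, Greece, sunshine">
<META name="keywords" lang="fr" content="vacances, Grèce, soleil">
"""

parser = MetaKeywordParser()
parser.feed(PAGE)
parser.close()

french = parser.keywords_by_lang["fr"]
print(french)  # ['vacances', 'Grèce', 'soleil']
```

Notice that the same page now has three different "meanings" depending on who is asking. Synonymy again.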
		

Ouch! Metadata on the Web

"The tygers of wrath are wiser than the horses of instruction."   William Blake

It appears to be a matter of belief. You are a person who believes that people all over the world will play nicely together, or you're a *&%#!@ cynic. You believe that people all over the world will cooperate in using metadata wisely or you're some kind of a +&^$#@~ nihilist.


A Statement of Belief:  "Metadata is a key part of the information infrastructure necessary to help create order in the chaos of the Web, infusing description, classification, and organization to help create more useful stores of information." Metadata Principles and Practicalities by Erik Duval et al.

The Tygers of Wrath Teach Us   "Experience with the tag [meta keywords tag] has showed it to be a spam magnet. Some web site owners would insert misleading words about their pages or use excessive repetition of words in hopes of tricking the crawlers about relevancy." Danny Sullivan
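
The "excessive repetition" trick is easy to spot in principle. The sketch below measures how much of a keyword list is taken up by its most repeated term. The function name `repetition_ratio` and both keyword strings are hypothetical, and this is not how any particular search engine scored pages.

```python
from collections import Counter


def repetition_ratio(content):
    """Share of the keyword list taken up by its single most repeated term."""
    terms = [t.strip().lower() for t in content.split(",") if t.strip()]
    if not terms:
        return 0.0
    most_common_count = Counter(terms).most_common(1)[0][1]
    return most_common_count / len(terms)


# Hypothetical keyword lists: one honest, one stuffed.
honest = "vacation, Greece, sunshine"
stuffed = "cheap, cheap, cheap, cheap, flights, cheap, cheap, hotels"

print(round(repetition_ratio(honest), 2))   # 0.33
print(round(repetition_ratio(stuffed), 2))  # 0.75
```

Catching repetition is the easy part. Catching misleading words means knowing what the page is really about, and that is the original problem all over again.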


Something to read: Death of a meta tag by Danny Sullivan, editor of SearchEngineWatch

Something to read: Metacrap by Cory Doctorow

 


Here's metadata on the web ...

What is Dublin Core?

"The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core standard comprises fifteen elements, the semantics of which have been established through consensus by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship." Diane Hillmann

An example:

	<meta name = "DC.Subject"
		scheme = "MESH"
		content = "Myocardial Infarction; Pericardial Effusion">
	<meta name = "DC.Creator"
		content = "Gogh, Vincent van">
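
If you wanted to generate tags like these from a catalog record rather than type them by hand, a minimal sketch might look like this. The helper `dublin_core_meta` is hypothetical; it just follows the `DC.Element` naming pattern visible in the example above and escapes the values with the standard `html.escape` function.

```python
from html import escape


def dublin_core_meta(element, content, scheme=None):
    """Render one Dublin Core statement as an HTML meta tag."""
    parts = [f'name="DC.{escape(element, quote=True)}"']
    if scheme:
        parts.append(f'scheme="{escape(scheme, quote=True)}"')
    parts.append(f'content="{escape(content, quote=True)}"')
    return "<meta " + " ".join(parts) + ">"


tag = dublin_core_meta("Creator", "Gogh, Vincent van")
print(tag)  # <meta name="DC.Creator" content="Gogh, Vincent van">

subject = dublin_core_meta(
    "Subject", "Myocardial Infarction; Pericardial Effusion", scheme="MESH"
)
print(subject)
```

The code is the cheap part. Somebody still has to decide that the page is about "Myocardial Infarction" and not "heart attack".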
	
The tiger bites...

"A discouraging aspect of metadata usage trends on the public Web over the last five years is the seeming reluctance of content creators to adopt formal metadata schemes with which to describe their documents. For example, Dublin Core metadata appeared on only 0.5 percent of public Web site home pages in 1998; that figure increased almost imperceptibly to 0.7 percent in 2002. The vast majority of metadata provided on the public Web is ad hoc in its creation, unstructured by any formal metadata scheme." Trends in the Evolution of the Public Web, 1998 - 2002 by Edward T. O'Neill, et al


 


More metadata on the web ...

What is RDF?

"The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource."

"RDF is based on the idea of identifying things using Web identifiers (URIs), and describing resources in terms of simple properties and property values. To make this discussion somewhat more concrete as soon as possible, the group of statements "there is someone whose name is Eric Miller, whose email address is em@w3.org, and whose title is Dr." could be represented as below": RDF Primer

The 'Eric Miller' example.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">

  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle> 
  </contact:Person>

</rdf:RDF>
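
RDF statements boil down to subject, property, and value. As a rough illustration, the sketch below pulls those three parts out of the Eric Miller example with Python's standard `xml.etree.ElementTree`. It is an assumption-laden shortcut, not an RDF parser: the function `simple_triples` is our own, it only handles this one flat pattern, and it ignores the type statement implied by `contact:Person`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """List (subject, property, value) tuples for one flat RDF/XML pattern."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree names look like "{namespace}local"; keep the local part.
            name = prop.tag.split("}", 1)[1]
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, name, value))
    return triples


triples = simple_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Every one of the three statements has the same subject, the URI for Eric Miller, which is the point: the thing being described gets a Web identifier, not just a name.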

 

The tiger bites again...

"Since the initial experiments indicate that RDF data is hard to find, a more targeted search was conducted.
During the first experiment, RDF data was found in only sixteen out of over half a million pages from the Open Directory. This number increased to 180 out of 2.9 million pages in the second run.
Overall, with the categories combined, this translates to 1018 out of 541,536 URLs containing RDF, 613 of them correct and 405 with incorrect RDF for the first run. In the second run, out of 2,952,010 pages, 1479 contained valid and 2940 contained invalid RDF.
The results of this survey suggest that RDF has not caught on with a large user community."   Survey of RDF Data on the Web by Andreas Eberhart, August 15, 2002

 


Lots more metadata on the web ...

What is OIL + DAML?

  • DAML - DARPA Agent Markup Language
  • OIL - Ontology Inference Layer

What's An Ontology?

"An ontology is a specification of a conceptualization."

"Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms."   Tom Gruber

Quack! We're in deep water here. Librarians used to call these hierarchies of terms: Broad terms like "mammal" then narrower terms like "dog" and even narrower terms like "Collie". But ontologies are sort of like hierarchies, but not exactly like hierarchies. Quack! Quack! So what are they really like? Quack! Ask Allyce, she's a librarian! Quack!

"The use of ontologies provides a very powerful way to describe objects and their relationships to other objects. The DAML language is being developed as an extension to XML and the Resource Description Framework (RDF). The latest release of the language (DAML+OIL) provides a rich set of constructs with which to create ontologies and to markup information so that it is machine readable and understandable. " About the DAML Language

Something to notice:

Did you see the description above? "machine readable" ... that means we've moved away from categorizing things for human beings and are designing metadata for machines to (pardon the jargon, but this is what they say) harvest.

Think about the economics of time and energy. This stuff is expensive in both time and labor. Too expensive for just any HTML page. Marking stuff up with DAML+OIL only makes economic sense if the stuff is going to hang around for a long time.

Can you say digital library? What about web service?

Example

There are two types of animals, Male and Female.

<daml:Class rdf:ID="Male">
  <rdfs:subClassOf rdf:resource="#Animal"/>
</daml:Class>

It is perfectly admissible for a class to have multiple superclasses: a Man is a Male Person.

<daml:Class rdf:ID="Man">
  <rdfs:subClassOf rdf:resource="#Person"/>
  <rdfs:subClassOf rdf:resource="#Male"/>
</daml:Class>
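
What does a machine do with subclass statements like these? One thing it can do is follow them upward to work out everything a Man is. The sketch below is a plain Python illustration of that idea, not DAML tooling. The dictionary `SUBCLASS_OF` and the function `ancestors` are hypothetical, and the entries for Female and Person are assumptions added to round out the example.

```python
# Each class maps to the classes it is declared a subclass of.
SUBCLASS_OF = {
    "Male": ["Animal"],
    "Female": ["Animal"],   # assumption: mirrors the Male declaration
    "Person": ["Animal"],   # assumption: not stated in the excerpt above
    "Man": ["Person", "Male"],
}


def ancestors(cls, hierarchy):
    """Return every superclass reachable from cls by following subClassOf."""
    seen = set()
    stack = list(hierarchy.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return seen


man_ancestors = sorted(ancestors("Man", SUBCLASS_OF))
print(man_ancestors)  # ['Animal', 'Male', 'Person']
```

Nobody ever said "a Man is an Animal", yet the machine can conclude it. That is the part that goes beyond a librarian's broader-term and narrower-term lists.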
	
Is there anything for the tiger to bite?

Terry Brooks says DAML + OIL is so leading-edge and so complex a technology that it will probably never penetrate widely into the open, common web. There may be some large demonstration projects created to show proof of concept, but somebody will have to hold Terry's hand when he DAMLizes his web site.

The DAML community is happy just to get some attention: DAML.ORG has had over ten million hits as of Friday, 28 March, 2003. "The very large amount of activity for this web site reflects the significant interest around the world in DAML technology as it supports the emerging Semantic Web" HotDAML newsletter

 


Lots and lots more metadata on the web ...

What is OWL?

"The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine readability of Web content than that supported by XML, RDF, and RDF Schema by providing additional vocabulary along with a formal semantics."

"The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language what can formally describe the meaning of terminology used in Web documents." OWL Web Ontology Language Overview

 

Somebody call Redmond, Washington!

Semantics on the web has now shifted away from presenting information to humans and toward presenting information for machine consumption. It's a revolution that has already happened. It's called web services.

You can use Microsoft's VS.NET to write a web service so that your computer application can talk to my computer application.