Crossing the Structure Chasm

Online information comes in two flavors: unstructured text and structured data. Unstructured text is easy to author, and keyword query interfaces to it are ubiquitous. However, keyword queries are limited, and the answers they return are approximate. Structured data, usually managed by databases and knowledge bases, is substantially harder to create and query. Beyond the conceptual effort of creating and querying a schema, structured data requires trained professionals to devise a comprehensive structure up front, before any data is entered. The payoff comes at query time: one can pose very complex queries and obtain precise answers.

With the advent of modern networks, a growing number of applications and grand challenges for our field require large-scale sharing of structured data. Unlike in traditional database applications, the data needs to be authored in a decentralized fashion by individuals who are not database-savvy. Furthermore, even minimal structuring of data can greatly benefit tasks we face on a daily basis, such as email and personal information management.

This talk will describe the beginnings of a journey whose goal is crossing the structure chasm, that is, introducing technology that imports some of the attractive properties of unstructured-data management into the world of structured data, so that structured data becomes accessible to a much larger audience. I will discuss some recent ideas that aim to facilitate large-scale sharing of structured data and to support our daily data management tasks.