This is both a background slide and a running example.  Need to observe the following:
* Web, newsgroup, local, WAN, LAN sources
* Sometimes we have complete knowledge; other times we don’t
* Sometimes data sources overlap
Shown query is courtesy of Hao Mei.
For all phenotypes in one database, find all (gene, locus, product) triples (GeneTests’ curator query).
For a given genetic disease, find all other diseases caused by the same gene (Joyce Mitchell, National Library of Medicine).
For a set of genes and proteins, find all matches (Marianne Barrier).
For some gene, find all homologues in other species (me).
Notes: Phenotype can be normal (as depicted) or abnormal (gross).  Gene is misrepresented as a more concrete chromosome.  Vocabulary picture stolen from Unified Medical Language System.
Here is an example of a design experiment.
Design schemas for managing the inventory of a store.
tower of babel
This picture probably could use a bit of enhancement, once I can get an editable copy of the file
DPJoin picture
Cite Wilschut 91 PDIS
Mention that we use multithreading to achieve the same effect
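A possible sketch to accompany the DPJoin picture, assuming DPJoin here means a double pipelined (symmetric) hash join in the style of the pipelining hash join cited above. The function name, the key-function parameters, and the round-robin scheduling are illustrative assumptions; the notes say the real system uses multithreading instead of alternating between inputs.

```python
from collections import defaultdict


def symmetric_hash_join(left, right, key_left, key_right):
    """Yield (l, r) pairs as soon as both matching tuples have arrived.

    Each input gets its own hash table. Every arriving tuple is inserted
    into its own table and immediately probed against the other table, so
    neither input has to be read completely before results appear.
    Round-robin reading stands in for the multithreading mentioned in the
    notes (an assumption made to keep the sketch single-threaded).
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    left_it, right_it = iter(left), iter(right)
    left_done = right_done = False

    while not (left_done and right_done):
        if not left_done:
            try:
                l = next(left_it)
            except StopIteration:
                left_done = True
            else:
                k = key_left(l)
                left_table[k].append(l)
                # Probe the tuples already seen on the right side.
                for r in right_table.get(k, ()):
                    yield (l, r)
        if not right_done:
            try:
                r = next(right_it)
            except StopIteration:
                right_done = True
            else:
                k = key_right(r)
                right_table[k].append(r)
                # Probe the tuples already seen on the left side.
                for l in left_table.get(k, ()):
                    yield (l, r)


if __name__ == "__main__":
    # Hypothetical toy inputs, joined on the first field.
    genes = [(1, "a"), (2, "b"), (3, "c")]
    proteins = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    for pair in symmetric_hash_join(genes, proteins,
                                    lambda t: t[0], lambda t: t[0]):
        print(pair)
```

Because a tuple is inserted before the other side probes for it, each matching pair is produced exactly once, by whichever of the two tuples arrives second.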
Summary of most current techniques to do schema matching
Combine multiple sources of evidence.
Each one is noisy.
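A minimal sketch of combining noisy evidence, assuming just two hypothetical matchers (column-name similarity and overlap of data values) and a simple weighted average. The function names, the weights, and the toy inventory columns are illustrative and do not come from the slides.

```python
from difflib import SequenceMatcher


def name_score(a, b):
    """Evidence 1: string similarity of the two column names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_score(vals_a, vals_b):
    """Evidence 2: Jaccard overlap of the two columns' data values."""
    sa, sb = set(vals_a), set(vals_b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def combined_score(col_a, col_b, weights=(0.5, 0.5)):
    """Weighted average of the individual (noisy) matcher scores."""
    (name_a, vals_a), (name_b, vals_b) = col_a, col_b
    w_name, w_val = weights
    return (w_name * name_score(name_a, name_b)
            + w_val * value_score(vals_a, vals_b))


def best_match(col, candidates):
    """Return the candidate column with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(col, c))


if __name__ == "__main__":
    # Hypothetical columns from two store-inventory schemas.
    source_col = ("cost", ["9.99", "5.00"])
    target_cols = [
        ("item_name", ["hammer", "nail"]),
        ("price", ["9.99", "5.00", "1.25"]),
    ]
    print(best_match(source_col, target_cols)[0])
```

Neither matcher is reliable alone: the names "cost" and "price" barely resemble each other, but the shared values pull the combined score toward the right column.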
Points:
1) Introduce our approach.
2) We do not manually map the schemas of all sources to the mediated schema. The goal is to manually mark up only a few sources, and to learn from those marked-up sources to propose mappings for subsequent sources.
3) Once the markup is done, there are many different types of information to learn from.
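One way to picture learning from marked-up sources is the sketch below. It assumes a deliberately naive learner that builds a token profile per mediated-schema element from the names and data values of manually mapped columns, then scores new columns against those profiles. The class name `MappingLearner`, its methods, and the inventory data are hypothetical; the approach in the talk learns from more types of information than this.

```python
import re
from collections import Counter, defaultdict


def tokens(name, values):
    """Lower-cased alphabetic tokens from a column's name and its values."""
    text = " ".join([name, *map(str, values)]).lower()
    return re.findall(r"[a-z]+", text)


class MappingLearner:
    """Learns token profiles from a few manually marked-up columns."""

    def __init__(self):
        # mediated-schema element -> Counter of tokens seen for it
        self.profiles = defaultdict(Counter)

    def learn(self, name, values, mediated_element):
        """Record one manual mapping: source column -> mediated element."""
        self.profiles[mediated_element].update(tokens(name, values))

    def propose(self, name, values):
        """Propose the mediated element whose profile best covers the column."""
        if not self.profiles:
            return None  # nothing has been marked up yet
        toks = tokens(name, values)

        def score(element):
            profile = self.profiles[element]
            total = sum(profile.values())
            return sum(profile[t] for t in toks) / total if total else 0.0

        return max(self.profiles, key=score)


if __name__ == "__main__":
    learner = MappingLearner()
    # Manual markup of one hypothetical inventory source.
    learner.learn("item", ["hammer", "wrench"], "product_name")
    learner.learn("cost", ["usd 9", "usd 12"], "price")
    # Proposed mapping for a column from a new, unseen source.
    print(learner.propose("product", ["hammer", "saw"]))
```

The proposal is only a suggestion for a person to confirm; each confirmed mapping can be fed back through `learn` so later sources benefit from it.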