Typical Matching Heuristics
•We build a model for every element from multiple sources of evidence in the schemas
–Schema element names
•BooksAndCDs/Categories ~ BookCategories/Category
–Descriptions and documentation
•ItemID: unique identifier for a book or a CD
•ISBN: unique identifier for any book
–Data types, data instances
•DateTime ≠ Integer
•addresses have similar formats
–Schema structure
•All books have similar attributes
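The element-name heuristic above can be sketched in a few lines. This is only an illustrative sketch, not the method behind the slides: the helper names (`stem`, `name_tokens`, `name_similarity`), the stop-word list, the crude plural stripping, and the choice of Jaccard overlap are all assumptions made for the example.

```python
import re

STOP_WORDS = {"and", "or", "of", "the"}  # assumed list, not from the slides


def stem(word):
    """Very crude singularisation: 'categories' -> 'category', 'books' -> 'book'."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def name_tokens(path):
    """Split a schema path into a set of normalised word tokens."""
    words = []
    for part in path.split("/"):
        # Break CamelCase, keeping acronyms together:
        # 'BooksAndCDs' -> 'Books', 'And', 'CDs'
        words += re.findall(r"[A-Z]+[a-z]*|[a-z]+|\d+", part)
    return {stem(w) for w in words} - STOP_WORDS


def name_similarity(a, b):
    """Jaccard overlap of the two token sets, in [0, 1]."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(name_similarity("BooksAndCDs/Categories", "BookCategories/Category"))
    print(name_similarity("ItemID", "ISBN"))
```

Under these assumptions the pair BooksAndCDs/Categories and BookCategories/Category shares two of three distinct tokens, while ItemID and ISBN share none, even though their descriptions suggest they may be related. Names alone miss that.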
Models consider only the two schemas.
In isolation, each technique is incomplete or brittle:
we need a principled way to combine them.
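One simple way to picture a combination is a weighted average of the per-heuristic similarity scores. The sketch below is an assumption made for illustration, not the combination scheme the slides have in mind: `combine_scores`, the weights, and the example scores are all hypothetical, and in practice the weights would more plausibly be learned than set by hand.

```python
def combine_scores(scores, weights):
    """Weighted average of per-heuristic similarity scores, each in [0, 1].

    Heuristics missing from `scores` are skipped, so the result is
    normalised by the weights of the heuristics that actually produced a score.
    """
    used = [h for h in scores if h in weights]
    total = sum(weights[h] for h in used)
    if total == 0:
        return 0.0
    return sum(weights[h] * scores[h] for h in used) / total


# Hypothetical weights and scores, made up for illustration only.
WEIGHTS = {"name": 0.4, "description": 0.3, "data_type": 0.3}

# ItemID vs ISBN: the names share nothing, but the descriptions and
# data types (both assumed here to look alike) pull the score up.
ITEMID_VS_ISBN = {"name": 0.0, "description": 0.8, "data_type": 1.0}

if __name__ == "__main__":
    print(combine_scores(ITEMID_VS_ISBN, WEIGHTS))
```

With these made-up numbers the name heuristic alone would reject the ItemID/ISBN pair, while the combined score keeps it as a candidate. A single noisy source no longer decides the outcome on its own.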
Summary of most current schema matching techniques:
Combine multiple sources of evidence.
Each source is noisy.