1
Meta Data Management
  • Philip A. Bernstein
  • Sergey Melnik
  • {philbe, melnik}@microsoft.com


  • Microsoft Research


  • Modified version of the seminar presented at ICDE, Boston, April 1, 2004 – for presentation at CSEP 544
  • © 2004 Microsoft Corporation. All rights reserved.
2
Meta Data Management
  • Meta data = structural information
    • DB schema, interface defn, web site map, form defns, …
3
Meta Data Problems
  • Data translation
  • Schema evolution
  • XML message translation
  • Application integration
  • Data warehouse loading
  • ER/UML design tools
  • Wrapper generation for SQL
  • UI / 4GL generation
  • Dependency tracking
  • Lineage tracing
  • Info resource mgmt
  • Binding, renaming
  • Software build (make)
  • Configuration mgmt


4
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
  • Wrap up
5
Why Meta Data is Important
  • Many DB problems are easier to solve by manipulating meta data
    • Instead of writing code
    • Instead of manipulating data directly
  • Meta-data-based solutions all involve models (schemas) and mappings
    • Mappings - data transformations, queries, dependencies, …
    • Model, manipulate, and generate them
    • Usually, generate code from them
6
Example: Object-Oriented Wrapper for SQL Tables
7
OO wrapper for SQL (cont’d)
8
Example – Data Translation
  • Translate data from one data model to another
  • Either write a program or generate it
9
Meta-data-Speak
10
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
  • Wrap up
11
Meta Data Solution Template
12
Meta Data Solution Template
13
Meta Data Solution Template
14
Meta Data Solution Template
15
Meta Data Solution Template
16
Meta Data Solution Template
17
Meta Data Solution Template
18
Example – Data Translation
19
Example – Data Translation
20
Example – Data Translation
21
 
22
Meta Data Problems
  • Data translation
  • OO or XML wrapper generation for SQL DB
  • User-Interface / 4GL-program generation
  • Design tool support (DB, UML, …)
    • Model generation, reverse engineering
    • Round-trip engineering
  • Schema evolution (applies to all scenarios)
  • XML message translation for e-commerce
  • Integrate custom apps with commercial apps
23
Meta Data Problems (cont’d)
  • Data warehouse loading (clean & transform)
  • Lineage tracing (provenance)
  • Information resource management
  • Dependency tracking
    • Impact analysis
    • Navigation between tools
  • Binding, renaming
  • Software build (make)
  • Version and configuration management
    • Release management
    • Product data management
24
Meta Data Solutions
  • They strongly resemble one another
  • We characterize that resemblance
    • Prototypical problems, or design patterns
    • Solution specifications, or solution templates
    • Primitive solution steps, or operators
  • Goals
    • A methodology to solve meta data problems
    • Ultimately, operator implementations to turn solution templates into solution programs
25
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
  • Wrap up
26
Meta Data Design Patterns
  • Design pattern – a problem description consisting of
    • Input models and mappings
    • Output models and mappings
    • Criteria for the output to be correct
    • An application specializes it to meta models and mapping languages
  • Solution template – a sequence of operators producing the desired output
  • Operator – a single step that computes a model and/or mappings
27
Operators
  • map = Match (M1, M2)
    • Return a mapping between the two models
  • ⟨M2, map12⟩ = ModelGen(M1, metamodel2)
    • Return a model M2 that is expressed in metamodel2 and is equivalent to model M1
  • ⟨M3, map13, map23⟩ = Merge(M1, M2, map)
    • Return the union of models M1 and M2
  • map3 = Compose(map1, map2) = map1 ○ map2
    • Return the composition of map1 and map2, which is a mapping from map1’s domain to map2’s range.
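As a rough illustration of Compose, assume a mapping is nothing more than a set of (source element, target element) correspondences. This is a simplification; the slides leave the mapping representation open. Under that assumption, composition reduces to a relational join. The function and element names below are hypothetical.

```python
from typing import Set, Tuple

Mapping = Set[Tuple[str, str]]  # assumed representation: (source, target) pairs


def compose(map1: Mapping, map2: Mapping) -> Mapping:
    """Relate a to c whenever map1 relates a to b and map2 relates b to c."""
    return {(a, c) for (a, b) in map1 for (b2, c) in map2 if b == b2}


# Hypothetical element names, for illustration only.
map_12 = {("Emp.name", "Staff.fullName"), ("Emp.dept", "Staff.unit")}
map_23 = {("Staff.fullName", "Person.name")}

map_13 = compose(map_12, map_23)
print(map_13)  # Emp.dept drops out because nothing in map_23 continues it
```

Note that the result can be partial: an element mapped by map1 disappears from the composition when map2 has no correspondence for its image.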
28
Operators (cont’d)
  • map3 = Confluence(map1, map2) = map1 ⊕ map2
    • Return the “merge” of mappings map1 and map2
  • ⟨M2, map12⟩ = Extract(M1, map)
    • Return the sub-model of M1 that participates in the mapping map
  • ⟨M2, map12⟩ = Diff(M1, map)
    • Return the sub-model of M1 that does not participate in the mapping map
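Extract and Diff can be sketched as complementary filters, assuming a model is a flat set of element names and a mapping is a set of (source, target) pairs. Both are simplifications introduced here; a real implementation would also have to keep enough structural context (for example, parent elements) for the result to be a well-formed model.

```python
from typing import Set, Tuple

Model = Set[str]                # assumed: a model is a flat set of element names
Mapping = Set[Tuple[str, str]]  # assumed: (source element, target element) pairs


def extract(m1: Model, mapping: Mapping) -> Tuple[Model, Mapping]:
    """Sub-model of m1 whose elements appear on the source side of mapping."""
    sources = {src for src, _ in mapping}
    m2 = m1 & sources
    return m2, {(e, e) for e in m2}  # map12 relates each kept element to itself


def diff(m1: Model, mapping: Mapping) -> Tuple[Model, Mapping]:
    """Sub-model of m1 whose elements do not appear in mapping."""
    kept, _ = extract(m1, mapping)
    m2 = m1 - kept
    return m2, {(e, e) for e in m2}


# Hypothetical model and mapping.
m1 = {"id", "name", "phone"}
mapping = {("id", "key"), ("name", "label")}

mapped_part, _ = extract(m1, mapping)
unmapped_part, _ = diff(m1, mapping)
```

In this simplified setting the two results partition M1: every element lands in exactly one of them.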
29
Design Patterns
  • Meta Modeling
  • Model Mapping
  • Model Generation
  • Model Integration
  • Mapping Composition
  • Mapping Alignment
  • Change Propagation
  • Model Reintegration
30
Meta Modeling
  • Design pattern –  develop a representation (i.e. metamodel) for models and mappings
  • Applications – they all depend on this
  • Solution template
    • Design a metamodel
    • Write Import & Export functions
      • ImportSQL, ImportXSD, ImportERD, …
    • Today, it is manual engineering design
    • Design once and reuse often
31
Meta Modeling (cont’d)
  • The Import function for models
    • Parse text
    • Copy elements of the parsed form into a model that conforms to its metamodel
  • The Import function for mappings
    • Same as models but may require more semantic analysis
    • E.g., program dependencies, data lineage
    • For some languages and mapping metamodels, Export is hard (e.g., XSLT)
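The two Import steps (parse text, then copy the parsed form into a model) might look as follows for a toy case. The `Element` metamodel and the `import_sql` function are hypothetical, and the parser handles only a single CREATE TABLE statement with plain `name TYPE` columns; types containing commas or constraints would need a real SQL parser.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Toy metamodel: a named element with a kind, a data type, and children."""
    name: str
    kind: str                      # e.g. "table" or "column"
    data_type: str = ""
    children: List["Element"] = field(default_factory=list)


def import_sql(ddl: str) -> Element:
    """Parse one simple CREATE TABLE statement into the toy metamodel."""
    match = re.search(r"CREATE\s+TABLE\s+(\w+)\s*\((.*)\)", ddl, re.I | re.S)
    if match is None:
        raise ValueError("unsupported DDL")
    table = Element(name=match.group(1), kind="table")
    for column_def in match.group(2).split(","):
        col_name, col_type = column_def.split()[:2]
        table.children.append(Element(col_name, "column", col_type.upper()))
    return table


model = import_sql("CREATE TABLE Emp (id INT, name VARCHAR)")
for column in model.children:
    print(model.name, column.name, column.data_type)
```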
32
Model Mapping
  • Design pattern – Design a mapping between two models and generate code from it
33
An XML Mapping Tool
34
A Data Warehouse Loading Tool
35
Model Generation
  • Design pattern – Given a model, generate an equivalent model in another metamodel
36
Model Integration
37
Mapping Composition
  • Design pattern – Compose two given mappings
38
Mapping Alignment
  • Design pattern – Align two mappings between the same pair of models
39
Model Reintegration
40
Change Propagation
41
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
    • Change propagation
    • Model reintegration
    • Change propagation revisited
  • Research background
  • Wrap up
42
Change Propagation
43
Change Propagation
44
Change Propagation
45
Change Propagation
46
Change Propagation (cont’d)
47
Change Propagation (cont’d)
48
Change Propagation (cont’d)
49
Complete Script in Rondo
50
Model reintegration
  • Design pattern
    • Reconcile independent changes
    • All changes of each model
    • No “duplicate additions”
  • Simplified example
    • “Additions” = add model element (also: drop constraints, reorganize model)
    • “Deletions” = delete model element (also: add constraints, reorganize model)
    • Mappings shown as lines between elements


51
 
52
  • Direct model integration “loses” either deletions or additions
  • Need m, m_mA, m_mB
53
  • Propagate deletions before integrating mA and mB
54
  • Composition produces a partial mapping
55
  • Identify additions (Diff)
56
  • Identify additions (Diff)
  • Match mAx and mBx
57
  • Identify additions (Diff)
  • Match mAx and mBx
58
  • Identify additions (Diff)
  • Match mAx and mBx
59
 
60
  • Merge mA′ and mB′ using mA′_mB′


61
  • Merge mA′ and mB′ using mA′_mB′
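The reintegration steps on the preceding slides can be approximated with plain set operations. This sketch assumes that elements keep the same name across m, mA, and mB, so the mappings m_mA and m_mB are implicit identity mappings. That assumption is introduced here and is far simpler than the general case; it is not the authors' script.

```python
from typing import Set

Model = Set[str]  # assumption: elements are identified by name across versions


def reintegrate(m: Model, m_a: Model, m_b: Model) -> Model:
    """Reconcile two independently edited copies (m_a, m_b) of the original m."""
    # Propagate deletions: what one side removed from m is removed from the other.
    deleted_a = m - m_a
    deleted_b = m - m_b
    m_a_prime = m_a - deleted_b
    m_b_prime = m_b - deleted_a
    # Additions are m_a - m and m_b - m (the Diff step). Matching them is
    # implicit here: an element added on both sides has the same name, so
    # the union below keeps a single copy and avoids duplicate additions.
    return m_a_prime | m_b_prime


m = {"a", "b", "c"}
m_a = {"a", "b", "d"}        # deleted c, added d
m_b = {"a", "c", "d", "e"}   # deleted b, added d and e

result = reintegrate(m, m_a, m_b)
print(sorted(result))
print(sorted(m_a | m_b))  # direct union keeps b and c, losing both deletions
print(sorted(m_a & m_b))  # direct intersection drops e, losing an addition
```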


62
Solution script
63
 
64
 
65
 
66
 
67
 
68
Solution script
69
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
    • Change propagation
    • Model reintegration
    • Change propagation revisited
  • Research background
  • Wrap up
70
Change propagation
  • Solution template
    • Propagate deletions
    • Include additions
    • Merge result
71
Change propagation
72
First-cut taxonomy of patterns
73
Outline
  • Introduction
  • Meta data problems
  • Design patterns
  • Solution templates
  • Research background
  • Wrap up
74
The Commercial World
  • Books for IT professionals
    • A. Tannenbaum: Metadata Solutions, Addison-Wesley, 2001
    • D. Marco: Building and Managing the Meta Data Repository, Wiley, 2000
  • Standards
    • UML, MOF, CWM (OMG)
    • XML, RDF, XML Schema, OWL (W3C)
  • Products and tools
    • Modeling: IBM Rational Rose, Visio, CA AllFusion, Borland Together
    • General meta data managers: CA Advantage, Microsoft Meta Data Services, MetaIntegration
    • Meta data services in data warehousing ETL tools: Informatica, Ascential, ETI, Data Advantage, …
75
The Research World
  • Model Management
    • A computational meta data framework based on models, mappings, and the operators described here (Match, Merge, Compose, …)
  • Meta Data is a very active research area
    • Papers coming from many DB research groups
    • Some are problem-focused (e.g. data integration)
    • Some are operator-focused (e.g. Match, Merge)
76
MM System Architecture
77
Summary
  • Many DB problems are easier to solve by manipulating meta data
  • Meta data problems and solutions strongly resemble one another
  • Methodology: Use design patterns, solution templates, and operators to simplify development of meta data applications
  • There is much research to be done


78
References
  • http://research.microsoft.com/db/ModelMgt
  • Overview
    • Bernstein, CIDR 2003
    • Bernstein, Halevy, Pottinger, SIGMOD Record, Dec. 2000
  • Implementation
    • Melnik, Rahm, & Bernstein, SIGMOD 2003 and J. Web Semantics 1, 2003
  • Data Warehouse Examples
    • Bernstein & Rahm, ER 2000
  • Match Operation
    • Survey: Rahm & Bernstein, VLDB J., Dec. 2001
  • Merge Operation
    • Pottinger & Bernstein, VLDB 2003
79
 
80
The Match “Operator”
  • Schema matching (mapping discovery)
    • Given two schemas, return correspondences that specify pairs of related elements
  • Semantic Mapping (query discovery)
    • Given correspondences between two schemas, return an expression that translates instances of one schema into instances of the other.
81
Schema Matching Problem
  • Input
    • Schemas S1 and S2
    • Possibly data instances for S1 and S2
    • Background knowledge – thesauri, validated matches, standard schemas, constraints (keys, data types), ontologies, NL glossaries, etc.
  • Output
    • Correspondences between elements of S1 and S2
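To make the input and output concrete, here is a minimal name-based matcher. It uses only element names and the standard-library `difflib` string similarity; it ignores data instances and background knowledge and is not any of the published algorithms. The schema element names and the 0.6 threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Correspondence = Tuple[str, str, float]  # (element of S1, element of S2, score)


def match(s1: List[str], s2: List[str], threshold: float = 0.6) -> List[Correspondence]:
    """Score every pair of element names and keep those at or above threshold."""
    result = []
    for a in s1:
        for b in s2:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                result.append((a, b, round(score, 2)))
    # Best-scoring correspondences first.
    return sorted(result, key=lambda c: -c[2])


# Hypothetical schema element names.
correspondences = match(["CustName", "Phone"], ["CustomerName", "Telephone"])
for a, b, score in correspondences:
    print(a, "<->", b, score)
```

Scoring every pair is quadratic in the schema sizes, which is why matching large models becomes expensive.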



82
Schema Matching Approaches
  • Many good ideas
    • Rahm & Bernstein, VLDB J, Dec ’01
83
The Cupid Algorithm
  • Computes linguistic similarity of element pairs
  • Computes structural similarity of element pairs
  • Generates a mapping
84
Matching Anatomy Ontologies
  • Match two human anatomy ontologies
    • FMA – Univ. of Washington
    • Galen CRM – Univ. of Manchester (UK)
    • By Peter Mork (Univ. of Washington)
    • Both models are big
  • Ultimate goal was finding differences
  • Like most match algorithms, ours calculates a similarity score for the m×n pairs of elements
85
Aligning Representations
  • FMA:
  • CRM:
  • Heart sensibly hasStructuralComponent ValveInHeart
86
Anatomy Matching Algorithm
  • Lexical Match
    • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
87
Anatomy Matching Algorithm
  • Lexical Match
    • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus


  • String comparison → 306 matches
  • Adding spaces, ignoring case → 1834 matches
  • Lexical tools → 3503 matches
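The jump from exact string comparison to normalized comparison can be illustrated with a small sketch. The `normalize` function below only splits camelCase and underscores and lower-cases the result; the UMLS dictionary lookup and concept-ID conversion are not modeled. The term lists are made up and are not taken from FMA or GALEN.

```python
import re
from typing import Callable, Iterable, Set, Tuple


def normalize(term: str) -> str:
    """Split camelCase and underscores into words, then lower-case."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term.replace("_", " "))
    return " ".join(spaced.lower().split())


def lexical_matches(
    terms1: Iterable[str],
    terms2: Iterable[str],
    key: Callable[[str], str] = lambda t: t,
) -> Set[Tuple[str, str]]:
    """Pair up terms whose keys are equal (identity key = exact comparison)."""
    index = {}
    for t in terms2:
        index.setdefault(key(t), []).append(t)
    return {(a, b) for a in terms1 for b in index.get(key(a), [])}


# Made-up term lists for illustration.
terms_a = ["Heart", "MitralValve", "left_atrium"]
terms_b = ["Heart", "mitral valve", "Left Atrium"]

exact = lexical_matches(terms_a, terms_b)
normalized = lexical_matches(terms_a, terms_b, key=normalize)
print(len(exact), len(normalized))
```

Indexing one term list by its normalized key avoids comparing every pair of strings at this stage.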
88
Anatomy Matching Example
89
Anatomy Matching Algorithm
  • Lexical Match
    • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
  • Structure Match
    • Similarity(reified nodes) = Average(neighbors)
    • Back-propagate to neighbors
90
Anatomy Matching Example
91
Anatomy Matching Algorithm
  • Lexical Match
    • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
  • Structure Match
    • Similarity(reified nodes) = Average(neighbors)
    • Back-propagate to neighbors


  • Adds 64 matches (to previous 3503)
    • Implies 875 reified relationship matches
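The first structure-match step, scoring a pair of reified relationship nodes by averaging the similarity of their neighbors, might be sketched as below. It assumes each reified node lists its neighbors by role and that neighbors are paired by role name. That pairing rule, the role names, and the scores are assumptions made for this example. Back-propagation to the neighbors is not shown.

```python
from typing import Dict, Tuple

Sim = Dict[Tuple[str, str], float]  # similarity of (element in model 1, element in model 2)


def reified_similarity(n1: Dict[str, str], n2: Dict[str, str], sim: Sim) -> float:
    """Average the similarity of the neighbors that play the same role."""
    roles = n1.keys() & n2.keys()
    if not roles:
        return 0.0
    return sum(sim.get((n1[r], n2[r]), 0.0) for r in roles) / len(roles)


# Hypothetical lexical scores and reified part-of relationships.
lexical_sim = {("Heart", "Heart"): 1.0, ("MitralValve", "Mitral Valve"): 0.5}
rel_1 = {"whole": "Heart", "part": "MitralValve"}
rel_2 = {"whole": "Heart", "part": "Mitral Valve"}

score = reified_similarity(rel_1, rel_2, lexical_sim)
print(score)
```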
92
Anatomy Matching Algorithm
  • Lexical Match
    • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
  • Structure Match
    • Similarity(reified nodes) = Average(neighbors)
    • Back-propagate to neighbors
  • Align Super-classes
    • Super-class similarity = average similarity of children, grandchildren, great-grandchildren
  • Adds 213 matches (to 3567)
93
Some Lessons
  • A common encoding of models is hard and involves compromises
    • Different styles of reifying relationships
    • CRM stores transitive relationships
  • Match needs to invent generalizations
    • In FMA, arterial supply, venous drainage, nerve supply, lymphatic drainage
    • In CRM, these all map to isServedBy
  • On big models, Match is expensive
    • Some steps required days to execute
    • Cross-product filled 80 GB (< 1GB input).
94
Outline
  • Introduction to Model Management
  • Using MM to solve meta data problems
  • Matching anatomy ontologies
  • Model merging
  • Wrap-up
95
Merge(M1, M2, map)
  • Return the union of models M1 and M2
    • Use map to guide the Merge
    • If elements x = y in map, then collapse them into one element
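A minimal sketch of this behavior, assuming models are flat sets of element names, the mapping is a one-to-one set of equalities (x, y) with x in M1 and y in M2, and unmapped elements of the two models never share a name. Conflict detection and resolution are not modeled. The returned triple mirrors ⟨M3, map13, map23⟩, with each output mapping given as a dictionary from input element to merged element.

```python
from typing import Dict, Set, Tuple

Model = Set[str]


def merge(
    m1: Model, m2: Model, mapping: Set[Tuple[str, str]]
) -> Tuple[Model, Dict[str, str], Dict[str, str]]:
    """Union m1 and m2, collapsing each mapped pair (x, y) into x."""
    rename = {y: x for x, y in mapping}  # element of m2 -> its equal in m1
    map13 = {e: e for e in m1}
    map23 = {e: rename.get(e, e) for e in m2}
    m3 = set(map13.values()) | set(map23.values())
    return m3, map13, map23


# Hypothetical models and mapping.
m1 = {"Emp.id", "Emp.name"}
m2 = {"Staff.key", "Staff.phone"}
m3, map13, map23 = merge(m1, m2, {("Emp.id", "Staff.key")})
print(sorted(m3))
```

Keeping map13 and map23 lets later steps trace every merged element back to its origins.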
96
Merge(M1, M2, map)
  • [Buneman, Davidson, Kosky, EDBT 92]
    • Meta-model has aggregation & generalization only
    • Union, and collapse objects having the same name
    • Fix-up step for inconsistencies created by merging
97
Resolving Merge Conflicts
98
Contributions to Merge
[Pottinger & Bernstein, VLDB 03]
  • Generic correctness criteria for Merge
  • Use of first-class input mapping (not just correspondences)
  • Taxonomy of conflicts & resolution strategies
  • Characterize when Merge can be automatic
  • A merge algorithm for an EER representation
  • Experimental evaluation
99