|
1
|
- Philip A. Bernstein
- Sergey Melnik
- {philbe, melnik}@microsoft.com
- Microsoft Research
- Modified version of the seminar presented at ICDE, Boston, April 1, 2004, for presentation at CSEP 544
- © 2004 Microsoft Corporation. All rights reserved.
|
|
2
|
- Meta data = structural information
- DB schema, interface definition, web site map, form definitions, …
|
|
3
|
- Data translation
- Schema evolution
- XML message translation
- Application integration
- Data warehouse loading
- ER/UML design tools
- Wrapper generation for SQL
- UI / 4GL generation
- Dependency tracking
- Lineage tracing
- Info resource mgmt
- Binding, renaming
- Software build (make)
- Configuration mgmt
|
|
4
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Wrap up
|
|
5
|
- Many DB problems are easier to solve by manipulating meta data
- Instead of writing code
- Instead of manipulating data directly
- Meta-data-based solutions all involve models (schemas) and mappings
- Mappings: data transformations, queries, dependencies, …
- Model, manipulate, and generate them
- Usually, generate code from them
|
|
6
|
|
|
7
|
|
|
8
|
- Translate data from one data model to another
- Either write a program or generate it
|
|
9
|
|
|
10
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Wrap up
|
|
11
|
|
|
12
|
|
|
13
|
|
|
14
|
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
- Data translation
- OO or XML wrapper generation for SQL DB
- User-Interface / 4GL-program generation
- Design tool support (DB, UML, …)
- Model generation, reverse engineering
- Round-trip engineering
- Schema evolution (applies to all scenarios)
- XML message translation for e-commerce
- Integrate custom apps with commercial apps
|
|
23
|
- Data warehouse loading (clean & transform)
- Lineage tracing (provenance)
- Information resource management
- Dependency tracking
- Impact analysis
- Navigation between tools
- Binding, renaming
- Software build (make)
- Version and configuration management
- Release management
- Product data management
|
|
24
|
- They strongly resemble one another
- We characterize that resemblance
- Prototypical problems, or design patterns
- Solution specifications, or solution templates
- Primitive solution steps, or operators
- Goals
- A methodology to solve meta data problems
- Ultimately, operator implementations to turn solution templates into
solution programs
|
|
25
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Wrap up
|
|
26
|
- Design pattern – a problem description consisting of
- Input models and mappings
- Output models and mappings
- Criteria for the output to be correct
- An application specializes it to meta models and mapping languages
- Solution template – a sequence of operators producing the desired output
- Operators – a single step that computes a model and/or mappings
|
|
27
|
- map = Match(M1, M2)
- Return a mapping between the two models
- ⟨M2, map12⟩ = ModelGen(M1, metamodel2)
- Return a model M2 that is expressed in metamodel2 and is equivalent to model M1
- ⟨M3, map13, map23⟩ = Merge(M1, M2, map)
- Return the union of models M1 and M2
- map3 = Compose(map1, map2) = map1 ○ map2
- Return the composition of map1 and map2, which is a mapping from map1’s domain to map2’s range.
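A minimal sketch of Compose, assuming a mapping can be represented as a plain set of (source element, target element) pairs. That representation, the function name `compose`, and the element names are illustrative assumptions; the slides do not fix a mapping language.

```python
def compose(map1, map2):
    """Relational composition: (a, c) whenever (a, b) is in map1 and (b, c) is in map2."""
    targets_by_source = {}
    for b, c in map2:
        targets_by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in map1 for c in targets_by_source.get(b, ())}


# Hypothetical element names, for illustration only.
map_a_b = {("Emp.name", "Person.fullName"), ("Emp.dept", "Person.unit")}
map_b_c = {("Person.fullName", "Contact.displayName")}
map_a_c = compose(map_a_b, map_b_c)
print(map_a_c)
```

In this toy case `Emp.dept` has no counterpart in the result because `Person.unit` is not mapped any further, so the composed mapping covers only part of the first model.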
|
|
28
|
- map3 = Confluence(map1, map2) = map1 ⊕ map2
- Return the “merge” of mappings map1 and map2
- ⟨M2, map12⟩ = Extract(M1, map)
- Return the sub-model of M1 that participates in the mapping map
- ⟨M2, map12⟩ = Diff(M1, map)
- Return the sub-model of M1 that does not participate in the mapping map
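Extract and Diff can be sketched under strong simplifying assumptions: a model is a flat set of element names, a mapping is a set of (M1 element, other element) pairs, and the returned map12 is the identity on the sub-model. Real operators would also have to preserve structural context (for example, the parents of retained elements), which this sketch ignores. All names are hypothetical.

```python
def extract(model, mapping):
    """Sub-model of `model` whose elements participate in `mapping`."""
    mapped = {src for src, _ in mapping}
    sub_model = set(model) & mapped
    return sub_model, {(e, e) for e in sub_model}


def diff(model, mapping):
    """Sub-model of `model` whose elements do not participate in `mapping`."""
    mapped = {src for src, _ in mapping}
    sub_model = set(model) - mapped
    return sub_model, {(e, e) for e in sub_model}


m1 = {"Customer", "Customer.id", "Customer.fax"}
mapping = {("Customer", "Client"), ("Customer.id", "Client.key")}
extracted, _ = extract(m1, mapping)
remainder, _ = diff(m1, mapping)
print(extracted, remainder)
```

Under these assumptions the two results partition M1: every element lands in exactly one of them.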
|
|
29
|
- Meta Modeling
- Model Mapping
- Model Generation
- Model Integration
- Mapping Composition
- Mapping Alignment
- Change Propagation
- Model Reintegration
|
|
30
|
- Design pattern – develop a representation (i.e., metamodel) for models and mappings
- Applications – they all depend on this
- Solution template
- Design a metamodel
- Write Import & Export functions
- ImportSQL, ImportXSD, ImportERD, …
- Today, it is manual engineering design
- Design once and reuse often
|
|
31
|
- The Import function for models
- Parse text
- Copy elements of the parsed form into a model that conforms to its
metamodel
- The Import function for mappings
- Same as models but may require more semantic analysis
- E.g., program dependencies, data lineage
- For some languages and mapping metamodels, Export is hard (e.g., XSLT)
|
|
32
|
- Design pattern – Design a mapping between two models and generate code
from it
|
|
33
|
|
|
34
|
|
|
35
|
- Design pattern – Given a model, generate an equivalent model in another
metamodel
|
|
36
|
|
|
37
|
- Design pattern – Compose two given mappings
|
|
38
|
- Design pattern – Align two mappings between the same pair of models
|
|
39
|
|
|
40
|
|
|
41
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Change propagation
- Model reintegration
- Change propagation revisited
- Research background
- Wrap up
|
|
42
|
|
|
43
|
|
|
44
|
|
|
45
|
|
|
46
|
|
|
47
|
|
|
48
|
|
|
49
|
|
|
50
|
- Design pattern
- Reconcile independent changes
- All changes of each model
- No “duplicate additions”
- Simplified example
- “Additions” = add model element (also: drop constraints, reorganize model)
- “Deletions” = delete model element (also: add constraints, reorganize model)
- Mappings shown as lines between elements
|
|
51
|
|
|
52
|
- Direct model integration “loses” either deletions or additions
- Need m, m_mA, m_mB
|
|
53
|
- Propagate deletions before integrating mA and mB
|
|
54
|
- Composition produces a partial mapping
|
|
55
|
- Identify additions (Diff)
|
|
56
|
- Identify additions (Diff)
- Match mAx and mBx
|
|
57
|
- Identify additions (Diff)
- Match mAx and mBx
|
|
58
|
- Identify additions (Diff)
- Match mAx and mBx
|
|
59
|
|
|
60
|
- Merge mA′ and mB′ using mA′_mB′
|
|
61
|
- Merge mA′ and mB′ using mA′_mB′
|
|
62
|
|
|
63
|
|
|
64
|
|
|
65
|
|
|
66
|
|
|
67
|
|
|
68
|
|
|
69
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Change propagation
- Model reintegration
- Change propagation revisited
- Research background
- Wrap up
|
|
70
|
- Solution template
- Propagate deletions
- Include additions
- Merge result
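The three steps of this template can be sketched on a deliberately simplified setting: models are flat sets of element names, and the mappings m_mA and m_mB are implicit (an element of m corresponds to the identically named element of mA or mB). The function `reintegrate` and the `same_addition` predicate, which stands in for Match on the added elements, are hypothetical names introduced for illustration.

```python
def reintegrate(m, m_a, m_b, same_addition=lambda x, y: x == y):
    """Reconcile two independently modified copies (m_a, m_b) of model m."""
    # Step 1: propagate deletions. Anything deleted in either copy is removed from both.
    deleted = (m - m_a) | (m - m_b)
    a_prime = m_a - deleted
    b_prime = m_b - deleted
    # Step 2: identify additions (Diff against the original) and match them
    # so that the same addition made on both sides is not duplicated.
    added_a = a_prime - m
    added_b = b_prime - m
    duplicates = {y for y in added_b if any(same_addition(x, y) for x in added_a)}
    # Step 3: merge the two reconciled copies.
    return a_prime | (b_prime - duplicates)


m = {"id", "name", "fax"}
m_a = {"id", "name", "email"}            # deleted fax, added email
m_b = {"id", "fax", "Email", "phone"}    # deleted name, added Email and phone
merged = reintegrate(m, m_a, m_b, same_addition=lambda x, y: x.lower() == y.lower())
print(sorted(merged))
```

Here both deletions survive, and `email` / `Email` count as one addition because the matching predicate ignores case.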
|
|
71
|
|
|
72
|
|
|
73
|
- Introduction
- Meta data problems
- Design patterns
- Solution templates
- Research background
- Wrap up
|
|
74
|
- Books for IT professionals
- A. Tannenbaum: Metadata Solutions, Addison-Wesley, 2001
- D. Marco: Building and Managing the Meta Data Repository, Wiley, 2000
- Standards
- UML, MOF, CWM (OMG)
- XML, RDF, XML Schema, OWL (W3C)
- Products and tools
- Modeling: IBM Rational Rose, Visio, CA AllFusion, Borland Together
- General meta data managers: CA Advantage, Microsoft Meta Data Services,
MetaIntegration
- Meta data services in data warehousing ETL tools: Informatica,
Ascential, ETI, Data Advantage, …
|
|
75
|
- Model Management
- A computational meta data framework based on models, mappings, and the
operators described here (Match, Merge, Compose, …)
- Meta Data is a very active research area
- Papers coming from many DB research groups
- Some are problem-focused (e.g. data integration)
- Some are operator-focused (e.g. Match, Merge)
|
|
76
|
|
|
77
|
- Many DB problems are easier to solve by manipulating meta data
- Meta data problems and solutions strongly resemble one another
- Methodology: Use design patterns, solution templates, and operators to
simplify development of meta data applications
- There is much research to be done
|
|
78
|
- http://research.microsoft.com/db/ModelMgt
- Overview
- Bernstein, CIDR 2003
- Bernstein, Halevy, Pottinger, SIGMOD Record, Dec. 2000
- Implementation
- Melnik, Rahm, & Bernstein, SIGMOD 2003
and J. Web Semantics 1, 2003
- Data Warehouse Examples
- Bernstein & Rahm, ER 2000
- Match Operation
- Survey: Rahm & Bernstein, VLDB J., Dec. 2001
- Merge Operation
- Pottinger & Bernstein, VLDB 2003
|
|
79
|
|
|
80
|
- Schema matching (mapping discovery)
- Given two schemas, return correspondences that specify pairs of related
elements
- Semantic Mapping (query discovery)
- Given correspondences between two schemas, return an expression that
translates instances of one schema into instances of the other.
|
|
81
|
- Input
- Schemas S1 and S2
- Possibly data instances for S1 and S2
- Background knowledge – thesauri, validated matches, standard schemas,
constraints (keys, data types), ontologies, NL glossaries, etc.
- Output
- Correspondences between elements of
S1 and S2
|
|
82
|
- Many good ideas
- Rahm & Bernstein, VLDB J., Dec. 2001
|
|
83
|
- Computes linguistic similarity of element pairs
- Computes structural similarity of element pairs
- Generates a mapping
|
|
84
|
- Match two human anatomy ontologies
- FMA – Univ. of Washington
- Galen CRM – Univ. of Manchester (UK)
- By Peter Mork (Univ. of Washington)
- Both models are big
- Ultimate goal was finding differences
- Like most match algorithms, ours calculates a similarity score for the
m×n pairs of elements
|
|
85
|
- FMA:
- CRM:
- Heart sensibly hasStructuralComponent ValveInHeart
|
|
86
|
- Lexical Match
- Normalize string, UMLS dictionary lookup, convert to concept-ID from
thesaurus
|
|
87
|
- Lexical Match
- Normalize string, UMLS dictionary lookup, convert to concept-ID from
thesaurus
- String comparison → 306 matches
- Adding spaces, ignoring case → 1834 matches
- Lexical tools → 3503 matches
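The escalating levels of lexical matching listed above can be sketched as follows. The tiny `TOY_THESAURUS` dictionary is a stand-in invented for this example; it is not UMLS, and the concept IDs are made up. The normalization rule (split CamelCase, lower-case, collapse whitespace) is likewise an assumption about what "adding spaces, ignoring case" means.

```python
import re

# Hypothetical thesaurus: normalized term -> concept ID (IDs are invented).
TOY_THESAURUS = {
    "heart": "C001",
    "valve in heart": "C002",
    "heart valve": "C002",
}


def normalize(name):
    """Insert spaces at CamelCase boundaries, lower-case, collapse whitespace."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(spaced.replace("_", " ").lower().split())


def lexical_match(a, b):
    """Return the weakest technique that matches a and b, or None."""
    if a == b:
        return "exact"
    if normalize(a) == normalize(b):
        return "normalized"
    concept_a = TOY_THESAURUS.get(normalize(a))
    concept_b = TOY_THESAURUS.get(normalize(b))
    if concept_a is not None and concept_a == concept_b:
        return "thesaurus"
    return None


print(lexical_match("ValveInHeart", "valve in heart"))
print(lexical_match("Heart valve", "ValveInHeart"))
```

Each level accepts everything the previous one accepts, which is consistent with the growing match counts reported on this slide.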
|
|
88
|
|
|
89
|
- Lexical Match
- Normalize string, UMLS dictionary lookup, convert to concept-ID from
thesaurus
- Structure Match
- Similarity(reified nodes) = Average(neighbors)
- Back-propagate to neighbors
|
|
90
|
|
|
91
|
- Lexical Match
- Normalize string, UMLS dictionary lookup, convert to concept-ID from
thesaurus
- Structure Match
- Similarity(reified nodes) = Average(neighbors)
- Back-propagate to neighbors
- Adds 64 matches (to previous 3503)
- Implies 875 reified relationship matches
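One possible reading of the structure-match step, sketched under assumptions: a reified relationship is a tuple of its neighbor nodes, its similarity to a relationship in the other model is the average similarity of the corresponding neighbor pairs, and back-propagation moves each weaker neighbor pair part of the way toward that average. The update rule, the `rate` parameter, and the element names are hypothetical; the slides do not give the exact formula.

```python
def relationship_similarity(rel_a, rel_b, sim):
    """Average similarity of the neighbor pairs of two reified relationships."""
    pairs = list(zip(rel_a, rel_b))
    return sum(sim.get(p, 0.0) for p in pairs) / len(pairs)


def back_propagate(rel_a, rel_b, sim, rate=0.5):
    """Raise weaker neighbor pairs toward the relationship's similarity score."""
    score = relationship_similarity(rel_a, rel_b, sim)
    updated = dict(sim)
    for pair in zip(rel_a, rel_b):
        old = sim.get(pair, 0.0)
        if score > old:
            updated[pair] = old + rate * (score - old)
    return score, updated


# Hypothetical similarities from an earlier lexical pass.
sim = {("Heart", "Heart"): 1.0, ("MitralValve", "ValveInHeart"): 0.25}
rel_fma = ("Heart", "MitralValve")    # (whole, part) in one model
rel_crm = ("Heart", "ValveInHeart")   # (whole, part) in the other
score, new_sim = back_propagate(rel_fma, rel_crm, sim)
print(score, new_sim[("MitralValve", "ValveInHeart")])
```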
|
|
92
|
- Lexical Match
- Normalize string, UMLS dictionary lookup, convert to concept-ID from
thesaurus
- Structure Match
- Similarity(reified nodes) = Average(neighbors)
- Back-propagate to neighbors
- Align Super-classes
- Super-class similarity = average similarity of children, grandchildren,
great-grandchildren
- Adds 213 matches (to 3567)
|
|
93
|
- A common encoding of models is hard and involves compromises
- Different styles of reifying relationships
- CRM stores transitive relationships
- Match needs to invent generalizations
- In FMA, arterial supply, venous drainage, nerve supply, lymphatic
drainage
- In CRM, these all map to isServedBy
- On big models, Match is expensive
- Some steps required days to execute
- Cross-product filled 80 GB (< 1 GB input)
|
|
94
|
- Introduction to Model Management
- Using MM to solve meta data problems
- Matching anatomy ontologies
- Model merging
- Wrap-up
|
|
95
|
- Return the union of models M1 and M2
- Use map to guide the Merge
- If elements x = y in map, then collapse them into one element
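A minimal sketch of this behavior, assuming models are flat sets of qualified element names and the input map is a one-to-one set of (M1 element, M2 element) equalities. It returns the merged model together with the mappings map13 and map23 back to the inputs. Conflict handling and structure are out of scope, and all names are hypothetical.

```python
def merge(m1, m2, mapping):
    """Union of m1 and m2, collapsing each pair (x, y) in mapping into x."""
    representative = {y: x for x, y in mapping}  # M2 element -> its M1 partner
    map13 = {(e, e) for e in m1}
    map23 = {(e, representative.get(e, e)) for e in m2}
    m3 = set(m1) | {target for _, target in map23}
    return m3, map13, map23


m1 = {"Emp", "Emp.name"}
m2 = {"Person", "Person.fullName", "Person.phone"}
equalities = {("Emp", "Person"), ("Emp.name", "Person.fullName")}
m3, map13, map23 = merge(m1, m2, equalities)
print(sorted(m3))
```

Keeping the M1 name for collapsed elements is an arbitrary choice made for the sketch; a real Merge would need a policy for which representation wins.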
|
|
96
|
- [Buneman, Davidson, Kosky, EDBT 92]
- Meta-model has aggregation & generalization only
- Union, and collapse objects having the same name
- Fix-up step for inconsistencies created by merging
|
|
97
|
|
|
98
|
- Generic correctness criteria for Merge
- Use of first-class input mapping (not just correspondences)
- Taxonomy of conflicts & resolution strategies
- Characterize when Merge can be automatic
- A merge algorithm for an EER representation
- Experimental evaluation
|
|
99
|
|