Notes
Slide Show
Outline
1
Lecture 3:
Conceptual Database Design and Schema Design
  • April 12th, 2004
2
Agenda
  • Project: questions?
  • How to model data (E/R modeling)
  • How to design a good schema (normalization).
3
Building an Application with a DBMS
  • Requirements modeling (conceptual, pictures)
    • Decide what entities should be part of the application and how they should be linked.
  • Schema design and implementation
    • Decide on a set of tables, attributes.
    • Define the tables in the database system.
    • Populate database (insert tuples).
  • Write application programs using the DBMS
    • way easier now that the data management is taken care of.
4
Database Design
  • Why do we need it?
    •  Agree on structure of the database before deciding on a particular implementation.
  • Consider issues such as:
    • What entities to model
    • How entities are related
    • What constraints exist in the domain
    • How to achieve good designs
5
Database Design Formalisms
  • 1. Object Definition Language (ODL):
    •  Closer in spirit to object-oriented models
    •  I don’t teach it anymore.
  • 2. Entity/Relationship model (E/R):
    • More relational in nature.


  • Both can be translated (semi-automatically) to relational schemas
  • ODL to OO-schema: direct transformation (C++ or Smalltalk based system).
6
2. Entity / Relationship Diagrams
7
Keys in E/R Diagrams
  • Every entity set must have a key
8
 
9
What is a Relation ?
  • A mathematical definition:
    • if A, B are sets, then a relation R is a subset of A x B
  • A={1,2,3},   B={a,b,c,d},
    •    R = {(1,a), (1,c), (3,b)}


  • - makes is a subset of Product x Company:
10
Multiplicity of E/R Relations
  • one-one:


  • many-one


  • many-many
11
 
12
Multi-way Relationships
13
Arrows in Multiway Relationships
  • Q: what does the arrow mean ?







  • A: if I know the store, person, invoice, I know the movie too
14
Arrows in Multiway Relationships
  • Q: what do these arrow mean ?







  • A: store, person, invoice determines movie  and store, invoice, movie determines person
15
Arrows in Multiway Relationships
  • Q: how do I say: “invoice determines store” ?
  • A: no good way; best approximation:





  • Q: Why is this incomplete ?
16
Roles in Relationships
17
Attributes on Relationships
18
Converting Multi-way Relationships to Binary
19
From E/R Diagrams
to Relational Schema
  • Entity set à relation
  • Relationship à relation
20
Entity Set to Relation
21
Relationships to Relations
22
Relationships to Relations
23
Multi-way Relationships to Relations
24
3. Design Principles
25
Design Principles:
What’s Wrong?
26
Design Principles:
What’s Wrong?
27
Modeling Subclasses
28
 
29
Understanding Subclasses
  • Think in terms of records:
    • Product


    • SoftwareProduct


    • EducationalProduct
30
 
Subclasses to Relations
31
Modeling UnionTypes With Subclasses
  • Company
32
Modeling Union Types with Subclasses
  • Say: each piece of furniture is owned either by a person, or by a company
  • Solution 1. Acceptable, imperfect (What’s wrong ?)
  • Company
33
Modeling Union Types with Subclasses
  • Solution 2: better, more laborious
  • Company
34
Constraints in E/R Diagrams
35
 
Keys in E/R Diagrams
36
Single Value Constraints
37
Referential Integrity Constraints
38
Other Constraints
39
Weak Entity Sets
40
Handling Weak Entity Sets
41
The Relational Data Model
42
Relational Schema Design
43
Functional Dependencies
  • Definition:    A1, ..., Am à B1, ..., Bn holds in R if:


  • "t, t’ Î R, (t.A1=t’.A1 Ù ... Ù t.Am=t’.Am Þ t.B1=t’.B1 Ù ... Ù t.Bm=t’.Bm )


44
Important Point!
  • Functional dependencies are part of the schema!
  • They constrain the possible legal data instances.
  • At any point in time, the actual database may satisfy additional FD’s.
45
Examples




  • EmpID         Name, Phone, Position
  • Position        Phone
  • but  Phone         Position
46
Formal definition of a key
  • A key is a set of attributes A1, ..., An s.t. for any other attribute B, A1, ..., An à B


  • A minimal key is a set of attributes which is a key and for which no subset is a key


  • Note: book calls them superkey and key
47
Examples of Keys
  • Product(name, price, category, color)
    • name, category à price
    • category à color


    • Keys are:    {name, category} and all supersets


  • Enrollment(student, address, course, room, time)
    • student à address
    • room, time à course
    • student, course à room, time


    • Keys are:  [in class]
48
Finding the Keys of a Relation
49
Finding the Keys
50
Finding the Keys
51
Expressing Dependencies
52
Finding the Keys
  • More rules in the book – please read !
53
Relational Schema Design
(or Logical Design)
  • Main idea:
  • Start with some relational schema
  • Find out its FD’s
    • Important also to look at inferred FD’s.
  • Use them to design a better relational schema
54
Inference Rules for FD’s
55
Inference Rules for FD’s
(continued)
56
Inference Rules for FD’s
(continued)
57
 
58
"Enrollment(student,"
  • Enrollment(student, major, course, room, time)
    • student à major
    • major, course à room
    • course à time



    • What else can we infer ? [in class]
59
Closure of a set of Attributes
60
Closure Algorithm
61
Example
62
Why Is the Algorithm Correct ?
  • Show the following by induction:
    • For every B in X:
      • A1, …, An       B
  • Initially X = {A1, …, An}  -- holds
  • Induction step: B1, …, Bm in X
    • Implies  A1, …, An      B1, …, Bm
    • We also have B1, …, Bm      C
    • By transitivity we have A1, …, An      C
  • This shows that the algorithm is sound; need to show it is complete
63
Relational Schema Design
(or Logical Design)
  • Main idea:
  • Start with some relational schema
  • Find out its FD’s
  • Use them to design a better relational schema
64
Relational Schema Design
65
Relation Decomposition
66
Relational Schema Design
67
Decompositions in General
68
Incorrect Decomposition
  • Sometimes it is incorrect:


69
Incorrect Decomposition
70
Normal Forms
  • First Normal Form = all attributes are atomic


  • Second Normal Form (2NF) = old and obsolete


  • Third Normal Form (3NF) = this lecture


  • Boyce Codd Normal Form (BCNF) = this lecture


  • Others...
71
Boyce-Codd Normal Form
72
Example
73
Decompose it into BCNF
74
Summary of BCNF Decomposition
75
Example Decomposition
76
Other Example
  • R(A,B,C,D)    A      B,    B      C


  • Key:   A, D
  • Violations of BCNF:  A   B,  A   C,  A   BC
  • Pick  A    BC:  split into  R1(A,BC)   R2(A,D)
  • What happens if we pick A   B  first ?
77
Correct Decompositions
  •  A decomposition is lossless if we can recover:
  •                  R(A,B,C)



  •              R1(A,B)      R2(A,C)



  •                    R’(A,B,C)   should be the same as
                                        R(A,B,C)
78
Correct Decompositions
  • Given R(A,B,C) s.t. AàB, the decomposition into R1(A,B), R2(A,C) is lossless


79
3NF: A Problem with BCNF
80
So What’s the Problem?
81
Solution: 3rd Normal Form (3NF)