|
1
|
|
|
2
|
- Nulls and outerjoins
- Creating and updating schemas
- Views: updating and reusing them
- Constraints
- Programming with SQL
- Relational algebra
|
|
3
|
- Explicit joins in SQL:
- Product(name, category)
- Purchase(prodName, store)
- Same as:
- But Products that never sold will be lost !
|
|
4
|
- Left outer joins in SQL:
- Product(name, category)
- Purchase(prodName, store)
|
|
5
|
|
|
6
|
- Left outer join:
- Include the left tuple even if there’s no match
- Right outer join:
- Include the right tuple even if there’s no match
- Full outer join:
- Include the both left and right tuples even if there’s no match
|
|
7
|
- Three kinds of modifications
- Insertions
- Deletions
- Updates
- Sometimes they are all called “updates”
|
|
8
|
|
|
9
|
|
|
10
|
|
|
11
|
|
|
12
|
|
|
13
|
|
|
14
|
|
|
15
|
|
|
16
|
- Characters:
- CHAR(20) -- fixed length
- VARCHAR(40) -- variable length
- Numbers:
- INT, REAL plus variations
- Times and dates:
- DATE, DATETIME (SQL Server only)
- To reuse domains:
CREATE DOMAIN address AS
VARCHAR(55)
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
- Create an index on name:
- B+ trees have fan-out of 100s: max 4 levels !
|
|
22
|
|
|
23
|
|
|
24
|
|
|
25
|
- Nulls and outerjoins
- Creating and updating schemas
- Views: updating and reusing them
- Constraints
- Programming with SQL
- Relational algebra
|
|
26
|
|
|
27
|
|
|
28
|
|
|
29
|
- SELECT name, Seattle-view.store
- FROM Seattle-view, Product
- WHERE Seattle-view.product =
Product.name AND
- Product.category
= “shoes”
|
|
30
|
- Virtual views:
- Used in databases
- Computed only on-demand – slower at runtime
- Always up to date
- Materialized views
- Used in data warehouses
- Precomputed offline – faster at runtime
- May have stale data
|
|
31
|
|
|
32
|
|
|
33
|
- What if we want to use a set of views to answer a query.
- Why?
- The obvious reason…
- Answering queries over web data sources.
- Very cool stuff! (i.e., I did a lot of research on this).
|
|
34
|
- Suppose I have only the result of SeattleView:
- SELECT buyer, seller, product, store
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer
- and I want to answer the query
- SELECT buyer, seller
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer AND
-
Purchase.product=‘gizmo’.
- Then, I can rewrite the query using the view.
|
|
35
|
- Rewritten query:
- SELECT buyer, seller
- FROM SeattleView
- WHERE product= ‘gizmo’
- Original query:
- SELECT buyer, seller
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer AND
-
Purchase.product=‘gizmo’.
|
|
36
|
- I still have only the result of SeattleView:
- SELECT buyer, seller, product, store
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer
- but I want to answer the query
- SELECT buyer, seller
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer AND
-
Person.Phone LIKE ‘206 543
%’.
|
|
37
|
- I still have only the result of SeattleView:
- SELECT buyer, seller, product, store
- FROM Person, Purchase, Product
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer AND
-
Purchase.product = Product.name
- but I want to answer the query
- SELECT buyer, seller
- FROM Person, Purchase
- WHERE Person.city = ‘Seattle’ AND
-
Person.per-name = Purchase.buyer.
|
|
38
|
- I still have only the result of:
- SELECT seller, buyer, Sum(Price)
- FROM Purchase
- WHERE Purchase.store = ‘The Bon’
- Group By seller, buyer
- but I want to answer the query
- SELECT seller, Sum(Price)
- FROM Purchase
- WHERE Person.store = ‘The Bon’
- Group By seller
- And what if it’s the other way around?
|
|
39
|
- I still have only the result of:
- SELECT seller, buyer, Count(*)
- FROM Purchase
- WHERE Purchase.store = ‘The Bon’
- Group By seller, buyer
- but I want to answer the query
- SELECT seller, Count(*)
- FROM Purchase
- WHERE Person.store = ‘The Bon’
- Group By seller
|
|
40
|
- Given a set of views V1,…,Vn, and a query Q, can we answer Q using only
the answers to V1,…,Vn?
- Why do we care?
- We can answer queries more efficiently.
- We can query data sources on the WWW in a principled manner.
- Many, many papers on this problem.
- The best performing algorithm: The MiniCon Algorithm, (Pottinger &
(Ha)Levy, 2000).
|
|
41
|
- Assume a virtual schema of the WWW, e.g.,
- Course(number, university, title, prof, quarter)
- Every data source on the web contains the answer to a view over the
virtual schema:
- UW database: SELECT number,
title, prof
- FROM Course
- WHERE univ=‘UW’ AND quarter=‘2/02’
- Stanford database: SELECT number,
title, prof, quarter
- FROM Course
- WHERE univ=‘Stanford’
- User query: find all professors who teach “database systems”
|
|
42
|
- Nulls and outer-joins
- Creating and updating schemas
- Views: updating and reusing them
- Constraints
- Programming with SQL
- Relational algebra
|
|
43
|
- A constraint = a property that we’d like our database to hold
- The system will enforce the constraint by taking some actions:
- forbid an update
- or perform compensating updates
|
|
44
|
- Constraints in SQL:
- Keys, foreign keys
- Attribute-level constraints
- Tuple-level constraints
- Global constraints: assertions
- The more complex the constraint, the harder it is to check and to
enforce
|
|
45
|
|
|
46
|
- CREATE TABLE Product (
- name CHAR(30),
- category VARCHAR(20),
- price INT,
- PRIMARY KEY (name, category))
|
|
47
|
- CREATE TABLE Product (
- productID CHAR(10),
- name CHAR(30),
- category VARCHAR(20),
- price INT,
PRIMARY KEY
(productID),
- UNIQUE (name, category))
|
|
48
|
- CREATE TABLE Purchase (
- prodName CHAR(30)
- REFERENCES Product(name),
- date DATETIME)
|
|
49
|
|
|
50
|
- OR
- (name, category) must be a PRIMARY KEY
|
|
51
|
- Types of updates:
- In Purchase: insert/update
- In Product: delete/update
|
|
52
|
- SQL has three policies for maintaining referential integrity:
- Reject violating modifications (default)
- Cascade: after a delete/update do a delete/update
- Set-null set foreign-key field to NULL
- READING ASSIGNEMNT: 7.1.5, 7.1.6
|
|
53
|
- Constraints on attributes:
NOT NULL -- obvious meaning...
CHECK condition -- any condition !
- Constraints on tuples
CHECK condition
|
|
54
|
|
|
55
|
|
|
56
|
- Can give them names, and alter later
- We need to understand exactly when they are checked
- We need to understand exactly what actions are taken if they fail
|
|
57
|
|
|
58
|
|
|
59
|
|
|
60
|
|
|
61
|
|
|
62
|
- Nulls and outerjoins
- Creating and updating schemas
- Views: updating and reusing them
- Constraints
- Programming with SQL
- Relational algebra
|
|
63
|
- direct SQL (= ad-hoc SQL) is rarely used
- in practice: SQL is embedded in some application code
- SQL code is identified by special syntax
|
|
64
|
- Example: SQL in C:
- C uses int, char[..], pointers, etc
- SQL uses tables
- Impedance mismatch = incompatible types
|
|
65
|
- Why not use only one language?
- Forgetting SQL: “we can quickly dispense with this idea” [textbook, pg. 351].
- SQL cannot do everything that the host language can do.
- Solution: use cursors
|
|
66
|
|
|
67
|
|
|
68
|
- Product (pname, price, quantity,
maker)
- Purchase (buyer, seller, store,
pname)
- Company (cname, city)
- Person(name, phone, city)
|
|
69
|
|
|
70
|
|
|
71
|
- Declare the cursor
- Open the cursor
- Fetch tuples one by one
- Close the cursor
|
|
72
|
- void product2XML() {
- EXEC SQL BEGIN DECLARE SECTION;
- char n[20], c[30];
- int p, q;
- char SQLSTATE[6];
- EXEC SQL END DECLARE SECTION;
- EXEC SQL DECLARE crs CURSOR FOR
- SELECT pname, price, quantity,
maker
- FROM Product;
- EXEC SQL OPEN crs;
|
|
73
|
- printf(“<allProducts>\n”);
- while (1) {
- EXEC SQL FETCH FROM crs INTO
:n, :p, :q, :c;
- if (NO_MORE_TUPLES) break;
- printf(“ <product>\n”);
- printf(“ <name> %s </name>\n”, n);
- printf(“ <price> %d </price>\n”, p);
- printf(“ <quantity> %d
</quantity>\n”, q);
- printf(“ <maker> %s </maker>\n”, c);
- printf(“ </product>\n”);
- }
- EXECT SQL CLOSE crs;
- printf(“</allProducts>\n”);
- }
|
|
74
|
- What is NO_MORE_TUPLES ?
- #define NO_MORE_TUPLES !(strcmp(SQLSTATE,”02000”))
|
|
75
|
|
|
76
|
- Nulls and outerjoins
- Creating and updating schemas
- Views: updating and reusing them
- Constraints
- Programming with SQL
- Relational algebra
|
|
77
|
- Formalism for creating new relations from existing ones
- Its place in the big picture:
|
|
78
|
- Five operators:
- Union: È
- Difference: -
- Selection: s
- Projection: P
- Cartesian Product: ´
- Derived or auxiliary operators:
- Intersection, complement
- Joins (natural,equi-join, theta join, semi-join)
- Renaming: r
|
|
79
|
- R1 È R2
- Example:
- ActiveEmployees È
RetiredEmployees
- R1 – R2
- Example:
- AllEmployees -- RetiredEmployees
|
|
80
|
- It is a derived operator
- R1 Ç R2 = R1 – (R1
– R2)
- Also expressed as a join (will see later)
- Example
- UnionizedEmployees Ç RetiredEmployees
|
|
81
|
- Returns all tuples which satisfy a condition
- Notation: sc(R)
- Examples
- sSalary > 40000 (Employee)
- sname = “Smithh” (Employee)
- The condition c can be =, <, £, >, ³, <>
|
|
82
|
|
|
83
|
- Eliminates columns, then removes duplicates
- Notation: P A1,…,An (R)
- Example: project social-security number and names:
- P SSN, Name
(Employee)
- Output schema: Answer(SSN,
Name)
|
|
84
|
|
|
85
|
- Each tuple in R1 with each tuple in R2
- Notation: R1 ´ R2
- Example:
- Very rare in practice; mainly used to express joins
|
|
86
|
|
|
87
|
- Five operators:
- Union: È
- Difference: -
- Selection: s
- Projection: P
- Cartesian Product: ´
- Derived or auxiliary operators:
- Intersection, complement
- Joins (natural,equi-join, theta join, semi-join)
- Renaming: r
|
|
88
|
- Changes the schema, not the instance
- Notation: r B1,…,Bn (R)
- Example:
- rLastName, SocSocNo
(Employee)
- Output schema:
Answer(LastName, SocSocNo)
|
|
89
|
|
|
90
|
- Notation: R1 ⋈ R2
- Meaning: R1 ⋈ R2 = PA(sC(R1
´ R2))
- Where:
- The selection sC checks equality of all common attributes
- The projection eliminates the duplicate common attributes
|
|
91
|
|
|
92
|
|
|
93
|
- Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈
S ?
- Given R(A, B, C), S(D, E), what
is R ⋈ S ?
- Given R(A, B), S(A, B), what is R ⋈ S ?
|
|
94
|
- A join that involves a predicate
- R1 ⋈ q R2
= s q (R1 ´ R2)
- Here q can be any condition
|
|
95
|
- A theta join where q is an equality
- R1 ⋈A=B R2
= s A=B (R1 ´ R2)
- Example:
- Employee ⋈SSN=SSN Dependents
- Most useful join in practice
|
|
96
|
- R ⋉ S = P A1,…,An
(R ⋈ S)
- Where A1, …, An are the attributes in R
- Example:
|
|
97
|
- Semijoins are used in distributed databases
|
|
98
|
- Person Purchase Person Product
|
|
99
|
- A bag = a set with repeated elements
- All operations need to be defined carefully on bags
- {a,b,b,c}È{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
- {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
- sC(R): preserve the
number of occurrences
- PA(R): no duplicate
elimination
- Cartesian product, join: no duplicate elimination
- Important ! Relational Engines work on bags, not sets !
|
|
100
|
- Cannot compute “transitive closure”
- Find all direct and indirect relatives of Fred
- Cannot express in RA !!! Need to
write C program
|