|
1
|
- Lecture #1
- March 29, 2004
- Alon Halevy
|
|
2
|
- Instructor: Alon Halevy
- Allen Center, Room 576, alon@cs.washington.edu
- Office hours: Just before class (or by email)
- TA: Stefan (Stebbi) Sigurdsson
- stebbi@cs.washington.edu
- Office hours: TBA
|
|
3
|
- Web page:
- http://www.cs.washington.edu/education/courses/csep544/04sp/
- Mailing list: follow the directions at
http://mailman.cs.washington.edu/csenetid/auth/mailman/listinfo/csep544
|
|
4
|
- Main textbook, available at the bookstore:
- Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey
Ullman, Jennifer Widom
- Almost identical, and also available at the bookstore:
- A First Course in Database Systems, Jeff Ullman and Jennifer Widom
- Database Implementation, Hector Garcia-Molina, Jeff Ullman and Jennifer
Widom
- Comments on the textbook
|
|
5
|
- Database Management Systems, Ramakrishnan
- Fundamentals of Database Systems, Elmasri, Navathe
- Foundations of Databases, Abiteboul, Hull, Vianu
- Mostly theory of databases
- Data on the Web, Abiteboul, Buneman, Suciu
- XML and other new/advanced stuff
|
|
6
|
- There will be reading assignments from the Web:
- SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/
- Others, especially for XML
- For SQL, a good source of information is the MSDN library (on your
Windows machine)
|
|
7
|
- Prerequisites: Data structures course
- Work & Grading:
- Homework 30%: 3 of them, some light programming.
- Project: 35% - coming up next.
- Final: 35% (Discuss date)
|
|
8
|
- Important component of the course.
- 2 Phases.
- I’ll tell you about phase 2 later.
- Phase 1:
- You build a database application on your own.
- The domain of the application is inventory of some sort.
- The application will have a simple web interface.
- Done by the end of week 4.
|
|
9
|
- Motivation: why do we want databases.
- Overview of database systems
- Reading assignment from SQL for Web Nerds, by Philip Greenspun,
Introduction
http://philip.greenspun.com/sql/
- Course Outline.
- Basic elements of SQL
|
|
10
|
- Database Management System = DBMS
- Relational DBMS = RDBMS
- A program that makes it easy for you to manipulate large amounts of
data.
- Frees you from thinking about details. Enables you to focus on your challenges.
|
|
11
|
- Backend for traditional “database” applications
- Students and courses at a university
- Bank accounting
- Airline reservations
- Movie listings
- Backend for large Websites
- Backend for Web services
|
|
12
|
- Suppose we are building a system
- to store the information about:
- students
- courses
- professors
- who takes what, who teaches what
|
|
13
|
- Data management is more than databases.
- Imagine:
- Complete Traffic Information Availability
- MyNeededBits Anytime, Anywhere
- <your favorite visionary application here>
- The techniques we learn are the principles of managing data anywhere.
|
|
14
|
- Sure we can! Start by storing the
data in files:
- students.txt courses.txt professors.txt
- Now write C or Java programs to implement specific tasks
|
|
15
|
- Enroll “Mary Johnson” in “CSE444”:
|
|
16
|
- System crashes:
- Large data sets (say 50GB)
- Simultaneous access by many users
- Need locks: we know them from
OS, but now data on disk; and is there any fun to re-implement them ?
|
|
17
|
|
|
18
|
- The programmer sees SQL, which has two components:
- Data Definition Language - DDL
- Data Manipulation Language - DML
- Behind the scenes the DBMS has:
- Query optimizer
- Query engine
- Storage management
- Transaction Management (concurrency, recovery)
|
|
19
|
- Start with DDL to create tables:
- Continue with DML to populate tables:
|
|
20
|
- Tables:
- Still implemented as files, but behind the scenes can be quite complex
|
|
21
|
- Requirements modeling (conceptual, pictures)
- Decide what entities should be part of the application and how they
should be linked.
- Schema design and implementation
- Decide on a set of tables, attributes.
- Define the tables in the database system.
- Populate database (insert tuples).
- Write application programs using the DBMS
- way easier now that the data management is taken care of.
|
|
22
|
- Enroll “Mary Johnson” in “CSE444”:
|
|
23
|
- A transaction = sequence of statements that either all succeed, or all
fail
- Transactions have the ACID properties:
- A = atomicity
- C = consistency
- I = independence
- D = durability
|
|
24
|
- Find all courses that “Mary” takes
- What happens behind the scene ?
- Query processor figures out how to answer the query efficiently.
|
|
25
|
|
|
26
|
- The big commercial database vendors:
- Oracle
- IBM (with DB2) bought Informix
recently
- Microsoft (SQL Server)
- Sybase
- Some free database systems (Unix) :
- In CSEP544 we use SQL Server. You may use something else, but you are on
your own.
|
|
27
|
- Object-relational databases
- Main memory database systems
- XML XML XML !
- Relational databases with XML support
- Middleware between XML and relational databases
- Native XML database systems
- Lots of research here at UW on XML and databases
- Data integration
- Peer to peer, stream data management – still research
|
|
28
|
- Several aspects:
- Modeling and design of databases
- Database programming: querying and update operations
- Database implementation
- DBMS study cuts across many fields of Computer Science: OS, languages,
AI, Logic, multimedia, theory...
|
|
29
|
- Part I
- SQL (Chapter 7) and its advanced features.
- Database design (Chapters 2, 3, 7)
- XML, XPath, XQuery
- Data storage, indexes (Chapters 11-13)
- Query execution and optimization (Chapter 15,16)
- Data integration, meta-data management
|
|
30
|
|
|
31
|
|
|
32
|
- Data Definition Language (DDL)
- Create/alter/delete tables and their attributes
- Following lectures...
- Data Manipulation Language (DML)
- Query one or more tables – discussed next !
- Insert/delete/modify tuples in tables
- Transact-SQL
- Idea: package a sequence of SQL statements ŕ server
- Won’t discuss in class
|
|
33
|
- Atomic types, a.k.a. data types
- Tables built from atomic types
- Unlike XML, no nested tables, only flat tables are allowed!
- We will see later how to decompose complex structures into multiple
flat tables
|
|
34
|
- Characters:
- CHAR(20) -- fixed length
- VARCHAR(40) -- variable length
- Numbers:
- BIGINT, INT, SMALLINT, TINYINT
- REAL, FLOAT -- differ in
precision
- MONEY
- Times and dates:
- DATE
- DATETIME -- SQL Server
- Others... All are simple
|
|
35
|
- A tuple = a record
- Restriction: all attributes are of atomic type
- A table = a set of tuples
- Like a list…
- …but it is unordered: no first(), no next(), no last().
|
|
36
|
- The schema of a table is the table name and its attributes:
- Product(PName, Price, Category, Manfacturer)
- A key is an attribute whose values are unique;
we underline a key
- Product(PName, Price, Category, Manfacturer)
|
|
37
|
|
|
38
|
|
|
39
|
|
|
40
|
|
|
41
|
- What goes in the WHERE clause:
- x = y, x < y, x <= y, etc
- For numbers, they have the usual meanings
- For CHAR and VARCHAR: lexicographic ordering
- Expected conversion between CHAR and VARCHAR
- For dates and times, what you expect...
- Pattern matching on strings...
|
|
42
|
- s LIKE p: pattern matching on
strings
- p may contain two special symbols:
- % = any sequence of characters
- _ = any single character
- Product(PName, Price, Category, Manufacturer)
- Find all products whose name mentions ‘gizmo’:
|
|
43
|
|
|
44
|
|
|
45
|
|
|
46
|
|
|
47
|
- Connect two or more tables:
|
|
48
|
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
- You guess they are
- I tell you so
- Foreign keys are a method for schema designers to tell you so (7.1)
- A foreign key states that a column is a reference to the key of another
table
ex: Product.manufacturer is foreign key of Company
- Gives information and enforces constraint
|
|
53
|
- Sometimes two relations have the same attr:
Person(pname, address, worksfor)
Company(cname, address)
|
|
54
|
|
|
55
|
|
|
56
|
- SELECT a1, a2, …, ak
- FROM R1 AS x1, R2 AS x2, …, Rn
AS xn
- WHERE Conditions
- 1. Nested loops:
|
|
57
|
|
|
58
|
|
|
59
|
|
|
60
|
|
|
61
|
- A subquery producing a single value:
- In this case, the subquery returns one value.
- If it returns more, it’s a run-time error.
|
|
62
|
- Can say the same thing without a subquery:
- This is equivalent to the previous one when the ssn is a key
and ‘123456789’ exists in the database;
otherwise they are different.
|
|
63
|
|
|
64
|
|
|
65
|
|
|
66
|
|
|
67
|
|
|
68
|
- Can we express this query as a single SELECT-FROM-WHERE query, without
subqueries ?
- Hint: show that all SFW queries
are monotone (figure out what this means). A query with ALL is not monotone
|
|
69
|
|
|
70
|
|
|
71
|
- Product ( pname, price, category,
maker, year)
- Find products (and their manufacturers) that are more expensive than all
products made by the same manufacturer before 1972
- Powerful, but much harder to optimize !
|
|
72
|
|
|
73
|
|
|
74
|
|
|
75
|
|
|
76
|
|
|
77
|
|
|
78
|
|
|
79
|
|
|
80
|
|
|
81
|
|
|
82
|
|
|
83
|
|
|
84
|
- SELECT S
- FROM R1,…,Rn
- WHERE C1
- GROUP BY a1,…,ak
- HAVING C2
- S = may contain attributes a1,…,ak and/or any
aggregates but NO OTHER ATTRIBUTES
- C1 = is any condition on the attributes in R1,…,Rn
- C2 = is any condition on aggregate expressions
|
|
85
|
- SELECT S
- FROM R1,…,Rn
- WHERE C1
- GROUP BY a1,…,ak
- HAVING C2
- Evaluation steps:
- Compute the FROM-WHERE part, obtain a table with all attributes in R1,…,Rn
- Group by the attributes a1,…,ak
- Compute the aggregates in C2 and keep only groups satisfying C2
- Compute aggregates in S and return the result
|
|
86
|
- Author(login,name)
- Document(url, title)
- Wrote(login,url)
- Mentions(url,word)
|
|
87
|
- Find all authors who wrote at least 10 documents:
- Attempt 1: with nested queries
|
|
88
|
- Find all authors who wrote at least 10 documents:
- Attempt 2: SQL style (with GROUP BY)
|
|
89
|
- Find all authors who have a vocabulary over 10000 words:
|
|
90
|
|