Notes
Slide Show
Outline
1
Introduction to Database Systems
CSEP 544
  • Lecture #1
  • March 29, 2004
  • Alon Halevy
2
Staff
  • Instructor:  Alon Halevy
    • Allen Center, Room 576, alon@cs.washington.edu
    • Office hours: Just before class (or by email)



  • TA: Stefan (Stebbi) Sigurdsson
    • stebbi@cs.washington.edu
    • Office hours: TBA
3
Communications
  • Web page:


  • http://www.cs.washington.edu/education/courses/csep544/04sp/


  • Mailing list: follow the directions at
    http://mailman.cs.washington.edu/csenetid/auth/mailman/listinfo/csep544
4
Textbook(s)
  • Main textbook, available at the bookstore:
  • Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman,  Jennifer Widom


  • Almost identical, and also available at the bookstore:
  • A First Course in Database Systems, Jeff Ullman and  Jennifer Widom
  • Database Implementation, Hector Garcia-Molina, Jeff Ullman and Jennifer Widom
  • Comments on the textbook
5
Other Texts
  • Database Management Systems, Ramakrishnan
    • very comprehensive
  • Fundamentals of Database Systems, Elmasri, Navathe
    • very widely used
  • Foundations of Databases, Abiteboul, Hull, Vianu
    • Mostly theory of databases
  • Data on the Web, Abiteboul, Buneman, Suciu
    • XML and other new/advanced stuff

6
Other Required Readings
  • There will be reading assignments from the Web:
  • SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/
  • Others, especially for XML


  • For SQL, a good source of information is the MSDN library (on your Windows machine)
7
Course Structure
  • Prerequisites: Data structures course
  • Work & Grading:
    • Homework 30%: 3 of them, some light programming.
    • Project: 35% - coming up next.
    • Final:  35% (Discuss date)
8
The Project
  • Important component of the course.
  • 2 Phases.
  • I’ll tell you about phase 2 later.
  • Phase 1:
    • You build a database application on your own.
    • The domain of the application is inventory of some sort.
    • The application will have a simple web interface.
    • Done by the end of week 4.
9
Today
  • Motivation: why do we want databases.
  • Overview of database systems
    • Reading assignment from SQL for Web Nerds, by Philip Greenspun, Introduction
      http://philip.greenspun.com/sql/
  • Course Outline.
  • Basic elements of SQL
10
What Is a Relational Database Management System ?
  • Database Management System = DBMS
  • Relational DBMS = RDBMS


  • A program that makes it easy for you to manipulate large amounts of data.
  • Frees you from thinking about details. Enables you to focus on your challenges.


11
Where are RDBMS used ?
  • Backend for traditional “database” applications
    • Students and courses at a university
    • Bank accounting
    • Airline reservations
    • Movie listings
  • Backend for large Websites
  • Backend for Web services


12
Example of a Traditional Database Application
  • Suppose we are building a system
  • to store the information about:
  • students
  • courses
  • professors
  • who takes what, who teaches what
13
Data Management
  • Data management is more than databases.
  • Imagine:
    • Complete Traffic Information Availability
    • MyNeededBits Anytime, Anywhere
    • <your favorite visionary application here>
  • The techniques we learn are the principles of managing data anywhere.



14
Can we do it without a DBMS ?
  • Sure we can!  Start by storing the data in files:


  • students.txt      courses.txt          professors.txt




  • Now write C or Java programs to implement specific tasks
15
Doing it without a DBMS...
  • Enroll “Mary Johnson” in “CSE444”:


16
Problems without a DBMS...
  • System crashes:



    • What is the problem ?
  • Large data sets (say 50GB)
    • What is the problem ?
  • Simultaneous access by many users
    • Need locks:  we know them from OS, but now data on disk; and is there any fun to re-implement them ?
17
Enters a DMBS
18
Functionality of a DBMS
  • The programmer sees SQL, which has two components:
  • Data Definition Language - DDL
  • Data Manipulation Language - DML
    • query language

  • Behind the scenes the DBMS has:
  • Query optimizer
  • Query engine
  • Storage management
  • Transaction Management (concurrency, recovery)
19
How the Programmer Sees the DBMS
  • Start with DDL to create tables:




  • Continue with DML to populate tables:
20
How the Programmer Sees the DBMS
  • Tables:






  • Still implemented as files, but behind the scenes can be quite complex
21
Building an Application with a DBMS
  • Requirements modeling (conceptual, pictures)
    • Decide what entities should be part of the application and how they should be linked.
  • Schema design and implementation
    • Decide on a set of tables, attributes.
    • Define the tables in the database system.
    • Populate database (insert tuples).
  • Write application programs using the DBMS
    • way easier now that the data management is taken care of.
22
Transactions
  • Enroll “Mary Johnson” in “CSE444”:


23
Transactions
  • A transaction = sequence of statements that either all succeed, or all fail
  • Transactions have the ACID properties:
    • A = atomicity
    • C = consistency
    • I = independence
    • D = durability
24
Queries
  • Find all courses that “Mary” takes





  • What happens behind the scene ?
    • Query processor figures out how to answer the query efficiently.
25
Queries, behind the scene
26
Database Systems
  • The big commercial database vendors:
    • Oracle
    • IBM (with DB2)   bought Informix recently
    • Microsoft (SQL Server)
    • Sybase
  • Some free database systems (Unix) :
    • Postgres
    • Mysql
    • Predator
  • In CSEP544 we use SQL Server. You may use something else, but you are on your own.
27
New Trends in Databases
  • Object-relational databases
  • Main memory database systems
  • XML XML XML !
    • Relational databases with XML support
    • Middleware between XML and relational databases
    • Native XML database systems
    • Lots of research here at UW on XML and databases
  • Data integration
  • Peer to peer, stream data management – still research
28
The Study of DBMS
  • Several aspects:
    • Modeling and design of databases
    • Database programming: querying and update operations
    • Database implementation
  • DBMS study cuts across many fields of Computer Science: OS, languages, AI, Logic, multimedia, theory...
29
Course Outline
(may vary slightly)
  • Part I
  • SQL (Chapter 7) and its advanced features.
  • Database design (Chapters 2, 3, 7)
  • XML, XPath, XQuery
  • Data storage, indexes (Chapters 11-13)
  • Query execution and optimization (Chapter 15,16)
  • Data integration, meta-data management
30
The Relational Model (Codd)
31
SQL Introduction
32
SQL
  • Data Definition Language (DDL)
    • Create/alter/delete tables and their attributes
    • Following lectures...
  • Data Manipulation Language (DML)
    • Query one or more tables – discussed next !
    • Insert/delete/modify tuples in tables
  • Transact-SQL
    • Idea: package a sequence of SQL statements ŕ server
    • Won’t discuss in class
33
Data in SQL
  • Atomic types, a.k.a. data types
  • Tables built from atomic types


  • Unlike XML, no nested tables, only flat tables are allowed!
    • We will see later how to decompose complex structures into multiple flat tables

34
Data Types in SQL
  • Characters:
    • CHAR(20) -- fixed length
    • VARCHAR(40) -- variable length
  • Numbers:
    • BIGINT, INT, SMALLINT, TINYINT
    • REAL, FLOAT  -- differ in precision
    • MONEY
  • Times and dates:
    • DATE
    • DATETIME -- SQL Server
  • Others...  All are simple
35
Tables Explained
  • A tuple = a record
    • Restriction: all attributes are of atomic type
  • A table = a set of tuples
    • Like a list…
    • …but it is unordered: no first(), no next(), no last().

36
Tables Explained
  • The schema of a table is the table name and its attributes:
  • Product(PName, Price, Category, Manfacturer)


  • A key is an attribute whose values are unique;
    we underline a key


  • Product(PName, Price, Category, Manfacturer)
37
SQL Query
38
Simple SQL Query
39
Simple SQL Query
40
A Notation for SQL Queries
41
Selections
  • What goes in the WHERE clause:
  •  x = y, x < y, x <= y, etc
    • For numbers, they have the usual meanings
    • For CHAR and VARCHAR: lexicographic ordering
      • Expected conversion between CHAR and VARCHAR
    • For dates and times, what you expect...
  • Pattern matching on strings...
42
The LIKE operator
  • s LIKE p:  pattern matching on strings
  • p may contain two special symbols:
    • %  = any sequence of characters
    • _   = any single character

  • Product(PName, Price, Category, Manufacturer)
  • Find all products whose name mentions ‘gizmo’:



43
Eliminating Duplicates
44
Ordering the Results
45
Ordering the Results
46
Ordering the Results
47
Joins in SQL
  • Connect two or more tables:
48
Joins
49
Joins in SQL
50
Joins
51
Joins
52
When are two tables related?
  • You guess they are
  • I tell you so
  • Foreign keys are a method for schema designers to tell you so (7.1)
    • A foreign key states that a column is a reference to the key of another table
      ex: Product.manufacturer is foreign key of Company
    • Gives information and enforces constraint


53
Disambiguating Attributes
  • Sometimes two relations have the same attr:
    Person(pname, address, worksfor)
    Company(cname, address)


54
Tuple Variables
55
Tuple Variables
56
Meaning (Semantics) of SQL Queries
  • SELECT a1, a2, …, ak
  • FROM    R1 AS x1, R2 AS x2, …, Rn AS xn
  • WHERE  Conditions


  • 1. Nested loops:
57
First Unintuitive SQLism
58
Exercises
59
Union, Intersection, Difference
60
Conserving Duplicates
61
Subqueries
  • A subquery producing a single value:







  • In this case, the subquery returns one value.


  • If it returns more, it’s a run-time error.
62
"Can say the same thing..."
  • Can say the same thing without a subquery:






  • This is equivalent to the previous one when the ssn is a key
    and ‘123456789’ exists in the database;
    otherwise they are different.
63
Subqueries Returning Relations
64
Subqueries Returning Relations
65
Removing Duplicates
66
Removing Duplicates
67
Subqueries Returning Relations
68
Question for Database Fans
and their Friends
  • Can we express this query as a single SELECT-FROM-WHERE query, without subqueries ?



  • Hint:  show that all SFW queries are monotone (figure out what this means).  A query with ALL is not monotone
69
Conditions on Tuples
70
Correlated Queries
71
Complex Correlated Query
  • Product ( pname,  price, category, maker, year)
  • Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972








  • Powerful, but much harder to optimize !
72
Aggregation
73
Aggregation: Count
74
Aggregation: Count
75
Simple Aggregation
76
Simple Aggregations
77
Grouping and Aggregation
78
Grouping and Aggregation
79
First compute the FROM-WHERE clauses (date > “10/1”) then GROUP BY product:
80
Then, aggregate
81
GROUP BY v.s. Nested Queries
82
Another Example
83
HAVING Clause
84
General form of Grouping and Aggregation
  • SELECT    S
  • FROM       R1,…,Rn
  • WHERE    C1
  • GROUP BY a1,…,ak
  • HAVING     C2


  • S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES
  • C1 = is any condition on the attributes in R1,…,Rn
  • C2 = is any condition on aggregate expressions
85
General form of Grouping and Aggregation
  • SELECT    S
  • FROM       R1,…,Rn
  • WHERE    C1
  • GROUP BY a1,…,ak
  • HAVING     C2


  • Evaluation steps:
  • Compute the FROM-WHERE part, obtain a table with all attributes in R1,…,Rn
  • Group by the attributes a1,…,ak
  • Compute the aggregates in C2 and keep only groups satisfying C2
  • Compute aggregates in S and return the result
86
Aggregation
  • Author(login,name)
  • Document(url, title)
  • Wrote(login,url)
  • Mentions(url,word)
87
"Find all authors who wrote..."
  • Find all authors who wrote at least 10 documents:
  • Attempt 1: with nested queries
88
"Find all authors who wrote..."
  • Find all authors who wrote at least 10 documents:
  • Attempt 2: SQL style (with GROUP BY)
89
"Find all authors who have..."
  • Find all authors who have a vocabulary over 10000 words:
90
Exercises