CSE 344 Lecture 02 -- Relational Data Model ================================================================ Terminology reminder: Data model: general, conceptual way of structuring data Schema: structure of a particular database under a certain data model Instance: actual data conforming to a schema Data models studied in this course: relational data model semistructured data model (XML) Other data models: key-value pairs graph data model (RDF) object-oriented Note: a "good" data model describes both the data AND a query language ================================================================ The Relational data model: (Example: Actor(id, fname, lname, gender)) Database instance: -- "table" or "relation" -- "column" or "attribute" or "field" -- "row" or "tuple" or "record" Database schema: -- "table name" or "relation name" -- "column name" or "attribute name" -- each attribute has a "type" or "domain" Types or datatypes -- Character strings: CHAR(20), VARCHAR(50), TEXT -- Numbers:INT, BIGINT, SMALLINT, FLOAT -- Others: MONEY, DATETIME, -- Types are vendor specific -- Types are static and strictly enforced; exception: sqlite has dynamic types http://www.sqlite.org/datatype3.html Keys: -- an attribute is called a "key" if it uniquely identifies a record -- other tuples may use key values as "logical pointers" ** We can have a key with multiple attributes: what does this mean ? ================================================================ Specifying the schema on paper: -- w/o types: Actor(id, fname, lname, gender) -- w/ types: Actor(id:int, fname: char(20), lname:char(30), gender:char(1)) -- specify keys by underlining Specifying the schema in SQL: (Example: create table actor .... primary key...) (Example: insert into actor values ( ... )) Comment: upper/lower case; name conflicts -- Actor, actor, ACTOR = all the same -- Actor(name, gender), Movie(name, year) = name OK -- Movie(title, actor), Actor(name, gender) = actor NOT OK A peek at the physical implementation: -- row oriented -- column oriented -- vertically partitioned -- horizontally partitioned ** What are the pros/cons of the different physical implementations ? Data model DOES NOT prescribe the physical implementation ! This is "data independence". ** Why not ? Back to the relational data model: -- null values: useful when the data is missing, not applicable etc may create problems when we compare values -- keys and foreign keys ** Can a key be null ? (Example on paper: Movie(id, title, year), Casts(mid, pid, role), Actor(id,fname, lname, gender)) (Example in SQL: create tables with NOT NULL, PRIMARY KEY, REFERENCES) ================================================================ Discussion of the relational model: -- the relations are unordered -- all types are atomic: no lists/relations/trees/... ** Why ? Altering a table in SQL -- add/drop attribute(s) -- change the type of an attribute (Example: alter table actor add column dob date;) A peek at the physical implementation: ** What happens when you alter a table ? Consider row-wise and column-wise. ================================================================ SQL Simple SQL queries: (Example select * from actor where lname = 'Foster'; show projection) ** What happens at the physical level ? Indexes: not part of the data model, but VERY IMPORTANT ! (Example: create index actor_lname_i on actor(lname)) Now the SAME SQL query above runs faster ! "Data independence"