CSE 344 Lecture 02 -- Relational Data Model

================================================================

Terminology reminder:

   Data model:	general, conceptual way of structuring data
   Schema:	structure of a particular database under a certain data model
   Instance:	actual data conforming to a schema

Data models studied in this course:
	relational data model
	semistructured data model (XML) 

Other data models:
	key-value pairs
	graph data model (RDF)
	object-oriented


Note: a "good" data model describes both the data AND a query language

================================================================

The Relational data model:

(Example:  Actor(id, fname, lname, gender))


  Database instance:
  -- "table" or "relation"
  -- "column" or "attribute" or "field"
  -- "row" or "tuple" or "record"

  Database schema:
  -- "table name" or "relation name"
  -- "column name" or "attribute name"
  -- each attribute has a "type" or "domain"

  Types or datatypes

  -- Character strings: CHAR(20), VARCHAR(50), TEXT
  -- Numbers:INT, BIGINT, SMALLINT, FLOAT
  -- Others: MONEY, DATETIME, 

  -- Types are vendor specific
  -- Types are static and strictly enforced; 
        exception: sqlite has dynamic types
        http://www.sqlite.org/datatype3.html

  Keys:

  -- an attribute is called a "key" if it uniquely identifies a record
  -- other tuples may use key values as "logical pointers"


** We can have a key with multiple attributes: what does this mean ?


================================================================

Specifying the schema on paper:

  -- w/o types:	Actor(id, fname, lname, gender)
  -- w/ types:	Actor(id:int, fname: char(20), lname:char(30), gender:char(1))
  -- specify keys by underlining


Specifying the schema in  SQL:

(Example: create table actor .... primary key...)
(Example: insert into actor values ( ... ))


Comment: upper/lower case; name conflicts
   -- Actor, actor, ACTOR  = all the same
   -- Actor(name, gender), Movie(name, year) = name OK
   -- Movie(title, actor), Actor(name, gender) = actor NOT OK


A peek at the physical implementation:   

  -- row oriented
  -- column oriented
  -- vertically partitioned
  -- horizontally partitioned

** What are the pros/cons of the different physical implementations ?


Data model DOES NOT prescribe the physical implementation ! 
This is "data independence".

** Why not ?


Back to the relational data model:

  -- null values: useful when the data is missing, not applicable etc
     may create problems when we compare values
  -- keys and foreign keys

** Can a key be null ?


(Example on paper: 
 Movie(id, title, year), Casts(mid, pid, role), Actor(id,fname, lname, gender))


(Example in SQL:
  create tables with NOT NULL, PRIMARY KEY, REFERENCES)


================================================================
Discussion of the relational model:

  -- the relations are unordered
  -- all types are atomic: no lists/relations/trees/...

** Why ?


Altering a table  in SQL

  -- add/drop attribute(s)
  -- change the type of an attribute

(Example: alter table actor add column dob date;)


A peek at the physical implementation:

** What happens when you alter a table ?  Consider row-wise and
   column-wise.


================================================================

SQL

Simple SQL queries:

(Example select * from actor where lname = 'Foster'; show projection)


** What happens at the physical level ?


Indexes:  not part of the data model, but VERY IMPORTANT !

(Example: create index actor_lname_i on actor(lname))


Now the SAME SQL query above runs faster !  "Data independence"