================================================================ CSE 344 -- Spring 2011 Lecture 18: Views ================================================================ A "view" = a "query" that defines a relation ******************************** * Customer(cid, name, city) * Purchase(cid, pid, store) * Product(pid, name, price) ******************************** CREATE VIEW StorePrice AS SELECT x.store, y.price FROM Purchase x, Product y WHERE x.pid = y.pid This is a like a new table: ******************************** * StorePrice(store,price) ******************************** A "high end" store is a store that sold some product over 1000. For each customer, find all the high end stores that they visit. Return a set of (customer-name, high-end-store) pairs. SELECT DISTINCT z.name, u.store FROM Customer z, Purchase u, StorePrice v WHERE z.cid = u.cid and u.store = v.store and v.price > 1000 ================================================================ Types of views: 1. Virtual views: the DBMS stores only the view definition and computed on demand 2. Materialized view: the DMBS computes the view when it is defined. *** In Class: what are the pros and cons ? When do we use one or the other ? ================================================================ 1. Virtual views: Query modification RDBMS rewrites the query above to: SELECT DISTINCT z.name, u.store FROM Customer z, Purchase u, (SELECT x.store, y.price FROM Purchase x, Product y WHERE x.pid = y.pid) v WHERE z.cid = u.cid and u.store = v.store and v.price > 1000 A good RDMBS further rewrites the query to: SELECT DISTINCT z.name, u.store FROM Customer z, Purchase u, Purchase x, Product y WHERE z.cid = u.cid and u.store = x.store and y.price > 1000 and x.pid = y.pid -------------- Further optimizations are often needed when using views. Create a view with all the information about purchases at 'ACME': CREATE VIEW AcmePurchase AS SELECT x.cid, x.name as cname, x.city, z.pid, z.name as pname, z.price FROM Customer x, Purchase y, Product z WHERE x.cid = y.cid and y.store = 'ACME' and y.pid = z.pid Now find the maximum price of all purchases at ACME: SELECT max(u.price) FROM AcmePurchase u The system rewrites it to: SELECT max(z.price) FROM Customer x, Purchase y, Product z WHERE x.cid = y.cid and y.store = 'ACME' and y.pid = z.pid *** In class: how can you further optimizer this query ? ================================================================ "Logical Data Independence" = application should not be affected by changes in the logical schema. Recall the schema from last lecture: Person(SSN, Name, Phone, City) SSN-->Name, City [but not Phone] We normalized the table to: P1(SSN, Name, City) P2(SSN, Phone) But the application already uses the Person table in SQL queries. How can we avoid having to rewrite the application code ? ================================================================ The view update problem: since users think of a view as a table, they may want to "update" the view, e.g. insert or delete a tuple. Is this possible ? Section 8.2.2. in the book. ================================================================ 2. Materialized views. Applications: -- Indexes: are materialized views ! -- Denormalization = join tables to improve performance -- Vertical Partition Product(pid, name, weight, size, height, ohms, amps, blahs, xxxs, yyys, ...) Vertically partitioned: P(pid,name), P1(pid, weight), P2(pid, size), ... There is a new trend in databases called "Column Store" that stores tables by columns instead of by rows. **** In class: what are the advantages/disadvantages of storing data by columns ? -- Horizontal partitioning: EmailAccount(email, name, country, other-info...) Horizontally partitioned: EmailUSA(email, name, other-info...) EmailChina(email, name, other-info...) EmailFrance(email, name, other-info...) ... Issues with materialized views: maintenance under updates.