ESTIMATED TIME: 12 hours.
TURN IN INSTRUCTIONS: Please turn in tune.sql, question2.txt, trigger.sql, and access-control.sql . Use the regular assignment drop box.
Some basic DBMS administration:
Note that this assignment covers topics complementary to what we discussed in class and in sections. Therefore, a large fraction of time for this assignment should be spent reading documentation (we include all necessary links below) and figuring out how to perform the various administrative tasks. We took this time into account when preparing the assignment. Learning how to learn more about DBMSs is an important skill to acquire. We also want to make sure that you know there is a lot more to DBMSs than we have time to cover in CSE 444.
POSTGRES, JDBC DRIVERS, CLASSPATH: You need the same setup as in project 2. See instructions there.
STARTER CODE: Please download project3.tar.gz. This archive contains:
CREATE YOUR DATABASE
RUN THE STARTER CODE
javac DatabaseGenerator.java javac TestQueries.java java DatabaseGenerator 100 psql -f importDatabase.sql project3 java TestQueries 1 Note 1: Be aware that the last two lines will take 2-3 minutes each on the lab machines. Note 2: You can also run the script from within psql with the following command: \i importDatabase.sql You may need to specify the absolute path to the script and also modify importDatabase.sql to add absolute pathnames to all filenames.
If you get any error up to here, contact one of the TAs.
You are the database administrator for the UW Medical Center (UWMC). In this project, the UWMC database is generated by the Java program DatabaseGenerator.java, and imported by the script importDatabase.sql.
The database has the following schema:
Patient(pid INTEGER, fname VARCHAR(20), lname VARCHAR(20), age INT, street VARCHAR(20), city VARCHAR(10), zipcode VARCHAR(5)) Disease(pid INTEGER, disease VARCHAR(20)) Doctor(did INTEGER, fname VARCHAR(20), lname VARCHAR(20), specialty VARCHAR(20)) Sees(pid INTEGER,did INTEGER) Product(eid INTEGER, description VARCHAR(20)) Stock(eid INTEGER, quantity INTEGER) Supplier(sid INTEGER, name VARCHAR(20), street VARCHAR(20), city VARCHAR(10), zipcode VARCHAR(5)) Supplies(eid INTEGER,sid INTEGER)The Patient, and Doctor tables contain information about the patients and doctors, respectively. The Disease table stores which patient has which disease(s). The Sees table says which patient is seeing which doctor(s). n.b. For the Patient table, it is usually better to use date-of-birth instead of age since the d.o.b. does not change every year! However, for the purposes of this problem, it is easier to use age.
The Product table contains information about the medical supplies and equipment that the UWMC has. The Stock table maintains the remaining quantity of each product. The Supplier table contains information about the supplier companies, whereas the Supplies table says which products are supplied by which supplier companies.
For each relation, the attibutes that form the primary key are underlined.
The staff at UWMC access the database extensively by running TestQueries.java. In TestQueries.java, you'll see that there are six kinds of queries that they run very often (e.g. finding the list of doctors a specific patient has seen, finding the number of patients with a specific disease). You are in charge of administrating the database and optimizing it for performance. As you saw, TestQueries.java takes a couple minutes to run: you need to improve that.
Create a file tune.sql where you put some star dust to sprinkle over the database that makes TestQueries run faster. After your star dust has settled, TestQueries will run much faster and your customers will love you. The UWMC is paying you a lot of money, so they would like the TestQueries to run at least 20 times faster. Although they currently do not have any INSERT statements in their workload, the database should be able to support them without a significant cost. Therefore, you are allowed to use no more than 7-8 indexes (in addition to the default ones created by postgres). Also, for each index, explain in 1-2 sentences (as a comment in tune.sql) why you use that particular index (as opposed to another one - if applicable). If you want a challenge: on the lab machines, there is a solution that improves things by 30x using only 6 indexes. Can you beat it?
In this assignment, you will only tune the performance by adding indexes. You should create indexes on one or more attributes. You may create several indexes per table, and may decide which one to make a primary index. Be aware that, by default, Postgres builds a primary index on the primary key of each table. You should focus ONLY on optimizing the workload TestQueries.java. Some other queries may become slower as a result of your optimization, but that's OK as long as TestQueries runs faster.
Note that in practice, physical database tuning is more involved and includes table partitioning, materialized views, and other.
To create an index, use the following command:
CREATE INDEX index_name ON table_name(column_name1, column_name2, ..., column_namek);
To make an index be the primary index, use the following command:
CLUSTER index_name ON table_name;Hint: you may want to check what plan Postgres picks for a given query, and for that you may use
explain select count(*) from Supplier where city='Kirkland';
TESTING AND RUNNING YOUR SOLUTION For this question, you need to create a file tune.sql that you will turn-in. After you create the file or update it, to test your solution do the following:
createdb project3 psql -f importDatabase.sql project3 psql -f tune.sql project3 java TestQueries 1
If you need to start from scratch (i.e. remove your indexes and get back postgres's default indexes), the dropdb command may also be helpful. You can use it as:
Note: This question (and the next one too) is difficult because you will learn something that we did not discuss in class nor in sections. The instructions below point you to all the documentation that you need. If you have further questions, please do not hesitate to post general questions on the message board or send us email. This should be a worthwhile learning experience.
Motivation: It is often useful to have the DBMS perform some actions automatically in response to operations on the database. The UWMC database already does that to some extent. For example, if you delete a patient from the database, the corresponding entry in the Sees relation will also be deleted automatically
Examine the CREATE TABLE statements in importDatabase.sql. What actions will the DBMS perform automatically to maintain referential integrity in face of insert/delete/update statements performed on the various tables? Please write your answer in a file called question2.txt and please remember to turn in that file (you may want to review Lecture 4).
Sometimes, we would like the DBMS to perform more complex actions in response to more complex events. For example, if a product is no longer being used, a nurse will remove it from the Supplies table. We would like, however, to keep this product in the database until its stock goes to zero. Once the product is out of stock, it should be removed from the Product and Stock tables. We can make the DBMS perform the latter two deletes automatically by defining a trigger on the database. You can read more about triggers in Section 7.5 of the book [Old edition: Section 7.4].
Add a trigger to the database that performs the following function: if the quantity of a product in stock goes down to zero and there is no supplier for the product, remove the product from the Product and Stock tables.
You do not need to worry about the cases when:
Before you start:
The sample trigger is very similar to the one that you need to write. This sample trigger prints an error message when someone updates the stock for an item that no longer appears in the Supplies table. As a comment inside trigger-sample.sql, you will find the detailed instructions for loading and running this sample trigger. To write your trigger, you should only need to add about three lines of code to this example and you can remove the error message if you'd like.
Here are the steps you need to take to complete this task:
CREATE LANGUAGE plpgsql
This question is also a good opportunity to learn about PL/pgSQL and the concept of stored procedures (optional).
TESTING AND RUNNING YOUR SOLUTION: Try to perform actions on the database and verify that the trigger fires at the appropriate times and performs the appropriate actions.
Hint: For several of the links we provide for this problem, you can scroll down to the bottom of the linked page to see a few examples - which can be helpful to see the syntax of the statement.
There is nothing about access control and security in our book, but you can find more information here:
The hospital enforces a strict access control policy to its database. It distinguishes four kinds of users:
The hospital also classifies the data into three overlapping categories:
The access control policy is the following. The public is the least privileged: doctors, nurses, and admin are allowed to access everything that the public can access.
Your task is to write the SQL access policy statements that enforce this policy. Please procede as follows.
Please add all your commands to a file called access-control.sql and hand-in this file.
3.1.1 Creating roles and users
As a first step, create one database role for each of the three kinds of users: doctors, nurses, and administrators. Call these roles "doctor", "nurse", and "uwmcadmin" respectively (make sure not to use capital letters, they seem to cause problems). A PUBLIC role already exists. To create roles, use the CREATE ROLE statement. You should not need to use any of the options available with this statement. Database roles are like groups: you will grant privileges to roles and will associate users with roles.
Now that you have the roles, create one test user for each role (doctor1, nurse1, admin1, and guest1) using the CREATE USER statement. Make sure that:
Note: guest1 need not be associated with any special roles.
In addition to the PostgreSQL documentation website, you can also use the postgres help command to get the correct syntax for these statements, e.g. \h CREATE ROLE.
TESTING AND RUNNING YOUR SOLUTION: Try to connect to the project3 database as each user. Verify that you can connect but that you cannot view any data nor perform any inserts/updates at this time.
3.1.2 Creating public views and granting access to these views
Create one or more views on the database. These views should expose only information classified as PUBLIC DATA. They should hide everything else.
Grant access this these views to the appropriate role(s). For this, use the GRANT STATEMENT. Consider the following question: do you need to grant access explicitly to all roles? Or is it enough to grant access to PUBLIC?
TESTING AND RUNNING YOUR SOLUTION: Verify that all your users can see the public data.
3.1.3 Creating other views and granting other privileges
Create other views and grant other privileges as appropriate to implement the UWMC access control policies. Note that you do not always have to create views. When appropriate, you can grant access rights directly to existing tables.
TESTING AND RUNNING YOUR SOLUTION: To test your solution, try to run psql as each of the four users and check what operations you are allowed to perform.
Because UWMC is affiliated with a university, it wants to support medical research. This is a difficult requirement. On one hand, UWMC wants to keep all patient data confidential. On the other hand, it wants to show the most detailed data possible to researchers.
To address this requirement, UWMC decides to add another role, called "researcher". A researcher should have read-only access to a view with the following schema:
DiseaseResearch(zipcode VARCHAR(5), age INTEGER, disease VARCHAR(20))
This table indicates the diseases for patients with different ages and at different locations. It does not include any personally identifiable information.
Please add the "researcher" role with the appropriate view and access control rights. Create a user researcher1 to test your solution. Remember to add all commands to the access-control.sql file.
As a smart administrator, however, you uncover the following problem: most patients at UWMC are also registered voters in the state of Washington and the voter's registry is publicly available. The voter's registry has the following schema
Voter(vid INTEGER, fname VARCHAR(20), lname VARCHAR(20), age INT, zipcode VARCHAR(5))
A voter's registry was generated along with the tables in the UWMC database. Please run the command:
psql -f importVoter.sql project3
to load it into your DBMS. (If you deleted the text files for the various database tables, run
java DatabaseGenerator 100 to get a new set of tables. Note that you will have to reimport both the UWMC database and Voter table after this.)
Grant public read access to this new table (this is a public database).
Show how a researcher can now uncover with high likelihood the disease(s) that certain people have. Include the SQL Statement(s) that you write into the access-control.sql file.
Please discuss your findings (is this problem significant? how could it be solved? etc.). Add your answer as a comment into the file access-control.sql