Homework 4: Relational Query Optimization and DB creation
Due: August 5, 1998, at beginning of class. No late submissions accepted.
This homework has two parts. In part one, you will use SQL Server to create a relational database corresponding to the E/R diagram you drew for Homework 1. In part two, you will experiment with a tool that allows you to visualize the process of relational query optimization.
1. SQL SERVER Database: Create relations to model the data described in your E/R diagram from Homework 1. Use SQL Server. There are two items you will need now that you did not necessarily have in the E/R diagram: Domain information and functional dependencies. As much information as possible should be transferred from the E/R diagram to your SQL Server database. Where something cannot be transferred, you'll need to make note of that that and explain why.
NOTES: The goal here is NOT to have you become SQL Server experts, but rather to give you practice applying the logical and physical design techniques we've studied. Do not feel obligated to use every bell and whistle you can find in the DBMS and do not feel obligated to go out and buy a book on the DBMS. If there is something you're trying to do that's giving you fits, please ask me and we'll do something about it. You can also check out the online documentation (SQL Server Books Online, which is an option at the same level as the Enterprise Manager). You will be populating (entering tuples into) and querying the database as part of a later assignment, but if you want to populate it now, that's fine too.
Some guidelines you must follow:
What to hand in:
2. Query Processing and Optimization:
In this question, you will carry out a number of exercises involving the optimization of relational queries using an optimizer visualization tool. You will not have to write any code, and your answers should be turned in as a text file (or handwritten).
The chapter on optimization and the lecture transparencies should provide all the background information that you need on query optimization. The tool itself has an on-line help feature, and additional information is included here as part of exercise descriptions.
To use the Minibase query optimization viewer, you must have the same three lines in your .cshrc that you added for the previous homework assignment. If you did that, you're ready to go. The program to run is called minibaseview.
These exercises are based on two catalogs, which are located in files catalogA and catalogB. Each exercise says which catalog(s) to use. The first step of each exercise should be to open the appropriate catalog, which you do by choosing Open from the Catalogs menu of the visualization tool. Then you enter your query by choosing Enter SQL from the Query menu of the visualization tool.
If you want copies of the catalog files (not necessary; we can all share the
same one), you can copy them. The files to copy, if you so desire, are:
catalog-format.txt: A description of the format of the catalogs.
catalogA, catalogB: Catalog files.
All three files are in the directory
/projects/instr/cse594/mini_hwk/assign/Optimize.
Note that even if you copy these, minibaseview will use the shared ones
by default.
Catalogs A and B are based on the same schema. There are three relations, Emp, Dpt, and Job, which represent employees, departments, and job titles of some organization. Each of these relations has an integer field which is used as the primary key for the relation---empid for the Emp relation, dno for the Dpt relation, and jno for the Job relation. The Emp relation has foreign keys referring to Dpt and Job, representing each employee's department and job title, and the Dpt relation has a foreign key referring to Emp, representing the department's manager.
Open Catalog A and run the tool on the following query.
select * from dpt
where mgrid = 20223
Answer the following questions, looking at level 0 access methods.
Level 1 of the optimizer visualization tool shows plans that consist of walking a single relation using one of the access methods defined for that relation. If you double-click the LEFT mouse button on a level 1 plan (and higher levels too), its box expands to show you more detailed information about the plan and the estimates made about it.
Use the same catalog, Catalog A, and the following query:
select * from emp
where sal = 58000
Answer the following questions, looking at level 1 plans.
Use the same catalog, Catalog A, and the following query:
select * from emp
where ename = "Frank"
Once again, open Catalog A and run this query. Answer the following questions, looking at both Level 1 plans and Level 0 access methods.
select * from emp
where empid < 25000
Answer the following questions.
select ename, dname from emp, dpt
where emp.dno = dpt.dno and emp.jno < 75
Answer the following questions, looking at the best plan.
Open Catalog A, and run the following query.
select * from emp, job
where emp.jno = job.jno
Answer the following questions looking at Level 2 plans. (NOTE: be sure not to confuse LEVEL 2 with the line NUMBERED 2. The NUMBER of the line of plans you are to look at is 3, the top line of the display.)
select ename, dname from emp, dpt, job
where emp.dno = dpt.dno and emp.jno = job.jno
and dpt.dname = job.jname
Also, if you double-click the RIGHT mouse button on the box that labels any row of the tool's display, you will turn off (and back on) the pruning of plans that the optimizer estimates are not the `least expensive'. (Remember that for each subset of relations, for each ordering of generated tuples, only the least cost plan is retained. The toggling referred to here controls this pruning; it should be used with care since the number of retained plans can increase rapidly without pruning!)
Run the following variant of the query from the previous exercise, using Catalog B as before.
select ename, dname from emp, dpt
where emp.dno = dpt.dno and emp.jno < 100
Now suppose you knew that the run-time environment for this query was limited to 10 pages of memory. Find the best (left-deep) plan that fits this requirement and describe its join method, access methods, and total cost.