Homework 3: JDBC, Updates and Transactions

ESTIMATED TIME: 12 hours.

HOW TO TURN IN: Submit your files using this Catalyst Dropbox . The due date is May 2, 11:45 pm.

SOFTWARE: Please Install Software that you will need for this homework.

JDBC DRIVERS: In order to connect to a database from Java, you need database-specific JDBC drivers on your classpath. You will need two JDBC drivers, one for Postgres and the other for SQL Server (follow the instructions on the Installing Software page to download them). After downloading these jar files, you need to modify your CLASSPATH and include them there. On Windows machines, go to Control Panel -> System -> Advanced. Click on the Environment Variables button and choose the CLASSPATH to edit. Make sure you include the complete paths for these two jar files. The path should be to the files themselves, not just the directory containing them. Alternatively, you can download the CSEP544 shell launcher script and modify it to reflect the location of these jar files on your machine.

POSTGRES: Instructions on setting up PostgreSQL are here.

STARTER CODE: Project2.java, Query.java Dowload them to your local directory. You will need to modify Query.java (you can modify Project2.java if you want to, but it's not needed for the project.)

RUNNING THE STARTER CODE: Create the 'CUSTOMER' database on PostgreSQL. Then run the following in a Windows shell.You may choose to work in a Java IDE like Eclipse instead of working from the command-line as shown below.

javac Project2.java
javac Query.java
java Project2 uid passwd

(The starter code does not authenticate the user, so uid and passwd are ignored: one of your jobs is to modify the starter code to do the authentication.) At this point you should see this:

 *** Please enter one of the following comands *** 
> search <movie title>
> plan [<plan id>]
> rent <movie id>
> return <movie id>
> fastsearch <movie title>
> quit
>

The command 'search' works (sort of). Try typing:

search Nixon

After a few seconds you should start getting movie titles containing 'Nixon', and their directors. (You don't yet get the actors: one of your jobs is to list the actors.)

PROJECT DESCRIPTION

Develop a simple video-on-demand application. This is a typical application using a database: the only thing we are missing is a Web interface. You will connect to two databases, one on Postgres, the other on SQL Server.

You sell video streams to customers. You have a contract with a content provider that has videos of all the movies in the IMDB database, and you resell this content to your customers. Your first task is to create a database of your customers. Next, your application allows the user to search the IMDB movie database (the starter code already does that), and to rent movies (which we assume are delivered by the content provider; we don't do this part in the project). Once a customers rents a movie, she can watch it as many times as she wants, until she decides to "return" it to your store. You need to keep track of which customers are currently renting which movies.

There are two important restrictions:

Your contract with the content provider allows you to rent each movie to at most one customer at any one time: the movie needs to be first returned before you may rent it again to another customer (or the same customer).
Your own business model imposes a second important restriction: your store is based on subscription (much like Netflix), that allows customers to rent up to a maximum number of movies. Once they reach that number you will deny them more rentals, until they return a movie. You offer a few different rental plans, each with its own monthly fee and maximum number of movies.

Task 1

Design and create the CUSTOMER database in Postgres.

What to turn in: a script file with CREATE TABLE and with INSERT statements for this database.

The database has the following entity sets:

E1. Customer: a customer has an id (integer), a login, a password, a first name and a last name.

E2. Plan. Each plan has a plan id (integer), a name ("Basic", "Rental Plus", "Super Access" or a new plan of your choice), the maximum number of rentals allowed ("Basic" allows one movie; "Rental plus" allows three; "Super Access" allows five), and the monthly fee (set this to a value of your choice for each).

E3. Rental: a "rental" entity represents the fact that a movie was rented by a customer. The movie is identified by a movie id (from the IMDB database). The rental has a status that can be open, or closed. When a customer first rents a movie, then you create an open entry in Rentals; when he returns it you updated it to closed (you don't delete it). Keeping the rental history helps you improve your business by doing data mining (but we don't do this in this class.)

In addition there are the following relationships:

R1. Each customer has exactly one rental plan

R2. Each rental refers to exactly one customer. (It also refers to a single movie, but that's in a different database, so we don't model that as a relationship.)

Create a script file with CREATE TABLE statements, and with INSERT statements that populate each table with a few tuples (say 2-8 tuples): you will turn in this script file. Write a separate script file with DROP TABLE statements: it's useful to run it whenever you find a bug in your schema or data (don't turn in this file).

Task 2

Write the java application, by completing the starter code.

What to turn in: the two java files Project2.java and Query.java and the corresponding class files.

The application is a simple command-line java program. A "real" application will have a Web interface instead, but that requires a lot of programming that is unrelated to database content of this course. Your java application needs to connect to two databases: the IMDB database on IPROJSRV, and your own CUSTOMER database on your local machine running PostgreSQL.

When your application is started it reads a user name and password from the command line. It validates them against the database, then retains the user id throughout the session. All rentals/returns are on behalf of this single customer: to change the customer you will quit the application and restart it with another customer. Much of this logic is already provided in the starter code.

Once the application is started, the user can select one of the following transactions. (We call each action a transcation: some of them you really need to write as SQL transactions, others are interactions with the database that do not require transactions.)

The "search" transaction: the user types in a string, and you return:
- all movies whose title matches the string
- their director(s) (often there are more than 1 directors)
- their actors
- an indication whether the movie is available for rental (remember that you can rent the movie to only one customer at a time), or whether the movie is already rented by this customer (some customers forget: be nice to them), or whether it is unavailable (rented by someone else).
The starter code already returns the movies and directors: you still need to retrieve and print the actors and the availability status.

In the starter code, the directors are obtained using a "dependent join" (sometimes called "nested loop join"). You should use the same technique for retrieving the actors: there is a more efficient way, but we'll implement it later.

WHAT YOU WILL LEARN in 1: (a) how to run SQL queries from Java, (b) what a dependent join is, and how easy it is to embed it in an application, (c) how slow the dependent joins are.
The "plan" transaction. Here the customer types in a plan id and you set his new plan to this plan id. How does the customers know what plan id's are available ? They type in "plan" without any plan id, and then you will list all available plans, their names, and their terms (maximum number of movies available for rental and monthly fees).

WHAT YOU WILL LEARN in 2: simple database updates from Java.
The "rent" transaction. The user types in a movie id, and you will "rent" that movie to the customer.
The "return" transaction. The user types in the movie id to be returned. You update your records to mark the return.

WHAT YOU WILL LEARN in 3 and 4, and also in 2: the typical interaction between the application and the database; basic data integration (you integrate CUSTOMER with IMDB); the need for transactions and rollback.
The "fastsearch" function. Here you will fix the performance problem in the "search" function: "fastsearch" has exactly the same functionality as "search" (except that it doesn't have to return the availability status), but where all the dependent joins are replaced with regular joins, executed in the database engine.

HINT: add ORDER BY to each of the three queries so you can merge-join the three streams efficiently.

NOTE: this task requires a little bit more creativity (and more work) than each of the others, but it is still easy.

NOTE 2: Depending on your setting, "fastsearch" may actually run slower than regular "search": if that happens for the right reason, then you are OK. In "fastsearch" you are running upfront three SQL queries, which are more complex that the single SQL query executed in "search": this initial step will take longer. Subsequently however, "fastsearch" no longer issues queries, and interates over all movies, actors, and directors exactly once, while search needs to issue two new SQL queries for each movie (not counting the computation of the availability status). Thus, the time it takes to the first answer may be longer in "fastsearch", but subsequent answers should be returned "fast".

WHAT YOU WILL LEARN in 5: the database system is much better at performing the join than the java application.

In addition you have to provide a minimal amount of user-friendliness: at each iteration of the main loop you will print the current customer's name, and tell them how many additional movies they can rent (given their current plan and the number of movies that they have already rented). WHAT YOU WILL LEARN: give the user as much data as you can, but not more.

COMMENTS:

The starter code is designed to give you a gentle introduction into embedding SQL in Java. Start by running the starter code, examine it and make sure you understand the part that works. Then create your customer database, insert 1-2 customers, and uncomment the few lines of code in the started code that do user authentication and that greet the user with her/his name and plan details. This should all work flawlessly (if not talk to me or to your TA). Then, complete the first (the "search") transaction, i.e. return the actors. Then continue with the other function.
The completed project has about 20, quite simple SQL queries embedded in the Java code. Some queries are parameterized: a parameter is a constant that is known only at runtime, and therefore appears as a '?' in the SQL code in JAVA: you already have examples in the started code.
You need to enforce the following constraints:

C1. at any time a movie can be rented to at most one customer.

C2. at any time a customer can have at most as many movies rented as his/her plan allows.

What this means is that (a) when a customer requests to rent a movie you may deny this request and (b) when a customer selects a "lower" plan (with fewer allowed movies) you may deny this request (why ?). Thus, you need to worry about C in ACID.

HINT: Transactions and ROLLBACK are your friends.
You also need to worry about concurrency. A user may try to cheat and coerce your application to violate the constraint C2 above by running two instances of your application in parallel, with the same user id: depending on how you write your application and on race conditions, the malicious user may succeed in renting more movies than he/she is allowed. You need to ensure that each instance of your application runs in isolation, i.e. the I in ACID.

HINT: Transactions are your friends again.
As a student concerned about your grade in P544, you also need to worry about the A in ACID: what if you lose your CUSTOMER database, or it becomes inconsistent as a result of a systems crash. You won't use recovery, however. Instead you will rely on the script file that you need to prepare for Task 1. This file contains all the CREATE TABLE and INSERT statements that are needed to start your project.
Never include a user interaction inside a transaction ! That is, don't begin a transaction then wait for the user to decide what she wants to do (why ?). Your transactions should not include any user interactions.
You need transactions only on the CUSTOMER database: you don't need transactions for IMDB, since this database is never updated.