Assignment 1: SQL Queries
This assignment requires you to formulate SQL queries over a relational
movie database, populated with data from the IMDB website. The database is named IMDB, and is loaded on the
PostgreSQL 8.0.7 server running on cubist.cs.washington.edu. You will
need to log into cubist, and run psql,
the command-line client for talking to the database server. You will
also need to specify the name of the database you want to use, and your
PostgreSQL user name. So, user xyz
will run the following command:
/usr/local/pgsql/bin/psql IMDB -U xyz
Additional documentation on PostgreSQL can be found here.
Homework Description
The database schema is shown below:
- ACTOR (id, fname, lname, gender)
- MOVIE (id, name, year, rank)
- DIRECTOR (id, fname, lname)
- CASTS (pid, mid, role)
- MOVIE_DIRECTOR (did, mid)
The id column in ACTOR, MOVIE & DIRECTOR tables is a key
for the respective table. CASTS.pid is a foreign key referring to
ACTOR.id, CASTS.mid refers to MOVIE.id, MOVIE_DIRECTOR.did refers to
DIRECTOR.id, and MOVIE_DIRECTOR.mid refers
to MOVIE.id. Write SQL queries for the following:
- (8') a. List all the actors who acted in at least one film in 2nd
half
of the 19th century and in at least one film in the 1st half of the
20th century.
b. List all the directors who directed a film in a leap year.
- (8') List all the movies that have the same year as the movie
'Shrek', but a better rank (Note: bigger value of rank implies a
better rank).
- (8') List first name and last name of all the actors who played
in the
movie 'Officer 444'.
- (8') List all directors in descending order of the number of
films
they directed.
- (8') a. Find the film(s) with the largest cast.
b. Find the film(s)
with the
smallest cast.
In both cases, also return the size of the cast. List the largest and
smallest cast sizes you obtained in your submitted solution.
- (8') Find all the actors who acted in films by at least 10
distinct
directors (i.e. actors who
worked with at least 10 distinct directors).
- (8') Find all actors who acted only in films before 1960.
- (8') Find the films with more women actors than men.
- (8') For every pair of male and female actors that appear
together in
some film, find the total number of films in which they appear
together. Sort the answers in decreasing order of the total number of
films.
- (8') For every actor, list the films he/she appeared in their
debut
year. Sort the results by last name of the actor.
- (10') The Bacon number of an actor is the length of the
shortest path between the actor and Kevin Bacon in the "co-acting"
graph. That is, Kevin Bacon has Bacon number 0; all actors who acted in
the same film as KB have Bacon number 1; all actors who acted in the
same film as some actor with Bacon number 1 have Bacon number 2, etc.
Return all actors whose Bacon number is
2.
Bonus: Suppose
you
write a single SELECT-FROM-WHERE SQL query that returns all actors
having finite
Bacon numbers. How big is the query?
- (10') A decade is a sequence of 10 consecutive years. For
example 1965, 1966, ..., 1974 is a decade, and so is 1967, 1968, ...,
1976. Find the decade with the largest number of films. List the answer
you obtained in your submitted solution.
- Bonus: Rank the actors
based on their popularity, and compute a list of all actors in
descending order of their popularity ranks. You need to come up
with your own metric for computing the popularity ranking. This
may include information such as the number of movies that an actor has
acted in, the ranks of these movies, the 'popularity' of these movies'
directors (where the
directors' popularity is the number of movies they have directed),
etc. Be creative in how you choose your criteria of computing the
actors' popularity. For this answer, in addition to the
query, also turn in the criteria you used to rank the actors.
Due
Date: Apr 10. Email a text file containing ur
ready-to-run queries to Bhushan.
The Sample Solution is
now available.