This assignment requires you to formulate SQL queries over a relational movie database populated with data from the IMDB website. The database is named IMDB and loaded on the SQL Server running on iprojsrv.cs.washington.edu. You have all been given accounts on this database server with the account id being the same as your UW email address and the passwords as given in class. Please change your password immediately. See intructions under Software on the course Webpage on how to access the SQL Server on iprojsrv.cs.washington.edu.
(You may use a different database management system for this assignment, if you like to go off the beaten track. For example, you can use either a different installation of SQL server (e.g. on your machine), or some other DBMS (postgres, Oracle, DB2, whatever). In that case you can either import the data from iprojsrv, or you can get it from here. Comment at the top of queries.sql your alternate choice of system.
When using a different database management system, stick to basic SQL that is understood across most systems. Anything from lecture is fair game. Further questions on what can and cannot be used should be directed at Jessica. )
The database schema is shown below:
ACTOR (id, fname, lname, gender)
MOVIE (id, name, year, rank)
DIRECTORS (id, fname, lname)
CASTS (pid, mid, role)
MOVIE_DIRECTORS (did, mid)
GENRE (mid,genre)
The id column in ACTOR, MOVIE & DIRECTOR tables is a key for the respective table.
CASTS.pid refers to ACTOR.id
CASTS.mid refers to MOVIE.id
MOVIE_DIRECTORS.did refers to DIRECTORS.id
MOVIE_DIRECTORS.mid refers to MOVIE.id
GENRE.mid refers to MOVIE.id
Write SQL queries for the following. Unless otherwise specified, when asked to return all actors, directors, or movies, you should return their names (first and last respectively). Columns should be returned in the order listed.
List all the directors who directed a 'Film-Noir' movie in a leap year. (You need to check that the genre is 'Film-Noir' and year is divisible by 4.). Return the director name along with the movie name and movie year.
List the names of all the actors who played in the movie 'Officer 444' made in 1926. (movie is named 'Officer 444 (1926)' in the downloadable dataset)
List all the actors who acted in a film before 1900 and also in a film after 2000. (That is: < 1900 and > 2000.)
How can this be ? Actors can't live more than 100 years, right ? You are asked to find the explanation. For that you need to investigate a bit, perhaps run 1-2 additional queries. Once you identify one logical explanation why some actors appear in movies more than 100 years apart, please write it in your turnin, as a comment to the SQL query: keep you answer below 1-2 sentences.
List all directors who directed 150 movies or more, in descending order of the number of movies they directed. List the director names and the number of movies each directed
We want to find actors that played two or more roles in the same movie. Notice that Casts may have occasional dupliates, but we are not interested in these: we want actors that had two ore more distinct roles in the same movie.
(a) For each year, count the number of movies in that year that had only female actors. (b) Now make a small change: for each year, report the percentage of movies with only female actors made that year, and also the total number of movies made that year. For example, one answer will be:
1990 28.76 6098
1990 31.80 13522meaning that in 1990 there were 13522 movies, and 31.80% had only female actors.
Find the film(s) with the largest cast. Return the movie title and the size of the cast. By "cast size" we mean the number of distinct actors that played in that movie: if an actor played multiple roles, or if it simply occurs multiple times in casts we still count her/him only once.
A decade is a sequence of 10 consecutive years. For example 1965, 1966, ..., 1974 is a decade, and so is 1967, 1968, ..., 1976. Find and return the decade with the largest number of films.
Hint: this query combines several subtleties. One thing you may want to check is that the number of movies in that decade doesn't exceed the total number of movies in the database.
The Bacon number of an actor is the length of the shortest path between the actor and Kevin Bacon in the "co-acting" graph. That is, Kevin Bacon has Bacon number 0; all actors who acted in the same film as KB have Bacon number 1; all actors who acted in the same film as some actor with Bacon number 1 have Bacon number 2, etc. Return all actors whose Bacon number is 2.
To ponder: how can you compute the Bacon number of each actor in the database ?
These 9 queries are of increasing degree of complexity. Some of them are quite advanced. There are lots of hints for this homework in the lecture notes, and more may be given in class.