Assignment 1: SQL Queries

Objectives:

To understand and be able to write simple SQL queries.

Reading Assignments:

6.1 - 6.6, 7.1

Number of points:

100

Due date:

Wednesday, Jan. 18 in class at 12:20 pm.

Tools for the assignment

For this assignment, you will use your SQL Server account on IISQLSRV. You should have an account, and you know your password from the first lecture.

As an alternative to SQL Server, you are allowed to use a different database, e.g. MySQL, Postgres, or SQL Server installed in your own machine. In that case you fist need to import the movie database in your database system: you can obtain the data from here. There is no support for database systems other than IISQLSRV.

What to turn in:

Note: A minor modification is made for the turn-in format as follows. Please check!  Thanks for your cooperation :-)    

You need to turn in a printout containing the SQL query for each of the questions below, and indicate how many rows are returned by the SQL query, at the end of class on the due date. In addition, you also need to email me a single file containing all your queries, before 11:59pm on the due day. Your file should look like this. The name of your file will be your SQL server login id and the extension of the file will be .sql. For example, if Professor Suciu were to turn in his file, the file will be named suciu.sql because his login id on SQL server is suciu. Note that if for any reason, you have to write some notes or comments for me, please add them on the print out that you submit. The file that you email me should strictly follow the format as indicated to simply the grade process, otherwise it will impact your performance.
Note :
(1) You DO NOT have to turn in the results you get from executing your queries.
(2) If you use alternatives, e.g. MySQL, please ensure the queries you email me are execuable on MS SQL Server (e.g. on IISQLSRV) .

Homework Description

In this homework you are asked to write 12 SQL queries on a relational movie database. The data in this database is from the IMDB website. The database exists already on IISQLSRV. The database consists of seven tables, however you will be using only the following five tables
id column in ACTOR, MOVIE & DIRECTOR tables is a key for the respective table. 

CAST.pid refers to ACTOR.id, CAST.mid refers to MOVIE.id
MOVIE_DIRECTOR.did refers to DIRECTOR.id and MOVIE_DIRECTOR.mid refers to MOVIE.id 

  1. (8') a. List all the actors who acted in at least one film in 2nd half of the 19th century and in at least one film in the 1st half of the 20th century
    b. List all the directors who directed a film in a leap year
     
  2. (8') List all the movies that have the same year as the movie 'Shrek (2001)', but a better rank. (Note: bigger value of rank implies a better rank)
     
  3. (8') List first name and last name of all the actors who played in the movie 'Officer 444 (1926)'
     
  4. (8') List all directors in descending order of the number of films they directed
     
  5. (8')  a. Find the film(s) with the largest cast.  
    b. Find the film(s) with the smallest cast.
    In both cases, also return the size of the cast.
     
  6. (8') Find all the actors who acted in films by at least 10 distinct directors (i.e. actors who worked with at least 10 distinct directors).
     
  7. (8') Find all actors who acted only in films before 1960.
     
  8. (8') Find the films with more women actors than men.
     
  9. (8') For every pair of male and female actors that appear together in some film, find the total number of films in which they appear together. Sort the answers in decreasing order of the total number of films.
     
  10. (8') For every actor, list the films he/she appeared in their debut year. Sort the results by last name of the actor.
     
  11. (10') The Bacon number of an actor is the length of the shortest path between the actor and Kevin Bacon in the "co-acting" graph. That is, Kevin Bacon has Bacon number 0; all actors who acted in the same film as KB have Bacon number 1; all actors who acted in the same film as some actor with Bacon number 1 have Bacon number 2, etc. Return all actors whose Bacon number is 2.
    Bonus: Suppose you write a single SELECT-FROM-WHERE SQL query that returns all actors whose Bacon number is infinity. How big is the query?
    Note: The above "Bonus" problem is ill-stated. The correct one should be as follows:
    Suppose you write a single SELECT-FROM-WHERE SQL query that returns all actors who have finite Bacon numbers. How big is the query?
     
  12. (10') A decade is a sequence of 10 consecutive years. For example 1965, 1966, ..., 1974 is a decade, and so is 1967, 1968, ..., 1976. Find the decade with the largest number of films.
     
  13. Bonus: Rank the actors based on their popularity, and compute a list of all actors in descending order of their popularity ranks.  You need to come up with your own metric for computing the popularity ranking.  This may include information such as the number of movies that an actor has acted in; the 'popularity' of these movies' directors (where the directors' popularity is the number of movies they have directed), etc.  Be creative in how you choose your criteria of computing the actors' popularity.   For this answer, in addition to the query, also turn in the criteria you used to rank the actors.