CSE 414 Homework 2: SQL (basic)

 
Objectives:
To create and import databases and to practice simple SQL queries.
Assignment tools:
sqlite3, Flights dataset (.tar.gz archive)
(Reminder: To extract the content of a tar file, try the following command: tar zxvf flights-small.tar.gz)
Due date:
Tuesday, April 12 by 11:00pm. Turn in your solution using the assignment drop box linked from the main course web page.
What to turn in:
create-tables.sql and hw2-queries.sql (see below)

In this homework, you will write several SQL queries on a relational flights database. The data in this database is from the Bureau of Transportation Statistics The database consists of four tables:

FLIGHTS (fid, year, month_id, day_of_month, day_of_week_id, carrier_id, flight_num, origin_city, origin_state, dest_city, dest_state, departure_delay, taxi_out, arrival_delay, canceled, actual_time, distance)
CARRIERS (cid, name)
MONTHS (mid, month)
WEEKDAYS (did, day_of_week)

We leave it up to you to decide on the types to use for the various fields.

In addition, impose the following constraints:

We provide the flights database as a set of plain-text data files in the linked .tar.gz archive. Each file in this archive contains all the rows for the named table, one row per line.

In this homework, you need to do two things: (A) import the flights dataset into SQLite and (B) run SQL queries to answer a set of questions about the data.

A. IMPORTING THE FLIGHTS DATABASE (5 points):

To import the flights database into SQLite, you will need to run sqlite3 with a new database file: for example "sqlite3 hw2.db". Then you can use the CREATE TABLE SQL statement to create the tables, choosing appropriate types for each column and specifying all key constraints as described above:

  CREATE TABLE table_name (  . . . );

Currently, SQLite does not enforce foreign keys by default. To enable foreign keys use the following command. The command will have no effect if your version of SQLite was not compiled with foreign keys enabled. Do not worry about it.

  PRAGMA foreign_keys=ON;

Then, you can use the SQLite .import command to read data from each text file into its table:

  .import filename tablename

See examples of .import statements in the lecture notes, and also look at the SQLite documentation or sqlite3's .help for details.

Put all the code for part A (four create table statements and four .import statements) into a file called create-tables.sql .

B. SQL QUERIES (15 points):

HINT: You should be able to answer all the questions below with SQL queries that do NOT contain any subqueries!

For each question below, write a single SQL query to answer that question. Put your queries in a file called hw2-queries.sql. Add a comment to each query indicating the question it answers and the number of rows in the query result.

Important: The predicates in your queries should correspond to the English descriptions. For example, if a question asks you to find flights by Alaska Airlines Inc., the query should include a predicate that checks for that specific name as opposed to checking for the matching carrier ID. Same for predicates over months, weekdays, etc.

If a query uses a GROUP BY clause, make sure that all attributes in your SELECT clause for that query are either grouping keys or aggregate values. SQLite will let you select other attributes but that is wrong. Other database systems would reject the query in that case.

  1. List the distinct flight numbers of all flights from Seattle to Boston by Alaska Airlines Inc. on Mondays. Also notice that, in the database, the city names include the state. So Seattle appears as Seattle WA.

    [Output relation cardinality: 3 rows]
  2. Find all flights from Seattle to Boston on July 15th 2015. Search only for itineraries that have one stop. Both legs of the flight must have occurred on the same day and must be with the same carrier. The total flight time (actual_time) of the entire itinerary should be less than 7 hours (but notice that actual_time is in minutes). For each itinerary, the query should return the name of the carrier, the first flight number, the origin and destination of that first flight, the flight time, the second flight number, the origin and destination of the second flight, the second flight time, and finally the total flight time.

    Put the first 20 rows of your result right after your query as a comment.

    [Output relation cardinality: 488 rows]
  3. Find the day of the week with the longest average arrival delay. Return the name of the day and the average delay.

    [Output relation cardinality: 1 row]
  4. Find the names of all airlines that ever flew more than 1000 flights in one day. Return only the names. Do not return any duplicates.

    [Output relation cardinality: 11 rows]
  5. Find all airlines that had more than 0.5 percent of their flights out of Seattle be canceled. Return the name of the airline and the percentage of canceled flight out of Seattle. Order the results by the percentage of canceled flights in ascending order.

    [Output relation cardinality: 6 rows]
Put all your code for part B (SELECT-FROM-WHERE code) in a file called hw2-queries.sql.

Save all files you create during this homework: you will need them in later homeworks.