Get the dataset. Unzip it to find a folder containing 5 CSV files.
flights.csv
carriers.csv
cancellation_codes.csv
n_numbers.csv
aircraft_types.csv
Here are the columns in each table, their types, and a brief description.
The flights table has one row for every marketed domestic commercial flight in the United States in September, 2024. There are approximately 500,000 rows.
flights (
year INT,
month INT, -- 1-12
day_of_month INT, -- 1-31
day_of_week INT, -- 1-7, 1 = Mon
op_unique_carrier TEXT, -- the code of the airline, see carriers table
tail_num TEXT, -- the "license plate" of the aircraft
op_carrier_fl_num INT,
origin TEXT, -- airport code where flight started
origin_city_name TEXT,
origin_state_abr TEXT,
dest TEXT, -- airport code of destination
dest_city_name TEXT,
dest_state_abr TEXT,
crs_dep_time TEXT, -- scheduled departure time
dep_time TEXT, -- actual departure time
dep_delay REAL,
crs_arr_time TEXT, -- scheduled arrival time
arr_time TEXT, -- actual arrival time
arr_delay REAL,
cancelled REAL, -- 1 if flight was cancelled, else 0
cancellation_code TEXT, -- reason for cancellation, or empty
-- see cancellation_codes table
diverted REAL, -- 1 if the flight was diverted, else 0
actual_elapsed_time TEXT, -- departure-to-arrival time in minutes
distance REAL -- miles between origin and destination
)
The carriers table is a table of airlines and their names, roughly 2000 records.
Each carrier has a unique carrier code which is used by op_unique_carrier
column of the flights table above to describe which airline flew the flight.
carriers(
cid TEXT, -- the "code" of the airline
cname TEXT -- the long name of the airline
)
The cancellation codes table includes four short codes for possible reasons a
flight would be cancelled. Used by the cancellation_code
column of the flights
table.
cancellation_codes(
cancellation_code TEXT, -- the code
description TEXT -- why the flight was cancelled
)
The N-numbers table includes one row for every publicly registered aircraft in the United States as of January 2025, around 300,000 records. An "N-number" is like a license plate number for aircraft in the United States. All of them start with "N" (which is therefore omitted from the table -- it is implicit). The "N" is the unique country code assigned to the United States under international agreements regarding aircraft.
n_numbers (
n_number TEXT, -- the "license plate number" without the N
serial_number TEXT,
mfr_mdl_code TEXT, -- manufacturer and model code, see aircraft_types
eng_mfr_mdl TEXT,
year_mfr TEXT,
name TEXT, -- name of entity that registered the aircraft
street TEXT, -- their address is stored in the rest of the columns
street2 TEXT,
city TEXT,
state TEXT,
zip_code TEXT,
region TEXT,
county TEXT,
country TEXT
)
The aircraft types table contains one row for each model of aircraft ever
registered, including the manufacturer. There are around 100,000 records. It is
used by the mfr_mdl_code
column of the N-number table to describe the type of
each registered aircraft.
aircraft_types (
code TEXT, -- the code, used by the n_numbers table
mfr TEXT, -- name of the manufacturer
model TEXT, -- name of the model
no_eng INT, -- number of engines on this type
no_seats INT, -- maximum number of seats on this type
ac_weight TEXT, -- weight class (1 through 4)
speed INT -- average cruising speed of this type in miles per hour
)
The tables are subject to the following additional constraints, which you should enforce:
flights
has no primary key (!)cid
for carriers
cancellation_code
for cancellation_codes
n_number
for n_numbers
code
for aircraft_types
flights.op_unique_carrier
references carriers.cid
n_numbers.mfr_mdl_code
references aircraft_types.code
flights.n_number
is not a foreign key to
the n_numbers
table. Nor is flights.cancellation_code
a foreign key to
cancellation_codes
.Import the data.
To import the flights dataset into a SQLite database, you will first need to
run sqlite3
with a new database file (eg, sqlite3 hw2.db
). This will
create an empty, file-backed database that you can load in future SQLite
sessions.
Once you have an empty database, you can run CREATE TABLE
statements to
create the tables while specifying all key constraints as described above.
CREATE TABLE table_name ( ... );
You will need one CREATE TABLE
statement per CSV file in the provided data.
Currently, SQLite does not enforce foreign keys by default. To enable foreign keys, use the following command.
PRAGMA foreign_keys=ON;
Next, use SQLite's .import
command to read data from each downloaded text
file. You will use one file per table, and you will need to set the input
format to CSV (comma-separated value). To learn how to use the .import
command, please read the
documentation. Something like this should get you started:
.mode csv
.import filename tablename
You will need one .import
command per CSV file in the provided data.
Put all the SQLite commands and statements for creating your tables and
enforcing foreign keys into a file called create-tables.sql
, and all the
code for importing the data into these tables into a separate file called
import-tables.sql
. If done correctly, you should be able to open up a new,
empty .db
file in SQLite and setup the database using these two commands:
.read create-tables.sql
.read import-tables.sql
(SQL queries with joins) Write SQL queries to answer the following questions. Your queries should work on any data with the same schema as the provided data. In other words, do not assume that your query will only be tested on the provided data. (For example, the provided data consists only of flights in September 2024, but your queries should not assume that.)
Put each query in its own file called hw2-q2a.sql
, hw2-q2b.sql
, etc.
For development purposes, we recommend writing all your queries in one file using a SQL IDE (link to software setup instructions for choosing an IDE). You can run, test, and debug your queries in the IDE after connecting the IDE to your database. Then, split your queries into separate text files files before submitting your homework.
You should only use the parts of SQL we have covered in class (with the
exception that you may use any operators/functions from the documentation
of SQLite). Do not use grouping,
aggregation, windowing, compound queries, subqueries, or WITH
. (If you
don't know what those are, stay tuned for future homeworks!)
Remember that the topic of this homework is joins. If you're stuck, try figuring out how a join might help you.
Unless otherwise specified, you may name the output columns however you like, and the rows may be in any order.
List the distinct carrier names of all carriers that offer a flight from
the Seattle airport (code SEA
) to any airport in Chicago. (Warning:
there is more than one airport in Chicago. Your query should work even
if they build yet another airport in Chicago.) Name the output column
cname
.
List all itineraries on September 15, 2024, from the Seattle airport
(code SEA
) to any airport in Chicago that have exactly one
intermediate stop (i.e. SEA -> somewhere -> some Chicago airport).
Both flights must depart on the same day and must be with the same
carrier, and the first flight must depart (dep_time
) before the
second flight departs. (Do not worry about whether the first flight
arrives before the second flight departs, because that is a bit tricky
with this data due to overnight flights.) The total flight time
(actual_elapsed_time
) of the entire itinerary should be fewer than 480
minutes (i.e., 8 hours).
For each itinerary, your query should return the name of the carrier
(not the code but the name), the first flight number, the first flight
time (actual_elapsed_time
), the name of the intermediate city (not the
airport code but the city name), the second flight number, the second
flight time, and finally the total flight time. Only count flight times
here; do not include any layover time.
Name the output columns as follows: carrier_name
, f1_flight_num
,
f1_time
, intermediate_city
, f2_flight_num
, f2_time
, and total
time
(with a space in its name!) in that order.
List the distinct pairs of carrier name and origin city name where it is
possible to take a flight of at least 3000 miles from that city on that
carrier. Name the output columns cname
and origin_city_name
in that
order.
Find all aircraft (identified by tail_num
) that departed the same
airport (identified by the airport code) twice on the same day to
different destination airports where the two flights totaled at least
5000 miles.
Output columns tail_num
and origin
. Do not include duplicate rows.
Find all registered aircraft (in the N-numbers table) owned by
'UNIVERSITY OF WASHINGTON'
along with the type of aircraft (from the
aircraft types table).
Output columns n_number
, name
, mfr
, and model
.
The tail_num
column of the flights table (usually) starts with an N,
while the n_number
column of the n_numbers
table (always) does not
include the N. Joining these tables on these columns will be more
challenging than usual. To make it simpler, your query may assume that
tail_num
in the flights table always starts with an N, even though
this is not true.
Find all aircraft that were manufactured in 2024 and took a flight from the city in which they are registered.
Output columns n_number
, name
, and city
from the N-numbers table.
Do not include duplicates.
Hint: to compare the flight city to the registration city, you may find
the ||
operator to concatenate
strings useful. (That link describes all operators/expressions. Search
for "concat" on that page.) For example 'hello' || 'world'
evaluates
to 'helloworld'
. You'll also potentially need to account for uppercase
versus lowercase. Check the SQLite documentation to find useful
functions to help you.
Find all flights from 'Tampa, FL'
on a 737 that were canceled due to
weather.
For the purposes of this problem, consider "a 737" to be any aircraft
that has '737'
as a substring of its aircraft type's model
column.
(You may want to look at the LIKE
operator
to find substrings.)
As in the previous problem, you may assume that tail_num
in the
flights table always starts with an N, even though that is not true.
Output columns year
, month
, day_of_month
, cname
(the carrier
name), and flight_num
.
Turn in each subproblem in its own hw2-q2a.sql
file named exactly
according to this scheme.
(Java) Convert your solution to problem 2c to a parameterized query using JDBC to connect to the SQLite database you constructed in problem 1.
Get the starter code. Put your answer in HW2.java
in the problem3
method. The argument to this method is the distance that
replaces 3000
in problem 2c.
Use SQL query parameters. Do not manipulate query strings using concatenation.
Turn in only this file for this problem. The starter code should work out of
the box with VSCode. We have also provided a Makefile to run the provided
tests on the command line using make
.