Lecture 24 - SQL: Multi-Table Queries (JOINS)

Agenda

Reminder: WPL 2:30-4:30 today only on the 5th floor breakout of CSE

HW 6 Introduction

Lectures left:

  • Monday: RegEx
  • Wednesday: Cookies
  • Friday: Sessions
  • Wednesday: What's next!
  • Final Review

SQL so far

Database querying

Database manipulation (create/insert)

Accessing SQL from PHP with the PDO object

Multi-Table Queries (JOINS)

joins

XKCD 1810

Related tables and keys

table keys

  • Primary Key: a column guaranteed to be unique for each record (e.g. Lisa Simpson's ID 888)
  • Foreign Key: a column in Table A storing a primary key from table B
    • (e.g. records with grades with student_id)
  • Normalizing: Splitting tables to improve structure/redundancy (linked by unique IDs)
    • This is covered more in depth in database courses (e.g. CSE 344 (CSE majors) and CSE 414 (non-CSE-majors)) and Info 330.

JOIN Template


          SELECT col(s)
          FROM table1, table2, ...
          WHERE table1.a = table2.b
          AND table2.c > '42';
          

SQL (template)


          SELECT students.name, grades.*
          FROM students, grades
          WHERE students.id = grades.student_id
          AND grades.grade > 'C';
          

SQL (example)

join example

Result table

Giving Names to Tables


          SELECT students.name, grades.*
          FROM students, grades
          WHERE students.id = grades.student_id
          AND grades.grade > 'C';
          

SQL (example from previous slide)

A more compact solution (giving variable names to tables)


          SELECT s.name, g.*
          FROM students s, grades g
          WHERE s.id = g.student_id
          AND g.grade > 'C';
          

SQL (alternative example)

  • You can give names to tables, like a variable in Java.
  • To specify all columns in a table, write table.*

Note: An Equivalent Way to JOIN Tables


          SELECT s.name, g.*
          FROM students s, grades g
          WHERE s.id = g.student_id
          AND g.grade > 'C';
          

SQL (example from previous slide)


          SELECT s.name, g.*
          FROM students s
          JOIN grades g ON s.id = g.student_id
          WHERE g.grade > 'C';
          

SQL (alternative example, using the JOIN keyword)

The JOIN keyword is another way to join multiple tables. Some people find this more intuitive, while others find joining tables on multiple WHERE conditions more intuitive. Use whichever form you prefer most!

Note: There are other types of JOINS you may find on the recommended SQL query practice sites. You may learn these for fun if you'd like, but we don't cover it in this class (you will learn about them in other database courses). The JOIN keyword in this slide's example is also known as an "INNER JOIN"

A Suboptimal Query

Exercise: What courses have been taken by both Bart and Lisa?


          SELECT bart.course_id
          FROM grades bart, grades lisa
          WHERE lisa.course_id = bart.course_id
          AND bart.student_id = 123
          AND lisa.student_id = 888;
        

SQL

What's wrong here?

Improved Query

What courses have been taken by both Bart and Lisa?


          SELECT DISTINCT c.name
          FROM courses c, students lisa, students bart, grades g1, grades g2
          WHERE g1.course_id = c.id
          AND g1.student_id = bart.id
          AND g2.course_id = c.id
          AND g2.student_id = lisa.id
          AND bart.name = 'Bart'
          AND lisa.name = 'Lisa';
          

SQL

Why so many tables?

Practice Queries

Click for solutions

What are the names of all teachers Bart has had?

How many total students has Ms. Krabappel taught, and what are their names?

Solutions without JOINs

What are the names of all teachers Bart has had?


  SELECT t.name
  FROM teachers t, students s, grades g, courses c
  WHERE s.name="Bart"
  AND s.id=g.student_id
  AND g.course_id=c.id
  AND c.teacher_id=t.id
              

SQL

How many total students has Ms. Krabappel taught, and what are their names?


  -- COUNT(s.name) gives the count
  SELECT s.name
  FROM students s, grades g, courses c, teachers t
  WHERE t.name="Krabappel"
  AND t.id=c.teacher_id
  AND c.id=g.course_id
  AND g.student_id=s.id;
              

SQL

Designing a Query (Refer to this slide often!)

Figure out the proper SQL queries in the following way:

  • Which table(s) contain the critical data? (FROM)
  • Which columns to I need in the result set? (SELECT)
  • How are tables connected (JOIN and/or WHERE) and values filtered (WHERE)?
  • Do I need to return only DISTINCT records?
  • Do I care about the order of records returned? If so, which columns do I need to sort by and in what precedence?

Example Database (also on query tester)

imdb schema

  • imdb-small is also provided on the query tester page for testing queries with a smaller dataset

IMDB Table Relationships/ids

imdb schema

IMDB Practice Queries

What are the names of all movies released in 1995?

How many people played a part in the movie "Lost in Translation"?

What are the names of all the people who played a part in the movie "Lost in Translation"?

Who directed the movie "Fight Club"?

How many movies has Clint Eastwood directed?

What are the names of all movies Clint Eastwood has directed?

What are the names of all directors who have directed at least one horror film?

What are the names of every actor who has appeared in a movie directed by Christopher Nolan?

What are the names of all movies released in 1995?


SELECT name
FROM movies
WHERE year = 1995;
            

SQL

How many people played a part in the movie "Lost in Translation"?


SELECT COUNT(*)
FROM actors a
JOIN roles r ON a.id = r.actor_id
JOIN movies m ON r.movie_id = m.id
WHERE m.name="Lost In Translation";
            

SQL

What are the names of all the people who played a part in the movie "Lost in Translation"?


SELECT first_name, last_name
FROM actors a
JOIN roles r ON a.id = r.actor_id
JOIN movies m ON r.movie_id = m.id
WHERE m.name="Lost In Translation";
            

SQL

Who directed the movie "Fight Club"?


SELECT d.first_name, d.last_name
FROM directors d
JOIN movies_directors md ON md.director_id = d.id
JOIN movies m ON m.id = md.movie_id
WHERE m.name="Fight Club";
            

SQL

How many movies has Clint Eastwood directed?


SELECT COUNT(*)
FROM movies m
JOIN movies_directors md ON md.movie_id = m.id
JOIN directors d ON md.director_id = d.id
WHERE d.first_name = "Clint"
AND d.last_name = "Eastwood";
            

SQL

What are the names of all movies Clint Eastwood has directed?


SELECT m.name
FROM movies m
JOIN movies_directors md ON md.movie_id = m.id
JOIN directors d ON md.director_id = d.id
WHERE d.first_name = "Clint"
AND d.last_name = "Eastwood";
            

SQL

What are the names of all directors who have directed at least one horror film?


SELECT DISTINCT d.first_name, d.last_name
FROM directors d
JOIN movies_directors md ON md.director_id = d.id
JOIN movies_genres mg ON mg.movie_id = md.movie_id
WHERE mg.genre = "Horror";
            

SQL

What are the names of every actor who has appeared in a movie directed by Christopher Nolan?


SELECT DISTINCT a.first_name, a.last_name
FROM actors a
JOIN roles r ON a.id = r.actor_id
JOIN movies m ON m.id = r.movie_id
JOIN movies_directors md ON md.movie_id = m.id
JOIN directors d ON md.director_id = d.id
WHERE d.first_name = "Christopher"
AND d.last_name = "Nolan";
            

SQL

More information on JOINS