CSE 154

Lecture 24: Joins

Reminders and Administrivia

CP4 locks tonight at 11 pm PST

HW4 due next Wednesday by 11 pm PST (we are not charging late days)

Quiz 4 will be released today at 3 pm PST and is due tomorrow (Saturday) by 3 pm PST

Today's Agenda

  • Finish up SQL injection
  • Joins

SQL + User Input

Example

Suppose we had a POST endpoint to validate a user logging in:

let user = req.body.username;
let pw = req.body.password;
let qry = "SELECT * FROM users WHERE username='" + user +
        "' AND password='" + pw + "'";
// e.g. SELECT * FROM users WHERE username='foo' AND password='bar';
let result = await db.all(qry);
if (result.length === 0) {
res.type("text");
res.status(400).send("user not found");
} else {
res.json(result); // send the user info data (a single row) as JSON
}

JS

What if a user predicts we're using SQL with their parameters?

CONDITION_A OR CONDITION_B evaluates to TRUE in SQL if either condition is true. 1=1 always evaluates to TRUE. What if we have many users in the users database? What user data could be leaked in such a response?

let user = req.body.username; // "' OR 1=1 --"
let pw = req.body.password; // "oops :)"
let qry = "SELECT * FROM users WHERE username='" + user +
        "' AND password='" + pw + "'";
// e.g. SELECT * FROM users WHERE username='' OR 1=1 -- AND password='oops :)'
let result = await db.all(qry);
if (result.length === 0) {
res.type("text");
res.status(400).send("user not found");
} else {
res.json(result); // send the user info data (a single row) as JSON
}

JS

Improved Solution: ? Placeholders

let user = req.body.username;
let pw = req.body.password;
let qry = "SELECT * FROM users WHERE username=? AND password=?";
// the qry will be escaped internally and no rows will be returned
let result = await db.all(qry, [user, pw]);
if (result.length === 0) {
res.type("text");
res.status(400).send("user not found");
}

JS

Using ? placeholders in the db.all string, we not only avoid evaluating SQL with user-provided parameters, but we also have much cleaner query strings to work with.

The order of each ? placeholder in the query string must match the index of a passed array to the db.all's optional second argument.

db.run(sqlString, paramsArray)

The run function executes a single SQL query, sqlString, and replaces placeholder values (like "?"), with the values from the paramsArray. It has no return value (i.e., the promise has no resolved value).

let sql = 'INSERT INTO pokedex (name, type) VALUES (?, ?)';
await db.run(sql, ['Bulbasaur', 'grass']);

Node.js (example)

Little Bobby Tables

XKCD 327

db.close()

The close function closes the db connection and ensures that its associated resources are deallocated. If we don't close our connections it's possible we could consume all the memory available to our program, which would cause it to crash. This is known as a memory leak.

let sql = 'SELECT name, type FROM pokedex LIMIT 2';
let pokemon = await db.all(sql);
console.log(pokemon);
await db.close();

Node.js (example)

Always close your db connections when you're done with them, such as at the end of your endpoint functions (but before you return the response)

Multi-table SQL queries

Related tables and keys

  • Primary Key: a column guaranteed to be unique for each record
  • Foreign Key: a column from Table B that references a PRIMARY KEY or UNIQUE column in Table A
  • ...
    FOREIGN KEY (dept_id) REFERENCES employees (dept_id)
  • Normalizing: Splitting tables to improve structure/redundancy (linked by unique IDs)
    • More in-depth coverage in database courses: CSE 344 (CSE majors), CSE 414 (non-CSE-majors), and INFO 330.

Using FOREIGN KEYs

The following rules apply to foreign key definitions:

  • A table can have 0 or more foreign keys.
  • Define the FOREIGN KEY(s) when a table is created or altered.
  • If the referenced column is null, the foreign key can be null.

Cafe Example

The following are the CREATE TABLE statements for both tables in the cafe database.

CREATE TABLE menu(
  id INT PRIMARY KEY AUTOINCREMENT,
  name VARCHAR(100) NOT NULL,
  category VARCHAR(100),
  subcategory VARCHAR(100),
  price DECIMAL(6, 2),
  cost DECIMAL(6, 2)
);

CREATE TABLE orders(
  id INT PRIMARY KEY AUTOINCREMENT,
  mid INT NOT NULL,
  phone_number VARCHAR(20),
  email VARCHAR(63),
  item_name VARCHAR(63) NOT NULL,
  qty INT DEFAULT 0,
  total_price DECIMAL(6,2) NOT NULL,
  order_time DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY(??) REFERENCES menu(??)
);

sql

What menu table column would be good to reference as a FOREIGN KEY in the orders table?

Can we remove any redundant columns now? Is there another table you can think of factoring out?

Redesigning the WPL Queue Table

Suppose we wanted to manage TA shifts, and also allow other admin roles in a Queue Interface.

The following is an attempt to add more data to a WPL queue table:

CREATE TABLE old_queue(
   id INT PRIMARY KEY AUTOINCREMENT,
   status VARCHAR(10) DEFAULT "waiting",
   name VARCHAR(255) NOT NULL,           -- name of student
   email VARCHAR(255) NOT NULL,          -- email of student
   student_id INT NOT NULL,              -- 1000000
   length TINYINT NOT NULL,              -- e.g. 2 or 10
   question TEXT,
   assigned_ta VARCHAR(255),             -- username of TA (e.g. em66)
   creation_date DATETIME DEFAULT CURRENT_TIMESTAMP
);

SQL

The WPL Database Breakdown

This version demonstrates using FOREIGN KEYs in different ways.

CREATE TABLE users(
  uwid INT PRIMARY KEY, -- 1500000
  name VARCHAR(255),    -- Andrew
  email VARCHAR(255)    -- awolfram@uw.edu
);

CREATE TABLE staff(
  username VARCHAR(255) PRIMARY KEY, -- awolfram
  uwid INT,                          -- 1500000
  password VARCHAR(255) NOT NULL,    -- dogs>cats!
  section VARCHAR(2),                -- A
  role VARCHAR(10) DEFAULT 'TA',     -- Instructor
  -- keys don't need to be identical, but they are here.
  FOREIGN KEY (uwid) REFERENCES users(uwid)
);

CREATE TABLE queue(
  qid INT PRIMARY KEY AUTOINCREMENT,    -- 4
  status VARCHAR(10) DEFAULT 'waiting', -- "waiting"
  length TINYINT NOT NULL,              -- 2
  student_id INT,                       -- 1234567
  assigned_ta VARCHAR(255),             -- NULL
  question TEXT,                        -- "Halp with SQL plz."
  creation_date DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (student_id)  REFERENCES users(uwid),
  FOREIGN KEY (assigned_ta) REFERENCES staff(username)
);

sql

Today's Database: The WPL Queue

wpldb

Viewing Relationships

3 WPLDB tables with primary and foreign keys

How can we use multiple tables in one SQL query?

When you have relationships defined with FOREIGN KEY/PRIMARY KEY, it is often useful to reference both tables to combine data (e.g. category of a menu item to display order details, requiring both menu and orders tables)

We can reference multiple tables either with an additional WHERE constraint or the JOIN keyword.

Multi-table WHERE Syntax


SELECT col(s)
FROM table1, table2, ...
WHERE table1.a = table2.b
AND table2.c > '42';

SQL (template)


SELECT queue.question, queue.creation_date, users.name
FROM queue, users
WHERE queue.student_id = users.uwid
AND queue.question LIKE "%Halp%";

SQL (example)

Giving Names to Tables (Aliases)


SELECT queue.question, queue.creation_date, users.name
FROM queue, users
WHERE queue.student_id = users.uwid
AND queue.question LIKE "%Halp%";

SQL (example)

A more compact solution (giving variable names to tables)


SELECT q.question, q.creation_date, users.name
FROM queue q, users
WHERE q.student_id = users.uwid
AND q.question LIKE "%Halp%";

SQL (alternative example)

  • You can optionally give names to tables (like a variable) following the table name in FROM.
  • Note: To specify all columns in a table, you can write table.*, but this tends to be poor practice if you don't use all columns

An Equivalent Way to JOIN Tables


SELECT q.question, q.creation_date, users.name
FROM queue q, users
WHERE q.student_id = users.uwid
AND q.question LIKE "%Halp%";

SQL (example from previous slide)


SELECT q.question, q.creation_date, users.name
FROM queue q
JOIN users ON q.student_id = users.uwid
WHERE q.question LIKE "%Halp%";

SQL (alternative example, using the JOIN keyword)

The JOIN keyword is another way to join multiple tables. Some people find this more intuitive, while others find joining tables on multiple WHERE conditions more intuitive. Use whichever form you prefer most.

What if we have more than 2 tables?

You can join multiple tables using an additional equality WHERE condition, or an addition JOIN ON constraint.

SELECT a.value1, b.value2, c.value3
FROM table1 a, table2 b, table3 c
WHERE a.value4 = b.value4
AND a.value5 = c.value5;

SQL join, WHERE-style

SELECT a.value1, b.value2, c.value3
FROM table1 a
JOIN table2 b ON a.value4 = b.value4
JOIN table3 c ON a.value5 = c.value5;

SQL join, JOIN-style

This is usually done to join a FOREIGN KEY from one table with a PRIMARY KEY on another, giving access to the values in the table with the PRIMARY KEY.

Types of SQL joins

Joins

In this class we'll only use inner joins, but let's look at an example where we might might need a different type of join.

From the WPL queue, I want to query the question, TA username, and TA section for all questions in the queue

SELECT q.question, q.username, staff.section 
FROM queue q
JOIN staff ON q.assigned_ta = staff.username;

But what we really want is not an inner join, but a left join

SELECT q.question, q.username, staff.section 
FROM queue q
LEFT JOIN staff ON q.assigned_ta = staff.username;

Back to Foreign Keys

Question: Do I have to define a foreign key to use join?

Foreign Key is a Constraint

The value in the foreign key column must exist as the primary key in the referenced table.

This prevents us from:

  • Inserting a row with an non-existent foreign key
  • Deleting a row with a primary key this is referenced as a foreign key in another table

Viewing Relationships

3 WPLDB tables with primary and foreign keys