CSE 544 Simple Guide to SQL

Introduction

SQL, the Structured Query Language, is a declarative language based on operators of the relational algebra.  It is the industry-standard language for querying relational databases; though the research community often uses Datalog for its elegance and expressiveness, SQL allows much finer control of physical database details.   This guide gives you the basics of querying in SQL; it is by no means comprehensive, but should give you a good feel for writing SQL queries.

The Basics:  Selection and Projection

SQL queries are based on a SELECT fields FROM tables WHERE conditions structure.  Suppose we have the tables:

RegisteredPet

Tag Name Species Breed Age OwnerSSN
123 Charlie dog fox terrier, beagle 5 123-45-6789
124 Boss dog Welsh Corgi 7 124-92-1234
920 Snowball cat Siamese 3 123-45-6789
992 Silver cat Manx 6 928-43-8293

Person

SSN FirstName LastName Address City State
123-45-6789 Adam Smith 23 Columbus Detroit MI
124-92-1234 Michelle Huffman Box 93 Seattle WA
928-43-8293 Chris Jones 4820 35th St. NW #12 Seattle WA

As we think about how we might use SQL to do selection and projection, let's first observe that SQL's SELECT statement is a misnomer by relational algebra terms:  the argument that comes after the SELECT is the fields that should be returned, i.e. we are really making a projection here!  The conditions after the WHERE allow us to specify which tuples to return, so the conditions are what allow us to do the selection.  If that's sufficiently confusing, maybe a few examples are in order:

To project the names of all of the RegisteredPets:

SELECT Name
FROM RegisteredPet
 
Name
Charlie
Boss
Snowball
Silver

To select all tuples from RegisteredPet which correspond to dogs:

SELECT *
FROM RegisteredPet
WHERE Species = 'dog'
Tag Name Species Breed Age OwnerSSN
123 Charlie dog fox terrier, beagle 5 123-45-6789
124 Boss dog Welsh Corgi 7 124-92-1234

Note that * means "all attributes in the result," so SELECT * does a projection of all fields.  We can do selections based on equality, as above, or we can make use of inequalities such as <, <=, >=, <>.  SQL also supports range queries with the BETWEEN operator (WHERE x BETWEEN 1 AND 50) and set membership with the IN operator (WHERE x IN ('a', 'b', 'c')).

With strings, we can also use wildcards with the LIKE comparison; the % character is equivalent to a * character in Unix or DOS:

SELECT *
FROM RegisteredPet
WHERE Name LIKE 'S%'
Tag Name Species Breed Age OwnerSSN
920 Snowball cat Siamese 3 123-45-6789
992 Silver cat Manx 6 928-43-8293

(Note that Microsoft Access uses a nonstandard SQL syntax where the * character is actually used, rather than %, in LIKE queries.)

Now let's return the names and ages of all cats:

SELECT Name, Age
FROM RegisteredPet
WHERE Species = 'cat'
 
Name Age
Snowball 3
Silver 6

Joins

To perform a join in SQL, we SELECT from multiple tables and use the WHERE condition to specify the condition which must hold for two tuples to be joined.  Let's join the pets and their owners:

SELECT *
FROM People, RegisteredPets
WHERE SSN = OwnerSSN
SSN FirstName LastName Address City State Tag Name Species Breed Age
123-45-6789 Adam Smith 23 Columbus Detroit MI 123 Charlie dog fox terrier, beagle 5
123-45-6789 Adam Smith 23 Columbus Detroit MI 920 Snowball cat Siamese 3
124-92-1234 Michelle Huffman Box 93 Seattle WA 124 Boss dog Welsh Corgi 7
928-43-8293 Chris Jones 4820 35th St. NW #12 Seattle WA 992 Silver cat Manx 6

Note that the join above has 11 attributes, rather than 12.  This is because we did a join on an equality condition (also known as an equijoin), and in this case we remove the duplicate SSN attribute.  SQL also supports non-equijoins, i.e. joins based on inequalities.

Another important observation to make is that in this case, there were no "name clashes" between attributes in the two tables.  This is not always the case when joining arbitrary tables.  In SQL, you can use tuple variables to specify a scope on a particular attribute name.

SELECT p.LastName
FROM People p, RegisteredPets r
WHERE p.SSN = r.OwnerSSN
AND r.Age < 5

Lists vs. Sets/Duplicate Removal

The default semantics of SQL are list or bag/multiset semantics rather than set semantics, i.e. duplicate tuples are allowed.  This is because bag semantics are much more efficient to execute in practice.  The DISTINCT keyword allows us to remove duplicates in our queries:

SELECT DISTINCT Species
FROM RegisteredPet
 
Species
dog
cat

Aggregation

SQL also supports a number of aggregation operators, which take tuples as input and produce a single result.  The operators are count, max, min, avg, sum.

To count the number of RegisteredPets, we would do:

SELECT COUNT(*)
FROM RegisteredPet
4

To find out the age of the oldest pet, we would execute the query:

SELECT MAX(Age)
FROM RegisteredPet
7

Sorting and Grouping

The ORDER BY clause allows us to specify a sort order for the results:

SELECT Name
FROM RegisteredPet
ORDER BY Name
 
Name
Boss
Charlie
Silver
Snowball

When querying with aggregation, we often want to be able to use the aggregate operator not across the entire data set, but across some group of tuples.  For example, we might want to know how many pets of each type exist:

SELECT Species, Count(Name)
FROM RegisteredPet
GROUP BY Species
Species Expr1001
dog 2
cat 2

The HAVING clause is used in combination with GROUP BY as a way of filtering some groups from the list:

SELECT Species, Count(Name)
FROM RegisteredPet
GROUP BY Species
HAVING Age > 6
Species Expr1001
dog 2

Using Subqueries

A significant number of complex queries, namely those requiring comparison of a single tuple against multiple tuples or an aggregate value, will require using subqueries.   Note that the only legal place to put a subquery in SQL is in the WHERE clause; you cannot put it in the FROM clause.

For example, suppose we would like to know the names of people who own more than one pet.  We can get the owners of multiple pets using a GROUP BY ... HAVING ... clause; then we use the IN operator to see if the current tuple belongs in the returned set:

SELECT FirstName, LastName
FROM Person
WHERE SSN IN
(SELECT OwnerSSN
FROM RegisteredPet
GROUP BY OwnerSSN
HAVING Count(OwnerSSN) > 1
)
FirstName LastName
Adam Smith

We can easily query for the oldest animal using nested queries and a comparison the ALL operator:

SELECT Name
FROM RegisteredPet
WHERE Age >= ALL
(SELECT Age
FROM RegisteredPet
)

or, equivalently:

SELECT Name
FROM RegisteredPet
WHERE Age >=
(SELECT MAX(Age)
FROM RegisteredPet
)
Name
Boss

As a final  example, we'll look at all owners of pets who are older than average, plus the names of those pets:

SELECT FirstName, LastName, r1.Name, r1.Age
FROM Person, RegisteredPet r1
WHERE r1.Age >
(SELECT Avg(r2.Age)
FROM RegisteredPet r2) AND SSN = r1.OwnerSSN
FirstName LastName Name Age
Michelle Huffman Boss 7
Chris Jones Silver 6

And Now... Hands-On Experience!

All right, you've now run through the quick tutorial, and should have some idea of how to ask queries.  Now it's time to try things out for yourself and gain some real experience.  Find a machine with a database program (Access, SQL Server, Informix, Oracle, etc.), find a sample database, and start running some SQL queries on it.  You may even want to enter the tables from this guide, and try taking the sample queries we've given you and altering them to answer different questions.

Have fun!