SQL, the Structured Query Language, is a declarative language based on operators of the relational algebra. It is the industry-standard language for querying relational databases; though the research community often uses Datalog for its elegance and expressiveness, SQL allows much finer control of physical database details. This guide gives you the basics of querying in SQL; it is by no means comprehensive, but should give you a good feel for writing SQL queries.
SQL queries are based on a SELECT
fields FROM
tables
WHERE
conditions structure. Suppose we have the tables:
RegisteredPet
Tag | Name | Species | Breed | Age | OwnerSSN |
123 | Charlie | dog | fox terrier, beagle | 5 | 123-45-6789 |
124 | Boss | dog | Welsh Corgi | 7 | 124-92-1234 |
920 | Snowball | cat | Siamese | 3 | 123-45-6789 |
992 | Silver | cat | Manx | 6 | 928-43-8293 |
Person
SSN | FirstName | LastName | Address | City | State |
123-45-6789 | Adam | Smith | 23 Columbus | Detroit | MI |
124-92-1234 | Michelle | Huffman | Box 93 | Seattle | WA |
928-43-8293 | Chris | Jones | 4820 35th St. NW #12 | Seattle | WA |
As we think about how we might use SQL to do selection and projection, let's first
observe that SQL's SELECT
statement is a misnomer by relational algebra
terms: the argument that comes after the SELECT
is the fields
that should be returned, i.e. we are really making a projection here! The conditions
after the WHERE
allow us to specify which tuples to return, so the conditions
are what allow us to do the selection. If that's sufficiently confusing,
maybe a few examples are in order:
To project the names of all of the RegisteredPets:
- SELECT Name
- FROM RegisteredPet
-
Name |
Charlie |
Boss |
Snowball |
Silver |
To select all tuples from RegisteredPet which correspond to dogs:
- SELECT *
- FROM RegisteredPet
- WHERE Species = 'dog'
Tag | Name | Species | Breed | Age | OwnerSSN |
123 | Charlie | dog | fox terrier, beagle | 5 | 123-45-6789 |
124 | Boss | dog | Welsh Corgi | 7 | 124-92-1234 |
Note that *
means "all attributes in the result," so SELECT
*
does a projection of all fields. We can do selections based on equality, as
above, or we can make use of inequalities such as <
, <=
, >=
,
<>
. SQL also supports range queries with the BETWEEN
operator (WHERE x BETWEEN 1 AND 50
) and set membership with the IN
operator (WHERE x IN ('a', 'b', 'c')
).
With strings, we can also use wildcards with the LIKE
comparison; the %
character is equivalent to a *
character in Unix or DOS:
- SELECT *
FROM RegisteredPet
WHERE Name LIKE 'S%'
Tag | Name | Species | Breed | Age | OwnerSSN |
920 | Snowball | cat | Siamese | 3 | 123-45-6789 |
992 | Silver | cat | Manx | 6 | 928-43-8293 |
(Note that Microsoft Access uses a nonstandard SQL syntax where the *
character is actually used, rather than %
, in LIKE
queries.)
Now let's return the names and ages of all cats:
- SELECT Name, Age
FROM RegisteredPet
WHERE Species = 'cat'
-
Name | Age |
Snowball | 3 |
Silver | 6 |
To perform a join in SQL, we SELECT
from multiple tables and use the WHERE
condition to specify the condition which must hold for two tuples to be joined.
Let's join the pets and their owners:
- SELECT *
FROM People, RegisteredPets
WHERE SSN = OwnerSSN
SSN | FirstName | LastName | Address | City | State | Tag | Name | Species | Breed | Age |
123-45-6789 | Adam | Smith | 23 Columbus | Detroit | MI | 123 | Charlie | dog | fox terrier, beagle | 5 |
123-45-6789 | Adam | Smith | 23 Columbus | Detroit | MI | 920 | Snowball | cat | Siamese | 3 |
124-92-1234 | Michelle | Huffman | Box 93 | Seattle | WA | 124 | Boss | dog | Welsh Corgi | 7 |
928-43-8293 | Chris | Jones | 4820 35th St. NW #12 | Seattle | WA | 992 | Silver | cat | Manx | 6 |
Note that the join above has 11 attributes, rather than 12. This is because we did a join on an equality condition (also known as an equijoin), and in this case we remove the duplicate SSN attribute. SQL also supports non-equijoins, i.e. joins based on inequalities.
Another important observation to make is that in this case, there were no "name clashes" between attributes in the two tables. This is not always the case when joining arbitrary tables. In SQL, you can use tuple variables to specify a scope on a particular attribute name.
- SELECT p.LastName
FROM People p, RegisteredPets r
WHERE p.SSN = r.OwnerSSN
AND r.Age < 5
The default semantics of SQL are list or bag/multiset semantics rather than set
semantics, i.e. duplicate tuples are allowed. This is because bag semantics are much
more efficient to execute in practice. The DISTINCT
keyword allows us
to remove duplicates in our queries:
- SELECT DISTINCT Species
FROM RegisteredPet
-
Species |
dog |
cat |
SQL also supports a number of aggregation operators, which take tuples as input and produce a single result. The operators are count, max, min, avg, sum.
To count the number of RegisteredPets, we would do:
SELECT COUNT(*)
FROM RegisteredPet
4 |
To find out the age of the oldest pet, we would execute the query:
SELECT MAX(Age)
FROM RegisteredPet
7 |
The ORDER BY
clause allows us to specify a sort order for the results:
- SELECT Name
- FROM RegisteredPet
ORDER BY Name
-
Name |
Boss |
Charlie |
Silver |
Snowball |
When querying with aggregation, we often want to be able to use the aggregate operator not across the entire data set, but across some group of tuples. For example, we might want to know how many pets of each type exist:
- SELECT Species, Count(Name)
- FROM RegisteredPet
GROUP BY Species
Species | Expr1001 |
dog | 2 |
cat | 2 |
The HAVING
clause is used in combination with GROUP BY
as a
way of filtering some groups from the list:
- SELECT Species, Count(Name)
- FROM RegisteredPet
GROUP BY Species
HAVING Age > 6
Species | Expr1001 |
dog | 2 |
A significant number of complex queries, namely those requiring comparison of a single tuple against multiple tuples or an aggregate value, will require using subqueries. Note that the only legal place to put a subquery in SQL is in the WHERE clause; you cannot put it in the FROM clause.
For example, suppose we would like to know the names of people who own more than one
pet. We can get the owners of multiple pets using a GROUP BY
... HAVING
... clause; then we use the IN
operator to see if the current tuple belongs
in the returned set:
- SELECT FirstName, LastName
FROM Person
WHERE SSN IN
(SELECT OwnerSSN
FROM RegisteredPet
GROUP BY OwnerSSN
HAVING Count(OwnerSSN) > 1
)
FirstName | LastName |
Adam | Smith |
We can easily query for the oldest animal using nested queries and a comparison the ALL
operator:
SELECT Name
FROM RegisteredPet
WHERE Age >= ALL
(SELECT Age
FROM RegisteredPet
)
or, equivalently:
SELECT Name
FROM RegisteredPet
WHERE Age >=
(SELECT MAX(Age)
FROM RegisteredPet
)
Name |
Boss |
As a final example, we'll look at all owners of pets who are older than average, plus the names of those pets:
- SELECT FirstName, LastName, r1.Name, r1.Age
FROM Person, RegisteredPet r1
WHERE r1.Age >
(SELECT Avg(r2.Age)
FROM RegisteredPet r2) AND SSN = r1.OwnerSSN
FirstName | LastName | Name | Age |
Michelle | Huffman | Boss | 7 |
Chris | Jones | Silver | 6 |
All right, you've now run through the quick tutorial, and should have some idea of how to ask queries. Now it's time to try things out for yourself and gain some real experience. Find a machine with a database program (Access, SQL Server, Informix, Oracle, etc.), find a sample database, and start running some SQL queries on it. You may even want to enter the tables from this guide, and try taking the sample queries we've given you and altering them to answer different questions.
Have fun!