CSE P 544 Homework 1

Objectives:

To understand and be able to write simple SQL queries.

Related Reading:

6.1 - 6.4

Number of points:

100

Due date:

Monday, April 12th, 2004 5:45pm

Tools for the assignment:

For this assignment, you can use your SQL Server account on IISQLSRV.cs.washington.edu.  Login using SQL Server authentication, and your UW CSE login name.  Your password is your UW student id.  Once you are logged into SQL Server, use Query -> Change Database to view the databases on the server. The database to be used for this assignment is called SPAM - please do not alter it in any way, you may only execute queries. Click here for additional instructions, including how to change your SQL Server password (which you should do promptly).

Turnin:

Prepare a file named queries.txt containing your sql queries along with the output they produced when you ran them. Also include in this file any explanation as to why you may have done the query in a peculiar manner. Make sure your name appears at the top of the file. Once you are ready click here to turn in the file. The turn-in server will be closed 15 minutes before the beginning of lecture, Monday April 12th. Alternatively, you can hand in a printout at the beginning of class if you prefer.

Database Description:

The database you will be working with holds information about many thousands of actual spam emails. The database has nine tables. The following are the tables you should use:

dictionaries(id, word)
domains(name, country)
languages(id, name)
statCountries(domainName, countryName, msgCount)
unicodeBody(id, body, uniqueid, domain, langid)
unicodeHeaderV2(id, fromName, fromAddr, ..., subject, domainName)

Homework Problems:

As you do these problems, please remember to save your SQL queries somewhere as you definitely be handing the queries themselves in. If we do not specify what to call a particular column, you may choose whatever label you which. If you have questions or concerns about what we are asking you to do, please email us or come to our office hours.

  1. [6 points] List how many spams were sent from the Cayman Islands.
  2. [6 points] List how many spams were sent from countries whose names begin with 'D'.
  3. [6 points] List the top 10 domains (and the corresponding countries) from which spams were sent. List the domains/countries in decreasing order of number of spams sent.
  4. [6 points] List the top 10 spammers in this database. List them in decreasing order of the number of spams they sent. The table returned should have two columns: one named FromAddr and the other named NumSpams.
  5. [11 points] List the set of all countries from which spam was sent with both the words 'free' and 'game' in the subject.
  6. [11 points] List the number of spams sent in each (non-null) language. The table you return should have two columns: one named Language and the other named NumSpams.
  7. [11 points] List the number of spams sent in each (non-null) language by the spammer with the from address of 'Mail'.
  8. [16 points] List the set of all domains from which the database has no spams. You may not use the statCountries table to answer this question.
  9. [16 points] List the set of spammers (their from address only) that sent spams in both English and French.
  10. [11 points] Write your own unique query. Make it at least as interesting/difficult as problems 5, 6, and 7.

Sample Query Output

Click here