CSE 444 Homework 5

Objectives:
To learn to program database applications in Java. The application should help users understand the log files for web pages by providing different views over the data.
Number of points:
100
Due date:
November 22
Tools for the project
You will need your SQL Server space and Visual J++ and JDBC.
Files:
The database will hold the web-page log files of people in the class and the professors of the department. Each person's logs cover all of the pages in that person's web directory.

The web logs are located at \\rfilesrv2\students\rap\webfiles. If you can't read that directory, please notify me immediately. Inside that directory are a number of subdirectories. Each subdirectory is named with the login of the person whose files it contains. For example, you'd find my log files at \\rfilesrv2\students\rap\webfiles\rap.

Within each person's directory, you'll find a number of files. Each file is named by month and day with no separator (monthday), so the file 926 stores all of the accesses to the pages in my web directory on the 26th of September.

Each line of the file represents one access to a web page in that directory. The format of the files is as follows:

The log contains all the information that the more traditional
NCSA-style access log contains, plus trailing fields with referrer and
user agent information.

Here is the section of bauhaus:/www/apacheconf/httpd.conf that determines
the contents of the logs:

  # CustomLog formats: the first two emulate NCSA agent and referrer logs; the
  # third emulates "extended common log format".  Eventually, the first two 
  # should wither, because they are less useful than the third by itself.
  #
  # %{User-agent}i      the incoming user-agent field
  # %{Referer}i         the incoming referrer field
  # %b                  number of bytes transferred
  # %f                  file name
  # %h                  remote host
  # %l                  remote logname
  # %u                  remote user
  # %r                  first line of request
  # %s                  status
  # %t                  time of request

Here is an example line from a log, split into one field per line:

  rem. host     199.181.178.67 
  rem. logname  - 
  rem. user     - 
  time          [28/Feb/1997:14:23:16 -0800] 
  request       "GET /homes/lopez/images-food/berry-sticker.html HTTP/1.0" 
  status        200 
  bytes         584 
  referrer      "http://www.cs.washington.edu/homes/lopez/images-food.html" 
  user agent    "Mozilla/2.0 (Win16; I)"

Unlike NCSA, each log receives exactly one line per transfer.  Thus,
the referrer_log differs from that provided by NCSA in that a line is
generated for each transfer, without regard to whether the referrer
information is available.

This means that when you're reading the file from Java, you'll want to read it a line at a time (probably using java.io.BufferedReader.readLine(), with the BufferedReader built from a FileReader) and then parse each line using java.util.StringTokenizer.
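
One thing to watch for when parsing: the time, request, referrer and user agent fields contain spaces, so splitting a line on spaces alone breaks them apart. The sketch below is one possible way around that, using a small hand-written scanner in place of StringTokenizer. It assumes that the fields appear on one line in the order shown in the example above, separated by single spaces, and that quoted fields contain no embedded quote characters. The class and method names (LogLineParser, parseLine, readLog) are made up for illustration, and the code is written for a current JDK, so it would need adjusting for Visual J++.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LogLineParser {

    /**
     * Splits one log line into its fields. Fields are separated by spaces,
     * except that a [...] or "..." group is kept together as a single field
     * with the surrounding brackets or quotes removed.
     */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (c == ' ') {          // skip separators between fields
                i++;
                continue;
            }
            char close;              // character that ends the current field
            if (c == '"') {
                close = '"';
            } else if (c == '[') {
                close = ']';
            } else {
                close = ' ';
            }
            int start = (close == ' ') ? i : i + 1;   // drop the opening [ or "
            int end = line.indexOf(close, start);
            if (end < 0) {
                end = n;             // unterminated field: take the rest of the line
            }
            fields.add(line.substring(start, end));
            i = end + 1;
        }
        return fields;
    }

    /** Reads a log one line at a time and parses every non-blank line. */
    static List<List<String>> readLog(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

To read an actual log you would pass readLog a java.io.FileReader opened on one of the monthday files. For a line like the example above, parseLine returns nine fields, in the order host, logname, user, time, request, status, bytes, referrer, user agent.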

At the beginning of the quarter I recommended that people who had web pages on the students server turn on logging. The format of those files should be very similar; it is described at the URL that I sent you at that point.

Homework specifications

This homework is to be done individually within your project space. Each person is responsible for looking at the log files of 3 individuals. You are responsible for reading in the data and then viewing the data. Since new data is added on an almost daily basis, each run of your application should load any data that has arrived since the previous run.

Reading in the data

Your data-reading routine should discover which files have been added since the last run, either by checking the modification dates of the files or by checking the numbers in the file names. You may assume that the numbers in the file names are always increasing (so after January we'll prepend something to let the database know that it's a new year). Again, the logs on the students server have a different format (I believe that all accesses are added to the same file), so for those you'll probably just want to compare the date of each access with the last time your data was loaded.
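
As a rough illustration of the file-number approach, the sketch below picks out the files in one person's directory whose numeric name is greater than the last number already loaded. It relies on the assumption stated above that the numbers are always increasing. The class and method names (NewLogFinder, newerThan, newFilesIn) are hypothetical, how you remember the last loaded number between runs is left to your own design, and the code is written for a current JDK.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NewLogFinder {

    /**
     * Returns the numeric file names that are greater than lastLoaded,
     * in ascending order. Names that are not plain numbers are ignored.
     */
    static List<Integer> newerThan(String[] names, int lastLoaded) {
        List<Integer> result = new ArrayList<>();
        if (names == null) {
            return result;            // File.list() returns null for an unreadable directory
        }
        for (String name : names) {
            if (!name.matches("\\d{1,9}")) {
                continue;             // not a monthday-style number (or too long for an int)
            }
            int number = Integer.parseInt(name);
            if (number > lastLoaded) {
                result.add(number);
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Convenience wrapper: applies newerThan to the contents of one person's directory. */
    static List<Integer> newFilesIn(File personDirectory, int lastLoaded) {
        return newerThan(personDirectory.list(), lastLoaded);
    }
}
```

For example, if the directory holds 926, 927, 1025 and 1102 and the last file loaded was 926, newerThan returns 927, 1025 and 1102, which can then be read in that order.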

Please note that this step must be done individually, because the way you design your database determines how you do everything that follows.

Presenting the data

Each assignment should allow the user to see the following views on the data:

Each view should be selectable dynamically, so you'll want buttons or a menu of some sort to switch between the different views. As an example of what we mean, the view for the number of accesses by date would show how many accesses occurred on each date. The results could be a simple table in a text box like:
Date           Number of accesses
---------------------------------
10/25/1999     300
11/2/1999      500
11/3/1999      400
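
To make the shape of this view concrete, here is a minimal sketch that counts accesses per date and renders a table like the one above. In your application the counts would normally come from your database (for instance from a query that groups accesses by date); this sketch counts in memory only so that it is self-contained. It assumes the time field has the form shown in the example log line, without the surrounding brackets. The class and method names (AccessesByDate, count, render) are hypothetical, and the code uses java.time from a current JDK, which is not available in Visual J++.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class AccessesByDate {

    // Matches a log time such as: 28/Feb/1997:14:23:16 -0800
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Display format used in the table, e.g. 11/2/1999
    private static final DateTimeFormatter DISPLAY =
            DateTimeFormatter.ofPattern("M/d/yyyy");

    /** Counts accesses per calendar date; the TreeMap keeps the dates in order. */
    static Map<LocalDate, Integer> count(List<String> timeFields) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (String time : timeFields) {
            LocalDate day = OffsetDateTime.parse(time, LOG_TIME).toLocalDate();
            counts.merge(day, 1, Integer::sum);
        }
        return counts;
    }

    /** Renders the counts as a plain-text table suitable for a text box. */
    static String render(Map<LocalDate, Integer> counts) {
        StringBuilder table = new StringBuilder();
        table.append(String.format("%-14s %s\n", "Date", "Number of accesses"));
        table.append("---------------------------------\n");
        for (Map.Entry<LocalDate, Integer> entry : counts.entrySet()) {
            table.append(String.format("%-14s %d\n",
                    DISPLAY.format(entry.getKey()), entry.getValue()));
        }
        return table.toString();
    }
}
```

The string returned by render can be placed directly into a text area. Switching between views then amounts to calling a different count/render pair when the user presses a button or picks a menu item.
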
Note that extra cool pages (such as those with graphs, extra useful views on the data, etc.) can earn up to 20 points of extra credit.

What to turn in:

Rating the assignment

  1. How long did it take you to complete this assignment?
  2. What did you like the best about this assignment?
  3. What did you like the least about this assignment?
  4. What helped you learn the best in this assignment?