CSE 444 Homework 5

Objectives:: To learn to program database applications in Java. The application should help users understand the log files for web pages by providing different views over the data.
Number of points:: 100
Due date:: November 22
Tools for the project:: You will need your SQL Server space, Visual J++, and JDBC.

Files::
The database will consist of the log files for web pages (all of the pages in that directory) of people in the class and the professors of the department. The web logs are located at \\rfilesrv2\students\rap\webfiles. If you can't read that directory, please notify me immediately.

Inside that directory are a number of subdirectories, each named with the login of the person whose files it contains. For example, you'd find my log files at \\rfilesrv2\students\rap\webfiles\rap. Within each person's directory you'll find a number of files, named by month and day (monthday). So the file 926 stores all of the accesses to the pages in my web directory on the 26th of September. Each line of a file represents one access to a web page in that directory.

The format of the files is as follows. The log contains all the information that the more traditional NCSA-style access log contains, plus trailing fields with referrer and user agent information. Here is the section of bauhaus:/www/apacheconf/httpd.conf that determines the contents of the logs:

    # CustomLog formats: the first two emulate NCSA agent and referrer logs; the
    # third emulates "extended common log format". Eventually, the first two
    # should wither, because they are less useful than the third by itself.
    #
    # %{User-agent}i  the incoming user-agent field
    # %{Referer}i     the incoming referrer field
    # %b              number of bytes transferred
    # %f              file name
    # %h              remote host
    # %l              remote logname
    # %u              remote user
    # %r              first line of request
    # %s              status
    # %t              time of request

Here is an example line from a log, split into one field per line:

    rem. host      199.181.178.67
    rem. logname   -
    rem. user      -
    time           [28/Feb/1997:14:23:16 -0800]
    request        "GET /homes/lopez/images-food/berry-sticker.html HTTP/1.0"
    status         200
    bytes          584
    referrer       "http://www.cs.washington.edu/homes/lopez/images-food.html"
    user agent     "Mozilla/2.0 (Win16; I)"

Unlike NCSA, each log receives exactly one line per transfer. Thus, this log differs from the NCSA referrer_log in that a line is generated for every transfer, whether or not referrer information is available.

Because each access occupies exactly one line, when you're reading a file from Java you'll want to read it one line at a time (probably using java.io.BufferedReader.readLine(), where you've built your BufferedReader from a FileReader) and then parse each line using java.util.StringTokenizer.

At the beginning of the quarter I recommended that people with web pages on the students server turn on logging. The format of those files should be very similar; it is described at the URL I sent you at that point.
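The read-a-line-then-parse loop described above can be sketched as follows. One caution: a plain StringTokenizer splits on every space, and the time, request, and user agent fields themselves contain spaces, so this sketch swaps in java.util.regex to keep bracketed and quoted fields whole. It assumes the fields appear on each raw line in the order of the example above, separated by single spaces; the class name LogEntry and its methods are hypothetical. It is written for a current JDK; Visual J++ targets Java 1.1, which lacks try-with-resources and java.util.regex, so treat it as an outline of the logic rather than drop-in J++ code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed line of the access log (hypothetical helper class). */
public class LogEntry {

    // host logname user [time] "request" status bytes "referrer" "user agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    public final String host, logname, user, time, request, referrer, userAgent;
    public final int status;
    public final long bytes; // -1 when the log records "-" (no bytes sent)

    private LogEntry(Matcher m) {
        host = m.group(1);
        logname = m.group(2);
        user = m.group(3);
        time = m.group(4);        // brackets stripped, e.g. "28/Feb/1997:14:23:16 -0800"
        request = m.group(5);     // quotes stripped
        status = Integer.parseInt(m.group(6));
        bytes = m.group(7).equals("-") ? -1 : Long.parseLong(m.group(7));
        referrer = m.group(8);
        userAgent = m.group(9);
    }

    /** Returns the parsed entry, or null if the line does not have the expected shape. */
    public static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new LogEntry(m) : null;
    }

    /** The requested file, e.g. "/homes/x.html" from "GET /homes/x.html HTTP/1.0", or null. */
    public String path() {
        String[] parts = request.split(" ");
        return parts.length >= 2 ? parts[1] : null;
    }

    /** Reads one log file a line at a time, skipping lines that fail to parse. */
    public static List<LogEntry> readFile(String fileName) throws IOException {
        List<LogEntry> entries = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) {
                LogEntry e = parse(line);
                if (e != null) {
                    entries.add(e);
                }
            }
        }
        return entries;
    }
}
```

Returning null for malformed lines (rather than throwing) lets one odd line, such as a user agent containing a quote character, be skipped without aborting the whole load; you may prefer to count or report such lines.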

Homework specifications

This homework is to be done individually within your project space. Each person is responsible for the log files of 3 individuals: reading in the data and then presenting views of it. Since data is added almost daily, each time your application runs it should load any new data that has arrived since the last run.

Reading in the data

Your data-reading routine should discover whether new files have been added since the last load, either by checking the files' modification dates or by checking the numbers in the file names. You may assume that the file numbers always increase (so starting in January we'll prepend something to let the database know that it's a new year). As mentioned above, the files of people working off of the students server have a different format (I believe all of their accesses are added to the same file), so for them you'll probably just want to compare the date of each access with the last time your data was loaded.
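Both ways of finding new data can be sketched in Java as below. This is a minimal sketch under stated assumptions: file names are plain decimal numbers (as in 926) that always increase, and lastLoaded and lastLoad are values you would record yourself after each load (for example, in a small table in your database). The class and method names are hypothetical. Comparing File.lastModified() against the time of your last run would be an alternative to comparing file names.

```java
import java.io.File;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for picking out data that arrived since the last load. */
public class NewDataFinder {

    // Parses the %t field without its brackets, e.g. "28/Feb/1997:14:23:16 -0800".
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);

    /**
     * Per-person directories: returns the files whose numeric names are greater
     * than lastLoaded, oldest first. Relies on file numbers always increasing.
     */
    public static List<File> newLogFiles(File personDir, int lastLoaded) {
        List<File> fresh = new ArrayList<>();
        File[] files = personDir.listFiles();
        if (files == null) {
            return fresh; // not a directory, or not readable
        }
        for (File f : files) {
            try {
                if (Integer.parseInt(f.getName()) > lastLoaded) {
                    fresh.add(f);
                }
            } catch (NumberFormatException e) {
                // ignore anything that is not a numbered log file
            }
        }
        fresh.sort(Comparator.comparingInt(f -> Integer.parseInt(f.getName())));
        return fresh;
    }

    /** Students-server logs: true if this access happened after the previous load. */
    public static boolean isNewAccess(String logTime, OffsetDateTime lastLoad) {
        // isAfter compares instants, so the log's -0800 offset is handled correctly
        return OffsetDateTime.parse(logTime, LOG_TIME).isAfter(lastLoad);
    }
}
```

After a successful load you would store the largest file number (or the current time) so the next run starts where this one stopped.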

Please note that this step must be done individually, since your database design will shape how you do the rest of the assignment.

Presenting the data

Each assignment should allow the user to see the following views on the data:

Number of accesses by date
Number of accesses by remote host
Number of accesses by referrer field (the URL of the page whose link the visitor followed to reach this page)
Number of accesses per subdirectory (or just number of accesses per file)
Some other cool view on the data
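Since the data lives in SQL Server, each of the first four views is naturally a GROUP BY query issued through JDBC. The sketch below assumes a hypothetical table Access(owner, remoteHost, accessTime, referrer, path, ...) with one row per log line, where owner is the login of the person whose log it is and path is the file taken from the request line; your own schema will differ. CONVERT(char(8), accessTime, 112) is SQL Server's yyyymmdd style, chosen here because that text sorts in date order.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical JDBC queries for the required views over an assumed Access table. */
public class AccessViews {

    /** Each view groups the Access rows of one owner by a single expression. */
    public enum View {
        DATE("CONVERT(char(8), accessTime, 112)", "label"), // yyyymmdd sorts by date
        HOST("remoteHost", "hits DESC"),
        REFERRER("referrer", "hits DESC"),
        FILE("path", "hits DESC");

        public final String sql;

        View(String groupExpr, String orderBy) {
            sql = "SELECT " + groupExpr + " AS label, COUNT(*) AS hits"
                + " FROM Access WHERE owner = ?"
                + " GROUP BY " + groupExpr
                + " ORDER BY " + orderBy;
        }
    }

    /** Runs one view for one log owner and returns label -> count in query order. */
    public static Map<String, Integer> run(Connection conn, View view, String owner)
            throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>(); // preserves ORDER BY order
        try (PreparedStatement ps = conn.prepareStatement(view.sql)) {
            ps.setString(1, owner); // bind the owner rather than concatenating it
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    counts.put(rs.getString("label"), rs.getInt("hits"));
                }
            }
        }
        return counts;
    }
}
```

Keeping the views in one enum makes it easy to drive a menu or a row of buttons from View.values(), so adding your own extra view is one more constant.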

Each view should be selectable dynamically, so you'll want buttons or a menu of some sort to switch between the views. For example, the number-of-accesses-by-date view would show how many accesses occurred on each date; the result could be a simple table in a text box like:

Date           Number of accesses
---------------------------------
10/25/1999     300
11/2/1999      500
11/3/1999      400
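If you prefer to build the by-date table in Java rather than in SQL, a minimal sketch follows. It assumes you already have the raw %t strings (without brackets) for one person's accesses; the class name DateTable is hypothetical. A TreeMap keyed by LocalDate keeps rows in calendar order, which sorting the M/d/yyyy text would not ("11/2/1999" sorts after "11/10/1999" as text).

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical renderer for the "number of accesses by date" text table. */
public class DateTable {

    // Parses the %t field without its brackets, e.g. "28/Feb/1997:14:23:16 -0800".
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
    // Displays dates like 10/25/1999 and 11/2/1999.
    private static final DateTimeFormatter SHOWN = DateTimeFormatter.ofPattern("M/d/yyyy");

    /** Counts accesses per day, in the log's own time zone, and renders the table. */
    public static String render(List<String> logTimes) {
        Map<LocalDate, Integer> perDay = new TreeMap<>(); // sorted by date, not by text
        for (String t : logTimes) {
            LocalDate day = OffsetDateTime.parse(t, LOG_TIME).toLocalDate();
            perDay.merge(day, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-15s%s%n", "Date", "Number of accesses"));
        sb.append(String.format("%s%n", "---------------------------------"));
        for (Map.Entry<LocalDate, Integer> e : perDay.entrySet()) {
            sb.append(String.format("%-15s%d%n", e.getKey().format(SHOWN), e.getValue()));
        }
        return sb.toString();
    }
}
```

The returned string can be placed directly into a text area; the 15-character date column matches the layout of the example table.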

Note that extra cool pages (such as those with graphs, extra useful views on the data, etc.) can earn up to 20 points of extra credit.

What to turn in:

E-mail your code to Rachel.
Hand in hard copy of:
- screen shots
- a README describing any extra cool features in your assignment, your assignment rating, and any special cases your assignment does or does not handle.
- for each user, number of accesses by date

Rating the assignment

How long did it take you to complete this assignment?
What did you like the best about this assignment?
What did you like the least about this assignment?
What helped you learn the best in this assignment?