The web logs are located at \\rfilesrv2\students\rap\webfiles. If you can't read that directory, please notify me immediately. Inside that directory are a number of subdirectories. Each subdirectory is named with the login of the person whose files it contains. For example, you'd find my log files at \\rfilesrv2\students\rap\webfiles\rap.
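If you want to find every login and its log files from Java, a minimal sketch using java.nio.file might look like this. It assumes that every subdirectory of webfiles is a login and that every regular file inside it is a log file. LogTree and scan are hypothetical names. On the share you would pass \\rfilesrv2\students\rap\webfiles as a Path, doubling each backslash inside a Java string literal.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Walks the webfiles layout: one subdirectory per login (hypothetical helper class). */
final class LogTree {
    /** Maps each login (subdirectory name) to the sorted names of the log files inside it. */
    static Map<String, List<String>> scan(Path webfiles) throws IOException {
        Map<String, List<String>> byLogin = new TreeMap<>();
        // Only directories count as logins; stray files at the top level are ignored.
        try (DirectoryStream<Path> logins = Files.newDirectoryStream(webfiles, Files::isDirectory)) {
            for (Path loginDir : logins) {
                List<String> files = new ArrayList<>();
                try (DirectoryStream<Path> logs = Files.newDirectoryStream(loginDir, Files::isRegularFile)) {
                    for (Path log : logs) {
                        files.add(log.getFileName().toString());
                    }
                }
                files.sort(Comparator.naturalOrder()); // text order, not date order
                byLogin.put(loginDir.getFileName().toString(), files);
            }
        }
        return byLogin;
    }
}
```

The file names sort as text (1025 comes before 926), so sort by date separately if you need chronological order.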
Within each person's directory, you'll find a number of files. Each file is named by month and day (monthday). So the file 926 stores all of the accesses to the pages in my web directory on September 26.
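The month in a monthday name is not zero-padded (926 is September 26), and nothing here says whether single-digit days are padded, so some names can be read two ways: 111 could be January 11 or November 1. A minimal sketch, assuming a name is a one- or two-digit month followed by a one- or two-digit day, returns every valid reading so your program can flag the ambiguous ones. The year is not part of the name, so it returns java.time.MonthDay values; LogFileName and candidates are hypothetical names.

```java
import java.time.DateTimeException;
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.List;

/** Reads a monthday log-file name such as "926" (hypothetical helper class). */
final class LogFileName {
    /** Returns every valid month/day reading of the name; more than one means it is ambiguous. */
    static List<MonthDay> candidates(String name) {
        List<MonthDay> readings = new ArrayList<>();
        if (!name.matches("\\d{2,4}")) {
            return readings; // not a monthday name
        }
        // Try a one-digit month, then a two-digit month; the remaining digits are the day.
        for (int monthDigits = 1; monthDigits <= 2; monthDigits++) {
            int dayDigits = name.length() - monthDigits;
            if (dayDigits < 1 || dayDigits > 2) {
                continue;
            }
            int month = Integer.parseInt(name.substring(0, monthDigits));
            int day = Integer.parseInt(name.substring(monthDigits));
            try {
                readings.add(MonthDay.of(month, day)); // rejects month 92, February 30, etc.
            } catch (DateTimeException e) {
                // Not a real date under this split; try the other one.
            }
        }
        return readings;
    }
}
```

When a name has two readings, the timestamps inside the file tell you which day it actually covers.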
Each line of the file represents one access to a web page in that directory. The format of the files is as follows:
The log contains all the information that the more traditional NCSA-style access log contains, plus trailing fields with referrer and user-agent information. Here is the section of bauhaus:/www/apacheconf/httpd.conf that determines the contents of the logs:

    # CustomLog formats: the first two emulate NCSA agent and referrer logs; the
    # third emulates "extended common log format". Eventually, the first two
    # should wither, because they are less useful than the third by itself.
    #
    # %{User-agent}i  the incoming user-agent field
    # %{Referer}i     the incoming referrer field
    # %b              number of bytes transferred
    # %f              file name
    # %h              remote host
    # %l              remote logname
    # %u              remote user
    # %r              first line of request
    # %s              status
    # %t              time of request

Here is an example line from a log, split into one field per line:

    rem. host      199.181.178.67
    rem. logname   -
    rem. user      -
    time           [28/Feb/1997:14:23:16 -0800]
    request        "GET /homes/lopez/images-food/berry-sticker.html HTTP/1.0"
    status         200
    bytes          584
    referrer       "http://www.cs.washington.edu/homes/lopez/images-food.html"
    user agent     "Mozilla/2.0 (Win16; I)"

Unlike NCSA, each log receives exactly one line per transfer. Thus, the referrer_log differs from that provided by NCSA in that a line is generated for each transfer, whether or not the referrer information is available.

This means that when you're reading the file from Java, you'll want to read it a line at a time (probably using java.io.BufferedReader.readLine(), with the BufferedReader built from a FileReader) and then parse each line, for example with java.util.StringTokenizer. Be aware that the time field and the quoted request, referrer, and user-agent fields contain spaces, so splitting on whitespace alone will break each of them into several tokens.
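Because several fields contain spaces, a plain whitespace split needs extra care. The sketch below swaps in java.util.regex instead of StringTokenizer. It assumes each raw line lists the fields in the order of the example above, separated by single spaces, with the time in square brackets and the request, referrer, and user agent in double quotes, and that no quoted value contains an embedded double quote. LogEntry, parse, and readAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed access-log line (hypothetical helper class). */
final class LogEntry {
    // Assumed layout: host logname user [time] "request" status bytes "referrer" "user agent".
    // The last two fields are optional so plain common-log-format lines still match.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)"
        + "(?: \"([^\"]*)\" \"([^\"]*)\")?$");

    final String host, logname, user, time, request, referrer, userAgent;
    final int status;
    final long bytes; // -1 when the log records "-" instead of a byte count

    private LogEntry(Matcher m) {
        host = m.group(1);
        logname = m.group(2);
        user = m.group(3);
        time = m.group(4);      // brackets stripped, e.g. 28/Feb/1997:14:23:16 -0800
        request = m.group(5);   // quotes stripped, e.g. GET /path HTTP/1.0
        status = Integer.parseInt(m.group(6));
        bytes = "-".equals(m.group(7)) ? -1 : Long.parseLong(m.group(7));
        referrer = m.group(8);  // null when the line has no referrer field
        userAgent = m.group(9); // null when the line has no user-agent field
    }

    /** Returns the parsed entry, or null if the line does not match the assumed layout. */
    static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new LogEntry(m) : null;
    }

    /** Reads a log file one line at a time, keeping only the lines that parse. */
    static List<LogEntry> readAll(String path) throws IOException {
        List<LogEntry> entries = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                LogEntry entry = parse(line);
                if (entry != null) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }
}
```

Returning null for unparseable lines keeps one malformed line from stopping the whole file; you may prefer to count or print those lines so they don't vanish silently.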
At the beginning of the quarter I recommended that people who had web pages on the students server turn on logging. The format of those log files should be very similar; it is described at the URL I sent you at that point.
Please note that this step must be done individually, because the way you design your database will shape what you do next.
    Date          Number of accesses
    ----------------------------
    10/25/1999    300
    11/2/1999     500
    11/3/1999     400

Note that extra cool pages (such as those with graphs, extra useful views on the data, etc.) can earn up to 20 points of extra credit.
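As a cross-check on your database queries, counts like the ones in the sample table can also be computed straight from the log lines. This sketch is not the database approach the assignment asks for; it takes the date from the bracketed %t field (the part before the first colon, such as 28/Feb/1997), counts lines per day, and prints them in the same M/d/yyyy style. It assumes the English month abbreviations shown in the example line. DailyCounts, parseDay, countByDay, and toTable are hypothetical names.

```java
import java.io.BufferedReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

/** Per-day access counts taken straight from log lines (hypothetical helper class). */
final class DailyCounts {
    // Date portion of the %t field: "28/Feb/1997" out of "[28/Feb/1997:14:23:16 -0800]".
    private static final DateTimeFormatter LOG_DATE =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy", Locale.ENGLISH);
    // Date style used in the sample table, e.g. 11/2/1999.
    private static final DateTimeFormatter TABLE_DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    /** Extracts the calendar day of one log line, or empty if it has no readable timestamp. */
    static Optional<LocalDate> parseDay(String line) {
        int open = line.indexOf('[');
        if (open < 0) return Optional.empty();
        int colon = line.indexOf(':', open + 1);
        if (colon < 0) return Optional.empty();
        try {
            return Optional.of(LocalDate.parse(line.substring(open + 1, colon), LOG_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Counts lines per day; the TreeMap keeps the days in chronological order. */
    static SortedMap<LocalDate, Integer> countByDay(BufferedReader in) {
        SortedMap<LocalDate, Integer> counts = new TreeMap<>();
        // lines() reports read errors as UncheckedIOException.
        in.lines().forEach(line ->
            parseDay(line).ifPresent(day -> counts.merge(day, 1, Integer::sum)));
        return counts;
    }

    /** Renders the counts as a plain-text table in the same layout as the sample. */
    static String toTable(SortedMap<LocalDate, Integer> counts) {
        StringBuilder sb = new StringBuilder(String.format("%-14s%s%n", "Date", "Number of accesses"));
        sb.append("-".repeat(28)).append(System.lineSeparator());
        for (Map.Entry<LocalDate, Integer> e : counts.entrySet()) {
            sb.append(String.format("%-14s%d%n", e.getKey().format(TABLE_DATE), e.getValue()));
        }
        return sb.toString();
    }
}
```

Keying the map by LocalDate rather than by the printed string keeps 10/25/1999 ahead of 11/2/1999, which plain text sorting would not.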