Amazon.326

CSE 326

Summer 2000

Due in class: Friday, July 28

Last revised: 7/18/00

Submission Guidelines

When a customer looks up a book on Amazon.com, its "sales rank" is displayed.  This is its rank, based on the number of copies sold, compared with all other books that have ever been sold by the company. The best-selling book has rank #1. [Guess what book now has rank #1!]  After you've ordered a book from Amazon, you'll notice that its sales rank increases; sometimes the increase is dramatic.

Write a program that maintains sales rank information.  The input will be a sequence of book sales transactions.  Most transactions will be for single copies, but the program should be able to handle multiple copy orders as well (perhaps not as efficiently).  All books ordered must be remembered (in principle, there could be thousands and thousands of different titles, but for this assignment assume that all the titles will fit into memory).  Each book must be quickly retrievable and quickly updatable, and must have its accurate rank (and the absolute number sold) available efficiently at all times.  In addition, it should be possible to get an efficient list of the top 10 books at any time (titles, ranks and number of copies sold).

The title of a book is its key.  Titles are case-sensitive.

Two (or more) books which have sold the same number of copies should all have the same rank number, and the next lower rank number(s) should not be assigned. Example:

Title # sold rank
Gone with the Wind 1000001 1
Harry Potter and the Enchanted Heap 1000001 1
Data Structures for Smarties 70050 3
Lord of the Ring vol. 1 9378 4
Lord of the Ring vol. 2 9378 4
How to be Very Rich 8009 6
... ... ...
The Joys of Being Poor But Honest 2 50921

 At the lower end of the popularity scale, there could be many, many books which have only sold one copy, and which all should have the same rank.

Non-Requirements

You don't have to be able to give a list of the books.  It is not necessary to be able to find out which book has a particular rank (beyond the top 10).  You don't have to be able to process returns or cancelled orders.

Input and output

The data files represent book orders and information requests, one transaction per line.  A transaction line has a single-character transaction code; one or more spaces; then information dependant on the transaction.  The transaction code is 'i' for "information requestion (about a single book title)"; or 's' for "sale order."  For 'i', the dependent information is the book title.  For 's' the information is an integer (number of copies purchases) followed by the book title.  Example:

s  1  Gone with the Wind

s  2  Harry Potter and the Enchanted Heap

s 1 Harry Potter and the Enchanted Heap

i Gone with the Wind

s Adventures of Superman v.54

i Around the World in 80 Days

An 'i' request should result in an immediate display of information about the book: title, number of copies sold, and sales rank.

After processing each 100 books, print a summary showing the top 10 books (titles, ranks, # of copies), in order by rank, from #1 to #10; and some summary for the whole file so far: # of books, total and average # of copies sold, and average rank.  Also show the amount of CPU time elapsed so far.  Give the same sort of summary after completing processing of a book order file.  The summary file should be written to disk as well as to the monitor.  It should be possible to run "silently" as well, with all results written to disk and none to the monitor.

Your program should be able to process more than one file of books.  To make this easy, make the program's primary input be a file of filenames, each filename representing a book transaction file.  You can simulate large amounts of data by repeating the same filename more than once.  The program should not blow up if any file name is invalid, but should give an error message and continue with the next file.

Each project should contribute at least one nice book file for the rest of the class to use.

Documentation

Apart from the usual good comments in the program, turn in a report that describes 

A. the major data structures used in the project.  Explain how the "efficient" requirements mentioned above are satisfied by your choice of data structures.

B. A summary of experimental results, for one or more runs of the program with large data input.

C. A discussion of known limitations for your program: limits on number of books it can handle; cases where it blows up or gets the wrong answer, etc.

Project Structure

You are permitted, in fact, encouraged, to work in teams of two people.  Your partner should be a different person from your collaborator for the first homework.  Turn in one copy of all material, with both parties' names on all portions.

Later we may say more about the material aspects of turning in the project.  For now, assume that the electronic turn-in is done via Unix, and that a hard-copy of all program code (as well as the project report) is required.