CSE 326 Data Structures
Project 4
Heaps vs. Calendar Queues

Due: Tuesday, June 2nd, 11:30 PM for the program and Wednesday, June 3rd, in class, for the written report.

Objective: To compare the performance of heaps and calendar queue as the event queue in a discrete event simulator and write a report describing your findings.

Overview

A discrete event simulator needs a priority queue to maintain its event queue. The event queue is used in the main simulation loop: the event with the smallest time is removed from the event queue and new events are inserted into the event queue as a result of the processing of the smallest time event. There are many implementations of priority queues. One of the very best is the elegant binary heap. Another, which is especially good for discrete event simulation, is the calendar queue. The calendar queue has two user controlled parameters, bucket width and number of buckets, that govern its performance. If the parameters are chosen correctly then the calendar queue can have constant average time per event processed. By contrast, the binary heap has logarithmic average time per event. In this project you will write programs implementing the binary heap and calendar queue and explore the differences between the two. In addition, you will explore the process of selecting the parameters of the calendar queue so as to achieve optimal performance. A principal part of the project is a written report with your findings, including your experimental methodology and a number of graphs showing the results of your research. In this project we will only consider discrete event simulations where the number of active events does not vary, that is, the static case.

Assignment

There are three pieces of code that you will have to write: the binary heap, the calendar queue, and the driver code that exercises the two priority queues and measures the time used by each of them. We provide one function, NewEvent, that emulates the processing of an event. The function can also be used to seed the priority queue with initial events. The NewEvent function has three parameters: t, the time of the event being processed, m, the mean of the jump, and w, the amount of work needed to process the event. The function returns a time t' that is t plus a random amount that is exponentially distributed with mean m. The function takes about w times .01 milliseconds to return the value, which emulates the time to process an event in a real simulation. To seed the priority queue with N events you simply call NewEvent N times with t = 0 and w = 0. The parameter m can be chosen to an appropriate fixed positive constant. The NewEvent function is provided for you in the files "newevent.h" and "newevent.C", which can be found in the /cse/courses/cse326/98sp/project4 directory.

The report should describe and compare the performance of the calendar queue and the binary heap. Four graphs that should be included in your report. These graphs should be presented professionally using plotting tools in common spreadsheets.

  1. Time per event vs. bucket width for the calendar queue. Fix the number of events to be 1,000, fix the mean of the jump to be 1,000,000, fix the work to be 0, and fix the number of buckets to be 5,000. Plot the expected time per event for varying sizes of bucket width. Your plot should demonstrate that there is an optimal bucket width.
  2. Time per event vs. number of buckets for the calendar queue. Fix the number of events to be 1,000, fix the mean of the jump to be 1,000,000, fix the work to be 0, and fix the bucket width to be approximately optimal for the number buckets equal to 5,000. Plot the expected time per event for varying number of buckets. Your plot should demonstrate that choosing a small number of buckets degrades performance and that as the number of buckets grows performance eventually levels off.
  3. Compare the binary heap and the calendar queue for varying priority queue size. Fix the mean of the jump to be 1,000,000, fix and the work to be 0. Plot the expected time per event for each of the binary heap and the calendar queue for varying number of events ranging from 100 events to 10,000 events. Care must be taken to choose the bucket width and number of buckets to achieve good performance for the calendar queue for the various numbers of events.
  4. Compare the binary heap and the calendar queue for varying work. Fix the mean of the jump to be 1,000,000 and the number of events to be 10,000. Fix a bucket width and number of bucket for near optimal performance for the calendar queue. Define the speed-up to be the expected time of the calendar queue divided by the expected time for the binary heap. Plot the speed-up for varying amounts of work, starting with work equal to 0.

Experimental Methodology

Care must be taken to make sure that the numbers you generate are valid and reflect the true performance of two priority queues. You should use a function to measure the wall clock time. A good one to use is the standard C library function clock (defined in "time.h") which returns the number of clock ticks that have passed since your program started. There are also some non-standard functions available on our UNIX machines that may be more accurate. You will want to run the simulation for a number of steps before starting to measure the time per event. This is the "warm-up" period. How long should the warm-up period be? You will then want to run the simulation for long enough to get a good measurement of the time per event. How long is long enough? One common test is that if you run the simulation for K steps and 2K steps and there is little change in the time per event, the K steps is probably enough.

You will discover the need to make repeated experiments for each of your data points. If you are running your experiment on a multi-tasking system, then your timing measurement can be affected by other tasks running on the same machine. How many trials should you make in order to get good data? What data should be thrown out? A common experimental approach is to make a small number of trials like 15 and take the median as indicative of the correct value. Taking the average of 15 would include bad data. To minimize interference from other tasks you might consider running your experiments on a single user computer. Even on a single user machine that is also multi-tasking you should be careful to stop any background jobs.

Remember that your project report should contain a description of your experimental methodology.


What to hand in

You are required to hand in your project4 directory with your implementation of the binary heap and calendar queue. You may have implemented a number of different drivers to generate the data for your four plots. We don't need to see them all. You can just include one in your project4 directory. You also need to turn in a written report on Wednesday, June 3rd.

Turn your source code and README files in electronically using turnin:
% cd ~/326stuff/myproj4dir
% make clean
% cd ../
% turnin -c cse326=AA -p project4 myproj4dir


Warning

Some of your experiments can take a long time. You do not want to wait until the last minute to do this project.