Project 3 - Virtual Memory Trace Analysis

Outline

Out: Wednesday, February 11, 2004
Experiment Description Due: Wednesday, February 18, 2004
Due: Wednesday, February 25, 2004

For this project, you'll be working in the same groups as the last project. If you feel you need to change groups, please let us know as soon as possible.

Tasks:

  1. Implement an LRU-like page replacement algorithm. (additional replacement algorithms could be useful for the next task)
  2. Design an experiment to obtain insight into some aspect of the virtual memory system.
  3. Describe your results in a report.

Assignment Goals

Background

We have studied a wide range of page replacement algorithms in class, and discussed their relative advantages and disadvantages in a rather abstract setting. However, we have not discussed whether they work as expected in practice, on real applications.

We have similarly discussed other parameters of VM systems - page size, number of physical pages, the application's locality - but, again, have not presented evidence that our theoretical analysis holds in practice.

This assignment explores these questions using real data collected by Dennis Lee, a graduate of the department. The data was collected using Etch, a tool for instrumenting Windows NT applications.

Etch produces trace files that, for our purposes, list every virtual address the program referenced during execution (be it an instruction fetch, a load, or a store). Note that these files (like the applications) do not have any information about the underlying pages. These files are stored in .et format, but the parsing of that format is taken care of for you.

Vmtrace

The vmtrace application is the skeleton code for this assignment. In this assignment, that code does almost all of the work; we want you to have time to conduct an experiment, and also figure you got enough debugging in the last assignment.

Since this project does not involve modifying the kernel, it does not require VMware. The vmtrace package should work on pretty much any UNIX machine (which should include any recent version of Linux, Mac OS X, *BSD, Solaris, or even Linux running under VMware). It should even work on Windows using the Cygwin package (make sure you install the zlib package if you are using Cygwin). Because trace analysis is very CPU intensive, I'd like to encourage you to use your own machine, if possible. As always, please do not use attu.

If you are using a shared machine, please nice(1) your vmtrace process. Ex: nice ./vmtrace [vmtrace arguments]

Vmtrace is available on spinlock/coredump in /cse451/projects/vmtrace-1.X.tar.gz. (X is the release number, which may be updated. Use the latest version.) For your convenience, the latest version is also available via http.

Like simplethreads, vmtrace contains a lot of files, but most are safe to ignore. Pay attention to:

  File             Contents
  vmtrace.c        The main() routine; very simple.
  vmtrace.h        Defines common datatypes (e.g. vaddr_t).
  simulate.{c,h}   The main loop; gets the next reference, determines if it is a fault, and updates the modified/reference bits.
  fault.{c,h}      The fault handlers; this is where you'll be adding most of your code.
  pagetable.{c,h}  Implements a pagetable. Also contains the definition of the pte_t struct.
  physmem.{c,h}    Models physical memory, which your replacement algorithm needs to manage.
  stats.{c,h}      Collect and output statistics. Note the increment-accessors are in stats.h as inline functions.
  util.{c,h}       Utility routines to access bit fields and compute logarithms/exponents (base 2). util.h also contains vaddr_to_vfn, which converts a virtual address to a virtual frame number.
  options.{c,h}    Parses command line options; if you add configuration parameters to your algorithm, you can parse them here.
  input.{c,h}      Parses the tracefile and returns the next reference. You probably won't need to modify or use these files.
  Makefile.am      This file lists the source files (both .c and .h) for the project. See below for instructions on adding new files.

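
To make the address-to-frame conversion mentioned for util.h concrete, here is a minimal sketch of how a virtual address can be split into a virtual frame number and a page offset when the page size is a power of two. The names sketch_vaddr_t, sketch_log2, sketch_vaddr_to_vfn, and sketch_vaddr_to_offset are hypothetical and chosen for illustration; the real vaddr_t and vaddr_to_vfn in vmtrace may have different types and signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vaddr_t defined in vmtrace.h. */
typedef uint32_t sketch_vaddr_t;

/* Returns log2(n) for a power of two n, e.g. 4096 -> 12. */
unsigned sketch_log2(uint32_t n)
{
    unsigned bits = 0;

    assert(n != 0 && (n & (n - 1)) == 0); /* n must be a power of two */
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Drops the page-offset bits, leaving the virtual frame number. */
uint32_t sketch_vaddr_to_vfn(sketch_vaddr_t vaddr, uint32_t page_size)
{
    return vaddr >> sketch_log2(page_size);
}

/* Keeps only the page-offset bits. */
uint32_t sketch_vaddr_to_offset(sketch_vaddr_t vaddr, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return vaddr & (page_size - 1);
}
```

With 4096-byte pages, address 0x00403A10 falls in virtual frame 0x403 at offset 0xA10. Doubling the page size to 8192 bytes halves the frame number to 0x201, which is why page size changes how references group into pages.
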
The build procedure should seem familiar: it is identical to that for simplethreads. Vmtrace should compile without any warnings.

In summary, the steps are:
  1. cd /cse451/LOGIN
  2. tar -xvzf /cse451/projects/vmtrace-1.X.tar.gz
  3. cd vmtrace-1.X
  4. ./configure
  5. make

Run ./vmtrace -h to see the help/usage information. Note that you do not need to gunzip the tracefiles before using them; vmtrace will decompress them on the fly (assuming the zlib library is available on your system; the -h output will confirm this).

vmtrace has several options intended to make simulation easier. It can append the statistics to a given file (-o FILE) rather than printing them to stdout. The results are reported in comma-separated-value format (CSV) for ease of analysis. I recommend using the -o option to save your stats in combination with the -v option, which will output progress information.

To Add a Source File

If you add a new .c file, do the following:
  1. Edit the Makefile.am file, adding the new .c file to the vmtrace_SOURCES list.
  2. From the top-level directory (vmtrace-1.X), run automake.
  3. Also from the top-level directory, run ./configure.
  4. Your file is now added; run make as usual to build it.
If you add a new .h file, follow the same steps, but add the file to the noinst_HEADERS list. This will ensure that your file is included in your submission.

The Assignment

Part 1: Implement Page Replacement Algorithms
Vmtrace, as shipped, contains only a single page-replacement algorithm (random). For part 1, it is your task to add an LRU-like algorithm. The algorithm should find a space in physical memory for the given pte. This may mean evicting (physmem_evict) a page which is already occupying that space (note that nothing bad will happen if you call evict on a PFN that is not occupied). It should then call physmem_load to insert the pte.

To make your algorithms available, add them to the fault_handlers array in fault.c. See the random algorithm for an example.
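
Part 1 leaves the choice of LRU approximation open. One common option is the clock (second-chance) algorithm, sketched here as a self-contained illustration. Everything named sketch_* is hypothetical and is not part of vmtrace: the real handler would work with pte_t and go through physmem_evict and physmem_load, and would rely on the reference bits that the simulation loop maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FRAMES 64

/* Hypothetical frame record; vmtrace's real pte_t and physmem API differ. */
struct sketch_frame {
    bool occupied;    /* does the frame currently hold a page? */
    bool referenced;  /* set on every access, cleared by the clock hand */
    unsigned vfn;     /* virtual frame number held in this frame */
};

struct sketch_clock {
    struct sketch_frame frames[SKETCH_MAX_FRAMES];
    size_t nframes;   /* number of frames in use, <= SKETCH_MAX_FRAMES */
    size_t hand;      /* next frame the clock hand will inspect */
    unsigned faults;  /* page faults seen so far */
};

void sketch_clock_init(struct sketch_clock *c, size_t nframes)
{
    assert(nframes > 0 && nframes <= SKETCH_MAX_FRAMES);
    c->nframes = nframes;
    c->hand = 0;
    c->faults = 0;
    for (size_t i = 0; i < nframes; i++) {
        c->frames[i].occupied = false;
        c->frames[i].referenced = false;
        c->frames[i].vfn = 0;
    }
}

/* Picks a frame to fill: free or unreferenced frames are taken at once;
 * referenced frames lose their bit and get a second chance. */
size_t sketch_clock_pick(struct sketch_clock *c)
{
    for (;;) {
        size_t pfn = c->hand;
        struct sketch_frame *f = &c->frames[pfn];

        c->hand = (c->hand + 1) % c->nframes;
        if (!f->occupied || !f->referenced)
            return pfn;
        f->referenced = false;
    }
}

/* Simulates one reference; returns true if it caused a page fault. */
bool sketch_clock_access(struct sketch_clock *c, unsigned vfn)
{
    for (size_t i = 0; i < c->nframes; i++) {
        if (c->frames[i].occupied && c->frames[i].vfn == vfn) {
            c->frames[i].referenced = true;  /* hit: mark as recently used */
            return false;
        }
    }

    size_t pfn = sketch_clock_pick(c);       /* fault: replace the victim */
    c->frames[pfn].occupied = true;
    c->frames[pfn].referenced = true;
    c->frames[pfn].vfn = vfn;
    c->faults++;
    return true;
}
```

The design trades accuracy for cost: instead of ordering pages by exact time of last use, it keeps one bit per frame and only distinguishes "used since the hand last passed" from "not used". The linear search for a hit is only there to keep the sketch short; in vmtrace the page table already answers that question.
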


Part 2: Design and Run an Experiment

Design an experiment using the scientific method to examine some aspect of virtual memory. There are many parameters available in the simulation - replacement algorithm, number of pages, page size, and parameters of the algorithm - that you may choose to vary; note that a good experiment will probably focus on one parameter.
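
As an illustration of varying a single parameter while holding everything else fixed, the following sketch counts faults for one trace at several physical memory sizes. It uses a simple FIFO policy on an in-memory array of virtual frame numbers purely to stay short; sketch_fifo_faults and sketch_sweep are hypothetical names, and a real experiment would run vmtrace itself on the provided trace.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FIFO_MAX 16

/* Counts page faults for a FIFO policy over a trace of virtual frame numbers. */
unsigned sketch_fifo_faults(const unsigned *trace, size_t len, size_t nframes)
{
    unsigned frames[SKETCH_FIFO_MAX];
    size_t used = 0;      /* frames filled so far */
    size_t oldest = 0;    /* index of the frame to replace next */
    unsigned faults = 0;

    assert(nframes > 0 && nframes <= SKETCH_FIFO_MAX);
    for (size_t i = 0; i < len; i++) {
        int hit = 0;

        for (size_t j = 0; j < used; j++) {
            if (frames[j] == trace[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;

        faults++;
        if (used < nframes) {
            frames[used++] = trace[i];        /* still filling memory */
        } else {
            frames[oldest] = trace[i];        /* replace the oldest page */
            oldest = (oldest + 1) % nframes;
        }
    }
    return faults;
}

/* Runs the same trace at every frame count in [lo, hi];
 * out[n - lo] receives the fault count for n frames. */
void sketch_sweep(const unsigned *trace, size_t len,
                  size_t lo, size_t hi, unsigned *out)
{
    assert(lo > 0 && lo <= hi);
    for (size_t n = lo; n <= hi; n++)
        out[n - lo] = sketch_fifo_faults(trace, len, n);
}
```

On the short trace 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, this sketch reports 9 faults with 3 frames and 10 faults with 4 frames, so even a one-parameter sweep can contradict the intuitive hypothesis that more memory always means fewer faults.
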

Please email tanderl@cs a quick description of your experiment by Wednesday, February 18, so I can make sure it is reasonable.

The simulation currently reports the following statistics for each type of reference (instruction fetch, load, and store):

  Statistic    Description
  references   Total number of memory accesses.
  miss         Number of page faults.
  compulsory   Number of compulsory faults (first time a page was accessed).
  evictions    Number of times a page was removed from physical memory.
  pageouts     Number of times an eviction required a write to disk.
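
The relationship between these counters can be sketched as follows. This is an illustration only: struct sketch_stats and the sketch_* functions are hypothetical, the real counters live in stats.{c,h}, and the sketch assumes that an eviction needs a write to disk exactly when the evicted page has been modified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters mirroring the statistics listed above. */
struct sketch_stats {
    unsigned long references;
    unsigned long miss;
    unsigned long compulsory;
    unsigned long evictions;
    unsigned long pageouts;
};

/* Records one memory reference.
 *   faulted:      the page was not resident
 *   first_touch:  the page had never been accessed before
 *   evicted:      servicing the fault removed a resident page
 *   victim_dirty: the removed page had been modified (assumed to be
 *                 the condition that forces a write to disk) */
void sketch_stats_record(struct sketch_stats *s, bool faulted,
                         bool first_touch, bool evicted, bool victim_dirty)
{
    s->references++;
    if (!faulted)
        return;

    s->miss++;
    if (first_touch)
        s->compulsory++;
    if (evicted) {
        s->evictions++;
        if (victim_dirty)
            s->pageouts++;
    }
}

/* Fraction of references that faulted. */
double sketch_miss_ratio(const struct sketch_stats *s)
{
    assert(s->references > 0);
    return (double)s->miss / (double)s->references;
}
```

Under these assumptions compulsory never exceeds miss and pageouts never exceeds evictions, and miss minus compulsory is the number of faults that a larger memory or a better replacement policy could in principle avoid.
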


In addition, the statistics output includes the number of physical pages used, the page size (in bytes), the input file name, the replacement algorithm, and the simulation limit on number of references (or 0 if unlimited). This is intended to make it easier to track multiple experiments; using the same output file (-o), you can append successive trials to a single stats file. Note that the type statistics (ifetch/load/store) relate to the cause of the eviction or pageout, not the type of page that was evicted. You may find it useful to add more statistics to the simulation.

Trace File

The trace file is available on both spinlock and coredump as /cse451/projects/netscape.exe.et.gz. If you are using spinlock/coredump to run your simulation, there is no need to copy the file; use it directly out of /cse451/projects.

A full trace simulation can take hours, so make sure to leave plenty of time for actually conducting the experiment. The nohup(1) command may be useful: normally, if you log out, your simulation ends, but nohup in combination with background (&) will allow you to start your command and come back for the results later.


Part 3: Analysis and Report
Include a presentation of your experimental design, data, analysis, and conclusions in your report. You should consider how to most effectively present your data (graphs, charts, tables, and/or discussion). While your conclusions should contain some discussion of what you believe the experimental results mean, you should be careful to distinguish between what your experiment has actually proven and what you are speculating on.


Turnin
In your top-level vmtrace directory, run make dist. This will produce a file named vmtrace-1.X.tar.gz. Submit this file using the turnin(1L) program under project name project3 before class on the day it is due. turnin will not work on coredump/spinlock, so you'll need to use attu for this step.

If you have added any files, run tar -tzf vmtrace-1.X.tar.gz and check to make sure your new files are listed.

Print your report and bring it to lecture on Wednesday, February 25.