CSE 451 Winter 2003: Project 4

Project 4 - Virtual Memory Trace Analysis

Administrivia: Find a partner you have not worked with before, and email Valentin the names of both partners by Wednesday, May 14.
Out: Monday, May 12, 2003
Due: Friday, May 23, 11:59 PM
Writeup due: Wednesday, May 28, at the beginning of lecture

Tasks:

Implement a set of page replacement algorithms.
Design an experiment to obtain insight into some aspect of the virtual memory system.
Describe your results in a report.

Assignment Goals

To understand page replacement algorithms.
To understand the memory behavior of a real application.
To practice experimental design and analysis in computer systems.

Background

We have studied a range of page replacement algorithms in class, and discussed their relative advantages and disadvantages in a rather abstract setting. However, we have not discussed whether they work as expected in practice, on real applications.

We have similarly discussed other parameters of VM systems - page size, number of physical pages, the application's locality - but, again, have not presented evidence that our theoretical analysis holds in practice.

This assignment seeks to explore these replacement algorithms and VM parameters, using real data collected by Dennis Lee, a graduate of the department. The data was collected using Etch, a tool for instrumenting Windows NT applications.

Etch produces trace files that, for our purposes, list every virtual address referenced (be it an instruction fetch, a load, or a store) by a program during execution. Note that these trace files (like the applications) do not have any information about the underlying pages. These files are stored in .et format, and the parsing of that format is taken care of for you.

Vmtrace

The vmtrace application is the skeleton code for this assignment. In this assignment, that code does almost all of the work; we want you to have time to conduct an experiment.

Since this project does not involve modifying the kernel, it does not require VMware. The vmtrace package should work on almost any UNIX machine (including any recent version of Linux, Mac OS X, *BSD, or Solaris, or even Linux running under VMware). It should also work on Windows using the Cygwin package (make sure you install the zlib package if you are using Cygwin). Because trace analysis is very CPU-intensive, we encourage you to use your own machine if possible. As always, please do not run simulations on tahiti, fiji, ceylon, or sumatra.

If you are using a shared machine, please nice(1) your vmtrace process. For example: nice ./vmtrace [vmtrace arguments]

Vmtrace is available on spinlock/coredump in /cse451/doug/vmtrace-1.X.tar.gz, where X is the release number, which may be updated; use the latest version. For your convenience, the latest version is also available via http.

Like simplethreads, vmtrace contains a lot of files, but most are safe to ignore. Pay attention to:

File Contents

vmtrace.c The main() routine; very simple.

vmtrace.h Defines common datatypes (e.g. vaddr_t).

simulate.{c,h} The main loop; gets the next reference, determines if it is a fault, and updates the modified/reference bits.

fault.{c,h} The fault handlers; this is where you'll be adding most of your code.

pagetable.{c,h} Implements a pagetable. Also contains the definition of the pte_t struct.

physmem.{c,h} Models physical memory, which your replacement algorithm needs to manage.

stats.{c,h} Collect and output statistics. Note the increment-accessors are in stats.h as inline functions.

util.{c,h} Utility routines to access bit fields and compute logarithms/exponents (base 2). util.h also contains vaddr_to_vfn, which converts a virtual address to a virtual frame number.

options.{c,h} Parses command line options; if you add configuration parameters to your algorithm, you can parse them here.

input.{c,h} Parses the tracefile and returns the next reference. You probably won't need to modify or use these files.

Makefile.am This file lists the source files (both .c and .h) for the project. See below for instructions on adding new files.

The build procedure should seem familiar: it is identical to that for simplethreads. Vmtrace should compile without any warnings.

In summary, the steps are:

cd /cse451/LOGIN
tar -xvzf /cse451/doug/vmtrace-1.X.tar.gz
cd vmtrace-1.X
./configure
make

Run ./vmtrace -h to see the help/usage information. Note that you do not need to gunzip the tracefiles before using them; vmtrace will decompress them on the fly (assuming the zlib library is available on your system; the -h output will confirm this).

vmtrace has several options intended to make simulation easier. It can append the statistics to a given file (-o FILE) rather than printing them to stdout. The results are reported in comma-separated value (CSV) format for ease of analysis. I recommend using the -o option to save your stats in combination with the -v option, which will output progress information.

To Add a Source File

If you add a new .c file, do the following:

Edit the Makefile.am file, adding the new .c file to the vmtrace_SOURCES list.
From the top-level directory (vmtrace-1.X), run automake.
Also from the top-level directory, run ./configure.
Your file is now added; run make as usual to build it.

If you add a new .h file, follow the same steps, but add the file to the noinst_HEADERS list. This will ensure that your file is included in your submission.

The Assignment

Part 1: Implement Page Replacement Algorithms

Vmtrace, as shipped, contains only a single page-replacement algorithm (random). For part 1, your task is to add the LRU algorithm. The algorithm should find a space in physical memory for the given pte. This may mean evicting (physmem_evict) a page that already occupies that space (nothing bad will happen if you call evict on a PFN that is not occupied). It should then call physmem_load to insert the pte.

To make your algorithms available, add them to the fault_handlers array in fault.c. See the random algorithm for an example.

Part 2: Design and Run an Experiment

Design an experiment using the scientific method to examine some aspect of virtual memory. There are many parameters available in the simulation - replacement algorithm, number of pages, page size, and parameters of the algorithm - that you may choose to vary; note that a good experiment will probably focus on one parameter (at a time).

The simulation currently reports the following statistics for each type of reference (instruction fetch, load, and store):

Statistic Description

references Total number of memory accesses.

miss Number of page faults.

compulsory Number of compulsory faults (first time a page was accessed).

evictions Number of times a page was removed from physical memory.

pageouts Number of times an eviction required a write to disk.

In addition, the statistics output includes the number of physical pages used, the page size (in bytes), the input file name, the replacement algorithm, and the simulation limit on number of references (or 0 if unlimited). This is intended to make it easier to track multiple experiments; using the same output file (-o), you can append successive trials to a single stats file. Note that the type statistics (ifetch/load/store) relate to the cause of the eviction or pageout, not the type of page that was evicted. You may find it useful to add more statistics to the simulation.

Trace File

The trace file is available on both spinlock and coredump as /cse451/doug/netscape.exe.et.gz. This trace, Netscape 3.1, is also available via http (83MB). If you are using spinlock/coredump to run your simulation, there is no need to copy the file; use it directly out of /cse451/doug.

You may wish to burn the trace onto CDs. CD burners are available in the labs. The files are accessible by mounting the cse451 drive as was done in project 2 to transfer your kernel files. For example, the Windows NT command net use l: \\coredump.cs.washington.edu\cse451 would map the cse451 directory from coredump to drive letter L in Windows NT (alternatively, you may be able to just enter \\coredump.cs.washington.edu\cse451 in any window path, though I've had better luck using the net command). Note: Because the CD burner requires a constant stream of data, it may be helpful to copy the file to the local machine before burning (but be sure to delete it afterwards).

A full trace simulation can take hours, so make sure to leave plenty of time for actually conducting the experiment. The nohup(1) command may be useful: normally, if you log out, your simulation ends, but nohup in combination with background execution (&) lets you start your command and come back for the results later.

Part 3: Analysis and Report

Write a report, presenting your experimental design, data, analysis, and conclusions. As always, concise reports are better than overly verbose ones! You should consider how to most effectively present your data (graphs, charts, tables, and/or a discussion).

While your conclusions should contain a discussion of what you believe the experimental results mean, you should be careful to distinguish between what your experiment has actually shown and what you are speculating about. You are not required to write a separate report for each team member, but you may do so if you wish; if you do, clearly indicate that in your submission.

Accepted file formats are TXT, HTML, DOC, PS, and PDF. Call it something very intuitive, say, report.pdf so that we can easily find it among the files in your submission.

Turnin

In your top-level vmtrace directory, run make dist. This will produce a file named vmtrace-1.X.tar.gz. Submit this file using the turnin(1L) program under project name proj4 by 11:59pm on the day it is due. turnin will not work on coredump/spinlock, so you'll need to use one of the general-purpose machines (sumatra, fiji, ceylon, or tahiti).

If you have added any files, run tar -tzf vmtrace-1.X.tar.gz and check to make sure your new files are listed.

Make sure to include your report along with your submission.

Print your submitted report and bring it to lecture on Wednesday, May 28.