NEW: we will count the best 3 grades you get in your projects, rather than all 4 grades. This means that if you're happy with your project grades for projects 1 through 3, you don't need to do project 4.

CSE 451 Autumn 2000 - Project Assignment 4

Page Replacement Policy

Out: Monday February 26th, 2001
Due: Wednesday March 7th, 2001

Assignment Goals

Background

Page replacement is needed when a page not currently in memory must be brought into memory and insufficient (including no) free frames are available.  The Linux page replacement algorithm is a combination of what the book calls clock and second-chance  (FIFO).  

Linux keeps allocated frames organized in a number of lists, depending on their use.  The page swap cache is used to hold pages for transfer between main memory and disk.  Except for brief intervals when a page is being written back, the page swap cache contains only clean pages.  Dirty pages in use by a process are not in the buffer cache, but can be found by either of two means:

  1. Scan through the page map structures of the tasks, that is, the actual page tables.
  2. Scan through a data structure that describes all physical memory frames.  (This is pointed at by variable mem_map.  Confusingly, the type of the elements in that array is struct page.)

There are also other structures containing information about pages/frames.  The buffer cache is used for pages that contain IO buffers (for DMA transfer).  There are others.

The page replacement algorithm relies (primarily) on a kernel thread called kswapd, which starts execution in routine kswapd().  That routine causes the daemon to block, waiting for some other thread to wake it up.  Other threads wake it up when they discover, as they are trying to allocate new frames (called pages in the code, of course), that the number of free frames has fallen below some critical threshold.  When awoken, the kswapd thread starts trying to find frames that can be freed.

It does this in two stages.  First, it looks around for pages that can be released without having to write data to the disk.  If it can recover enough frames this way, it's happy and goes back to sleep.  If not, it enters the dread "swap" phase.

The swap phase is more aggressive.  It chooses a task to attack, and scans through its page tables looking for victims.  Every page is a potential victim;  the only thing that can protect it is if that page is in some weird state (e.g., some other thread has the page locked at the moment).  Mainly, though, those pages are going to be evicted.

Eviction is a little indirect, though.  In particular, writes of dirty pages are scheduled, and the pages are put in the swap cache, and the thread is done with them.  Eventually, they are actually written back, and so become clean, and so are available for harvesting during the first phase described above.

Caveats

The code is complicated, and a small set of confusing names are used for everything (e.g., "page"), and snippets that affect what is going on are everywhere.  So, the description above could be wrong in ways small or large.  But, having done myself what I'm asking you to do in this assignment, I'm sure it can be done with only a vague idea of what the actual truth of the matter is, and by reading code in only one code file.

Assignment Goal

The goal of this assignment is to change the details of the replacement of dirty pages.  At the moment, no page that has its (hardware) reference bit set is replaced; instead, that bit is simply set off.  This is a form of the clock algorithm.

The first specific assignment is to change this "single bit" algorithm to use two or more bits, as described in the book.  (We want the effect of what the book describes, not necessarily the precise implementation.)  The number of bits to use is up to you.  

The second specific assignment is to try to determine whether or not your change improves performance.  My guess is that changes in behavior will not be dramatic between the original and your version, so don't be dismayed if you can't come to a definitive conclusion.

Additional Caveat

Some performance metrics (e.g., elapsed time to complete) can vary dramatically from one run of a program to a second (identical) run.  Explaining why this happens would be a nice "bonus" in your assignment write-up (for which there are no specific bonus points, of course).  Make sure you run any program you use for evaluation repeatedly, as a single run can give very confusing results.  (Of course, so can multiple runs.)

Code Details

You need to deal primarily with the code in one file, .../mm/vmscan.c.  That file contains the kswapd() routine, and you can follow what it does from there.  (It calls other routines in that file, that call other routines in that file, etc.).

Additionally, the routine shrink_mmap() in .../mm/filemap.c implements "the first part" of the page replacement policy, as described above.  I don't see any reason why you should have to modify it.  (In theory you shouldn't even have to look at it, but it's likely you will want to at some point.)

Finally, you have to store your extra bit(s) somewhere, the two obvious choices being the (hardware) page table entries or the struct page frame descriptors.  This will "require" you to create new macros in the appropriate .h file, depending on which way you want to go. If you store them in the struct page descriptor, hey, it's only software, and you can modify that structure as you please.  If you store them in the pte's, you can't violate what the hardware is expecting.  Fortunately, for the x86, bits 9 through 11 are available to use as you please (except that Linux seems to be using bit 9 already, far as I can tell).  See http://developer.intel.com/design/pentiumii/manuals/24319202.pdf.

More Code Details

Page In Overview  (Bested viewed with IE, i.e., not Netscape)

Page Out Overview (Ditto)

I created and have been using a program whose goal was to allow the size of the virtual address space (in use) and the locality to be specified relatively easily from the command line.  The program, buildtree, creates a balanced binary tree and then looks up elements from that tree.  The elements looked up are chosen from a random distribution, the shape of which "controls" locality.

The program is available in /cse451kernels/gribble/PageReplacementProject/.

buildtree is invoked with the command line

buildtree <#nodes in tree> <#find operations to perfom> <distribution specified>

where <distribution specified> and be one of

The Unix command time is a simple way to measure elapsed time, e.g., time -v buildtree 130000 5000 g .5

Because no page evictions take place unless the system runs out of physical memory, if you use buildtree you need to experiment to find a reasonable tree size.  Too low, and there is no page eviction.  Too high, and, unfortunately, the standard Linux implementation (at least) thrashes, and can basically hang.

Optional Creativity

The standard implementation seems remarkably sensitive to, at least, the working set size of the running applications.  In particular, as you increase the tree node count, it eventually falls off a cliff, from running well to not being able to run at all.  Some of this can be inherent, if the application has sufficiently poor locality.  But, the performance drop-off seems more abrupt and severe than one would expect.

If you have the inclination and the time, an extension to this assignment is to try to improve the replacement algorithm in a way that gets the most dramatic performance impact you can.  For this, any change you'd like to attempt is fair game.

Even More Details

You might want to implement a system call that lets you "turn on" and "turn off" your modified page replacement algorithm, an an application that invokes it, so that you don't have to reboot to run comparisons of your implementation with the original one.

If you're going to do anything other than the basic assignment, and maybe even in that case, you might want to instrument the code so that you can figure out what is going on.  Again, a new system call, and a new application to use it, could be helpful.  I wrote one that lets me figure out things like how often various points in the code are reached, how often a dirty page being examined for replacement has its reference bit on, and how much time elapsed time was spent in various parts of the code.  You should be able to figure out how to do all but the last of these.

If you want to time code, the thing to use is the hardware cycle counter, which increments on each clock tick (I believe).  Here are the specific macros I used (in part stolen from the Linux code).  How to use them may or may not be apparent from looking at them (as they involve the other system call just mentioned above):

#define rdtsc(low,high) \ __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high)) 
#define STARTTIME unsigned long startlow, starthigh, endlow, endhigh, increment; rdtsc(startlow, starthigh); 
#define ENDTIME(x) rdtsc(endlow,endhigh); \ 
                                   increment = (endlow - startlow) >> 6; \ 
                                   /* if (endhigh > endlow) increment |= (1<<25);*/ \
                                  atomic_add((int)increment, &routineCounts[x]); \
                                  startlow = endlow; starthigh = endhigh;

What to Hand In

Due March 7th

You should hand in a write-up that:

  1. Explains what code changes you've made, in English, at a level appropriate for one of the course staff to read.

  2. Contains copies of new or modified code.

  3. Describes the procedures you used to evaluate whether or not your change improved performance, and what your conclusions are.

How Long Will This Take

You have to understand enough of the code in vmscan.c to know where to put your modifications.  This will require some reading;  I'd guess 7 hours would be a median time for that, but with very high variance.

Then you have to implement the basic assignment.  last quarter, the instructor took less than an hour to do that, but it was after more than 7 hours of reading.

If you decide to spend time on improving the page replacement algorithm, the "opportunities" provide entertainment measured in months.