CSE 451 Winter 2001: Project 3

Project 3 - Fair-Share Scheduling

Out: Monday February 12th, 2001
Due: Monday February 26th, 2001

Assignment Goals

To get your hands dirty with time-sharing scheduler implementations
To gain some exposure to the issues of scheduling on multiprocessors
To gain some exposure to the relationship between scheduling and other parts of the OS (e.g., I/O and memory management)

Unusually generous hint

You might want to take a look at this link before you dive too deeply into this assignment. Note that this link describes the Linux 2.2 kernel, and we're working with the 2.4 kernel in this course. So, some things have changed (in particular, multiprocessor scheduling has been simplified).

Background

Linux actually contains three scheduling policies, one for "general use" and two for a kind of real-time processing. We'll be concerned with the general use policy, which is a descendant of the multi-level feedback approach discussed in the book. However, the other two have some penetration into the code you'll end up modifying, so you need to understand at least a tiny bit about them.
Quoting from "man sched_setscheduler":

SCHED_OTHER is the default universal time-sharing scheduler policy used by most processes, SCHED_FIFO and SCHED_RR are intended for special time-critical applications that need precise control over the way in which runnable processes are selected for execution. Processes scheduled with SCHED_OTHER must be assigned the static priority 0, processes scheduled under SCHED_FIFO or SCHED_RR can have a static priority in the range 1 to 99. Only processes with superuser privileges can get a static priority higher than 0 and can therefore be scheduled under SCHED_FIFO or SCHED_RR. The system calls sched_get_priority_min and sched_get_priority_max can be used to find out the valid priority range for a scheduling policy in a portable way on all POSIX.1b conforming systems.

and:
SCHED_OTHER: Default Linux time-sharing scheduling.
SCHED_OTHER can only be used at static priority 0. SCHED_OTHER is the standard Linux time-sharing scheduler that is intended for all processes that do not require special static priority real-time mechanisms. The process to run is chosen from the static priority 0 list based on a dynamic priority that is determined only inside this list. The dynamic priority is based on the nice level (set by the nice or setpriority system call) and increased for each time quantum the process is ready to run, but denied to run by the scheduler. This ensures fair progress among all SCHED_OTHER processes.
Note that these describe the logical operation of the scheduler; the actual implementation might (and probably will) not be quite what you'd expect from reading these descriptions.
As with many schedulers, one of the key concepts in the Linux scheduler is round-robin (RR) scheduling. Part of the point of RR scheduling is fairness. In particular, there is fairness among processes, in the sense that each process (of a given priority) receives an equal allocation of CPU time over an appropriately measured interval. One problem with RR scheduling is that it is per process fairness, not per user fairness. What this means is that a user who starts many processes will receive, in total, a much larger fraction of system resources than will a user who has only a few processes running.
Fair-share schedulers address this problem, using the RR mechanism, but with the goal of providing equal allocation per user (that is, summing over all processes being run for that user), not per process. As an example, if user A has four times as many processes running as user B, each of user A's process would receive only one fourth of the allocation per second, say, given to processes for user B, so that users A and B split the CPU resource equally. Under the default scheduler, user A would receive 80% of the CPU.
Complications:

The examples above assume that all processes are in tight CPU loops. That isn't the typical case. Instead, processes perform IO and other blocking operations. This complicates the definition of "fair share." (A more detailed discussion of this than you need for this assignment is given in A Coarse Grained Fair Share Scheduler, as well as in other references available on the web.)
Part of this assignment (to be completed in Part 2) is for you to decide just what the objectives of your scheduler are. The objectives are distinct from the mechanisms used to actually perform the scheduling. Typically, the mechanisms do not achieve the objectives, except in extreme cases (e.g., all running processes are in tight CPU loops). In fact, typically it's difficult or impossible to give a convincing argument about just what "should happen" for a given set of real processes.
If it were me doing this assignment, I'd try to adopt the mechanisms and objectives of the current Linux scheduler, subject to the change to fair-share as the overall flavor of the scheduler, on the theory (which is true) that the current scheduler represents a survival of the fittest evolution of the algorithm based on observations of its performance in actual use over a number of years.

In any case, though, you're free to make whatever decisions you think are appropriate, but you should try to be complete in your Part 2 write-up about what those decisions are.

The Assignment

Before you begin the assignment, log into spinlock or coredump, and copy the following file into your directory:
/cse451kernels/gribble/fairshare.tar.gz
This tar archive contains the source code and binaries of a number of utilities that you will find useful during the course of this project. We'll describe them below. To extract the files from the archive, use the following command:
tar -xvzf fairshare.tar.gz
A directory called fairshare/ will be created, and the utilities will be extracted into it.

Assigment Overview

There are 3 parts to this assignment. In Part 1, you will use your multithreaded server implementation from project 2 to demonstrate a particular inadequacy of the existing Linux scheduler. In Part 2, you will read the source code to the existing scheduler, and to design (on paper) the modifications necessary to support fair-share scheduling, as defined above. In Part 3, you will implement, debug, and test your modifications.
Warning: it should be obvious that if you can find an implementation of fair-share scheduling for Linux on the web, we can find that same implementation. You should do this project without referring to an existing implementation of fair-share scheduling.
Linux Source and Configuration

The most directly relevant code is in .../kernel/sched.c. It is certain that you will need to look at other source files as well.
You should make decisions appropriate to execution on multi-processors when writing your code. Of course, our (virtual) machines are not multi-processors, so this is a bit of pretending. We MAY have an actual multi-processor available, on loan, for testing by those who are interested. However, the bottom line is that you should write code that would be sensible for execution on multi-processors, but your code has to be tested only on uni-processors. This means that you must worry about race conditions. However, I don't believe that you need to create any new policy or mechanism specific to multiprocessor scheduling.

Recommended Procedure

It is very inconvenient to have a hard bug in the scheduler - the machine simply won't run your copy of the kernel, and you'll spend days rebooting VMware.
For that reason, I recommend that you (a) think hard about your code before you recompile and install, and (b) make modifications incrementally, not all at once. If the machine fails to boot, or to run, it's a good guess that the last set of changes you made are the problem. (It's not certain, of course.) Making sure that set of changes is small can help find the problem. Additionally, you might find that instrumenting the code (printk's, or similar sorts of things) helps you, and even that building some additional tools (e.g., a new system call, and a user-level application that invokes it) are worth the effort.

Part 1: Demonstrate the inadequacy of the current Linux scheduler

In project 2, you implemented a multi-threaded server using your thread pool implementation. In this project, you will make use of your multi-threaded server to evaluate the effectiveness of your new fair-share scheduler.
The Linux pthreads implementation uses kernel threads: every thread that you create in your thread pool corresponds to a kernel thread. In Linux, kernel threads are nearly indistinguishable from processes; from the perspective of the scheduler, kernel threads are represented by the same data structure as processes, and they are scheduled identically.
In this part of the assignment, you will run two copies of your server (both on the same VMware machine), with each copy running under a different user ID. Your goal is to demonstrate that using the current Linux scheduler, the throughput of each server is proportional to the relative number of threads in each server's pool. So, if the first server has X threads in its pool, and the second server has Y threads in its pool, then the first server's throughput should be proportional to X / (X+Y). You should run each server using NUM_LOOPS=50000, so that most of the CPU cycles are consumed inside the dispatched threads from the thread pool.
To launch servers using different user IDs, you should make use of the spawn utility that is included in the fairshare.tar.gz archive.

Here is a description of the utilities inside the fairshare.tar.gz archive.

Unfortunately, the spawn utility doesn't allow you to launch a program that expects command line arguments. here is a way to get around this that makes use of shell scripts.

WHAT TO TURN IN FOR PART 1
As part of your writeup for this assignment, you need to submit a graph that demonstrates that the throughput of a server is proportional to X / (X + Y), as described above. It is up to you to figure out what an appropriately expressive graph is...

Part 2: Design your Fair-Share Scheduler

For Part 2, your task is simple: read the source code to the existing Linux scheduler, and design (on paper) how you want your fair-share scheduler to work.
WHAT TO TURN IN FOR PART 2
As part of your writeup for this assignment, you need to answer the following questions:

What does the current Linux scheduler do? Make sure you address all of the following questions:

What is the current scheduling mechanism? Be specific and thorough. (It should be possible to re-implement the behavior of the current scheduler from your description, which should be a kind of specification.)

Is starvation possible with the current mechanism? If so, give an example of how it might occur.

Suppose that all processes on the system are scheduled using SCHED_OTHER, and that all are in tight CPU loops. Give an expression that indicates the fraction of CPU time that will be allocated to a single process, as a function of its base ("nice") priority. (Note: The answer to this depends on at least one Linux configuration parameter.)

Is there aging? That is, are the priorities of processes that have low recent CPU consumption raised to avoid effective starvation?

Are IO bound processes (those whose ratio of IO to CPU use is high) given any kind of favoritism?

What are the objectives of your fair-share scheduler? Make sure you tell us how you want to allocate CPU among processes (a) that all belong to a single user, and (separately) (b) that are not in tight CPU loops. That is, tell us what you want "fair-share" to mean. It is okay, expected even, that the answers to some of these questions can be answered best by referring to the implementation (just as exactly what the current scheduler does can be defined fully only by its implementation).

What are your plans to modify the existing scheduler implementation to achieve fair-share scheduling? Your answer to this question should be "more than a specification": it should be a specification (sufficient for some other trained person to implement from), plus it should identify specific source files to be modified.
A Strong Suggestion: You have two weeks to complete this entire project. I strongly urge to you do Part 2 during your first week; in other words, try to get it done by Monday February 19th. If you send us your Part 2 writeup by Monday Feb 19th, I will do my best to provide you with some feedback in time to affect what you do in Part 3. (I can't guarantee this; it depends on how many of you send me writeups, and how much free time I have in which to do this. But, I'll do my best.)
In any case, if you don't have Part 2 done by Monday the 19th, you will be far behind, and you'll have a VERY tough time getting the entire assignment done by Monday Feb 26th.

Part 3: Implement your Fair-Share Scheduler

Implement your fair-share scheduler, as you designed it in Part 2.
WHAT TO TURN IN FOR PART 3:
A short write-up that describes how what you ended up doing differed from your plans from Part 2, if at all.

Repeat your experiment from Part 1, including generating a new graph, and provide a short description that indicates why you think your scheduler is doing what you want.

Concoct a new experiment(s) using the piglet and io utilities that shows why your scheduler is doing what you want when I/O bound processes are involved. In your writeup, provide us with a description of your experiment, the results of the experiment, and a short description of how this shows your scheduler is doing what you want.

A printout of any source code (.c and .h files) that you changed. To keep this as small as possible, just give us complete routines (rather than files) for .c changes, and lists of additions/deletions from .h files. (Be sure to identify the file that each such excerpt comes from.)
Copy your final bzImage kernel file that includes your fair-share scheduler into the directory /cse451kernels/gribble/assignment_3 on either spinlock or coredump. You should call your file bzImage_yourname, where "yourname" is your first and last name. So, I would issue the following commands:
        cp bzImage /cse451kernels/gribble/assignment_3/bzImage_steve_gribble
      

What to Turn In

By the end of class on Monday February 26, you need to turn in the following:

a document that contains everything described under "WHAT TO TURN IN" in Parts 1, 2, and 3 above.

you must have copied your bzImage into our directory, as described in Part 3 above.