Project 1 - The Shell and System Calls

Out: Thursday September 29th, 2005
Due: Friday October 7th, 2005 6:00 PM

Assignment Goals

Background

As we've discussed in class, the OS command interpreter is the program that people interact with in order to launch and control programs. On UNIX systems, the command interpreter is usually called the shell: it is a user-level program that gives people a command-line interface to launching, suspending, and killing other programs. sh, ksh, csh, tcsh, bash, ... are all examples of UNIX shells. (It might be useful to look at the manual pages of these shells, for example, type: "man csh".)

Every shell is structured as the following loop:

  1. print out a prompt
  2. read a line of input from the user
  3. parse the line into the program name, and an array of parameters
  4. use the fork() system call to spawn a new child process
    • the child process then uses the exec() system call to launch the specified program
    • the parent process (the shell) uses the wait() system call to wait for the child to terminate
  5. once the child (i.e. the launched program) finishes, the shell repeats the loop by jumping to 1.
Although most of the commands people type on the prompt are the name of other UNIX programs (such as ls or more), shells recognize some special commands (called internal commands) which are not program names. For example, the exit command terminates the shell, and the cd command changes the current working directory. Shells directly make system calls to execute these commands, instead of forking a child process to handle them.

This assignment consists of two parts. In the first, you will design and implement an extremely simple shell that knows how to launch new programs, and also recognizes four internal commands (exit, cd, physusage, and clear_physusage), which we will describe below. The first two internal commands will work by calling existing system calls (exit and chdir); the other internal commands will work by calling new system calls that you will design and implement. So, in the second part of this assignment, you will design and implement a system call or two. This will involve making changes to the Linux kernel source code. The semantics of these system calls, and some hints on how to go about implementing them are also described below.

The Assignment

Part 1: Build a new shell
Write a shell program in C which has the following features: Assume that the full names like /bin/ls are given. Also, try to use the same prompt as given below; the output produced by your shell should look like the following:

CSE451Shell% /bin/date
Sat Jan 6 16:03:51 PST 2001
CSE451Shell% /bin/cat /etc/HOSTNAME /etc/motd
spinlock.cs.washington.edu
Pine, MH, and emacs RMAIL are the supported mailers on the
instructional systems.

Contact support@cs if you need assistance.

Note: The words in bold are output by the shell and the words underlined are typed in by the user.
Please take a look at the manual pages of execv, fork and wait.

To allow users to pass arguments to programs you will have to parse the input line into words separated by whitespace (spaces and '\t' tab characters) and place these words into an array of strings. You might try using strtok() for this.

Then you'll need to pass the name of the command as well as the entire list of tokenized strings to one of the other variants of exec, such as execvp(). These tokenized strings will then end up as the argv[] argument to the main() function of the new program executed by the child process. Try man execv or man execvp for more details.

Part 2: Add new system calls

The Linux kernel allocates memory using a "buddy system."  We'll discuss buddy system allocation later in the course.  For now, it's enough to know that requests for memory (within the kernel) are required to be a power of two number of pages.

The routine 

struct page * __alloc_pages(..., unsigned int order, zonelist_t *zonelist);

implements the allocation side of this memory management.  The parameter order is the log of the number of pages requested.  So, if order is zero, the requester wants a single page returned, whereas if order is 4, the requester wants 16 (contiguous, in real memory) pages.

To motivate a later assignment, we want to instrument the kernel so that we can write a user-level program that will print histograms of the actual request sizes handled by __alloc_pages();  that is, I want to write a garden-variety C program that prints out (a) the total number of requests to __alloc_pages() that have been made since the system was booted, or since the statistics were last cleared, and (b) the fraction of that total that were requests for one page, for two pages, for four pages, etc.

To do this requires three things:

  1. Modify __alloc_pages() to keep track of this information.

  2. Design and implement a new system call that will get this data back to the user application.

  3. Write the user application.

We'd also like to be able to reset these statistics periodically, perhaps to track the memory usage of a single program. So we need a way to clear the request information we've tracked so far. This requires either parameterizing the above system call to add a clear option, or adding another system call.

Warning 1: Remember that the Linux kernel should be allowed to access any memory location, while the calling application should be prevented from causing the kernel to unwittingly read/write addresses other than those in its own address space. Details about this are here.

Warning 2: Remember that it's inconceivable that this problem (warning 1) has never before been confronted in the existing kernel.

Warning 3: Remember that the kernel must never, ever trust the application to know what it's talking about when it makes a request, particularly with respect to parameters passed in from the application to the kernel.

Warning 4: Remember that you must be sure not to create security holes in the kernel with your code.

Warning 5: Remember that the kernel should not leak memory.

SOME HINTS

You should be using the C language whenever you alter or add to the Linux kernel.

The part of the user-level application you didn't learn in CSE 142 is this:

#include <sys/syscall.h>
#define __NR_physusage something
#include <unistd.h>
....

int ret = syscall(__NR_physusage, ...);

(One last detail: If we were really implementing a new system call, we'd put the #define above in  <sys/syscall.h>.  But, we're better off not monkeying with that file, as it's shared among all of us.)

Recommended Procedure

I suggest you wade, rather than dive, into this.  In particular, here's a suggested set of incremental steps:

  1. Don't change any Linux code.  Figure out how to do a make of a new boot image, what file to move where so that you can boot the image you just created, how to tell the loader (LILO) that your image exists, and then how to boot your image.

  2. Now put a "printk()" somewhere in the code, and figure out how to find its output.  (Hints: /var/log and "man dmesg").

  3. Now implement a parameterless system call, whose body is just a printk() call.  Write a user-level routine that invokes it.  Check to make sure it was invoked.

  4. Now write the full implementation.

Part 3: Integrate the system call into the shell
Now that you have a working shell and an implementation of your new system calls, it's time to integrate them; this should be very simple. Add two new internal commands to your shell, called physusage and clear_physusage. The physusage command should invoke the first system call that you built in Part 2, and print out the histogram of all request sizes to __alloc_pages(). Only print out non-zero values (i.e. page sizes for which there have been at least one allocation request). The clear_physusage command should invoke the second system call (or call the first with appropriate parameters), and reset the collected statistics.

What to Turn In

You should turn in the following electronically (instructions to follow at a later date):

 

Do not underestimate the importance of the write-up. Your project grade depends significantly on how well you understood what you were doing, and the write-up is the best way for you to demonstrate that understanding.

The grade on the project will be calculated as follows:

Turnin instructions

We will be using the turnin(5L) program for electronic submission. To turn in your project:
  1. First, make sure your files are on or accessible from attu or a Linux machine in the lab.
  2. Create a folder named <your_username>-project1 (for example, mine would be rdunn-project1).
  3. Copy all the files of your project to this directory. Do NOT include things like executables, core files, etc. Just put in what is outlined in "What to Turn In" above.
  4. cd to the parent folder of your project directory
  5. Run the command: turnin -v -c cse451 -p project1 <name of your project folder>. Enter your section (the one you're registered for) when prompted.