CSE 451 Autumn 2006: Project 1

Project 1 - The Shell and System Calls

Out: Wednesday, October 4, 2006
Due: Friday, October 13, 2006, 5:00pm
Write-up: Friday, October 13, 2006, 5:00pm

This project should be done with your project teams.

Assignment Goals

To dust off your C programming skills.
To understand the relationship between OS command interpreters (shells), system calls, and the kernel.
To become familiar with the tools and skills needed to understand, modify, compile, install, and debug the Linux kernel.
To design and implement an extremely simple shell and system call.

Changes

An incorrect version of this project was posted on Wednesday. A new version (this one) is available as of Thursday evening. There are a few changes to the design of the system call you are asked to implement. In addition, some guidance on how to implement syscalls is also provided. You should reread the entire document to make sure you understand it. Where possible, changes are in bold. If you just want to jump to the changed paragraphs, they are at: [0] [1] [2] [3] [4].

Background

As we discussed in class, the OS command interpreter is the program that people interact with in order to launch and control programs. On UNIX systems, the command interpreter is usually called the shell: it is a user-level program that gives people a command-line interface to launching, suspending, and killing other programs. sh, ksh, csh, tcsh, bash, ... are all examples of UNIX shells. (It might be useful to look at the manual pages of these shells, for example, type: "man csh".)
Every shell is structured as the following loop:

print out a prompt
read a line of input from the user
parse the line into the program name, and an array of parameters
use the fork() system call to spawn a new child process

the child process then uses the exec() system call to launch the specified program
the parent process (the shell) uses the wait() system call to wait for the child to terminate

once the child (i.e. the launched program) finishes, the shell repeats the loop by jumping to 1.
Although most of the commands people type on the prompt are the name of other UNIX programs (such as ls or more), shells recognize some special commands (called internal commands) which are not program names. For example, the exit command terminates the shell, and the cd command changes the current working directory. Shells directly make system calls to execute these commands, instead of forking a child process to handle them.
This assignment consists of two parts. In the first, you will design and implement an extremely simple shell that knows how to launch new programs, and also recognizes three internal commands (exit, cd, and execcounts), which we will describe below. The first two internal commands will work by calling existing system calls (exit and chdir); the third internal command will work by calling a new system call that you will design and implement. So, in the second part of this assignment, you will design and implement the execcounts system call. This will involve making changes to the Linux kernel source code. The semantics of the execcounts system call, and some hints on how to go about implementing it are also described below.

The Assignment

Part 1: Build a new shell

Write a shell program in C which has the following features:

It should recognize two internal commands, exit and cd. exit terminates the shell, i.e., the shell calls the exit() system call or returns from main. cd uses the chdir system call to change to the new directory.

If the command line is not an exit or cd, it should assume that it is of the form
<executable name> <arg0> <arg1> .... <argN>
Your shell invokes the executable, passing it the command line.

Users may enter commands without their full path names (e.g., ls instead of /bin/ls. Your shell should use the $PATH variable to run these programs. See the manual page for the execvp function. Also, try to use the same prompt as in the following:

CSE451Shell% /bin/date Sat Jan 6 16:03:51 PST 1982 CSE451Shell% /bin/cat /etc/motd /etc/shells Pine, MH, and emacs RMAIL are the supported mailers on the instructional systems. [and on and on...] /bin/sh /bin/bash [and on and on...]
Note: The words in bold are output by the shell and the words underlined are typed in by the user.

Please take a look at the manual pages of execv, fork and wait.

To allow users to pass arguments to programs you will have to parse the input line into words separated by whitespace (spaces and '\t' tab characters) and place these words into an array of strings. You might try using strtok() for this (man strtok for a very good example of how to solve exactly this problem with strtok).
Then you'll need to pass the name of the command as well as the entire list of tokenized strings to one of the other variants of exec, such as execvp(). These tokenized strings will then end up as the argv[] argument to the main() function of the new program executed by the child process. Try man execv or man execvp for more details.

Part 2: Add a new system call

There are four system calls in Linux related to creating new processes: fork, vfork, execve, and clone. (The man pages will describe for you the differences among them.) Instrument the kernel so that we can write a user-level program that will print counts of the number of times each of these four system calls has been invoked (by a particular process on the system, and its descendents); that is, I want to write a garden-variety C program that can invoke another program, and determine how many of each of these four system calls that program made.

To do this requires four things:

Modify the process control block definition in the kernel (struct task_struct in include/linux/sched.h)

Modify the kernel to keep track of this information.

Design and implement a new system call that will get this data back to the user application. The following walkthrough is an excellent guide. Be aware it was written for a slightly different version of linux with fewer preexisting system calls; as one example, where it refers to entry.S, you'll want syscall_table.S instead.

Write the user application.

There is no need to reset these statistics. (Counter to the initial version of this document.)

There are several different ways to approach this problem. It is your job to analyze them from an engineering point-of-view, determine the trade-offs, and defend the implementation you select.

Warning 1: Remember that the Linux kernel should be allowed to access any memory location, while the calling application should be prevented from causing the kernel to unwittingly read/write addresses other than those in its own address space. Details about this are here.

Warning 2 (Hint 0): Remember that it's inconceivable that this problem (warning 1) has never before been confronted in the existing kernel.

Warning 3: Remember that the kernel must never, ever trust the application to know what it's talking about when it makes a request, particularly with respect to parameters passed in from the application to the kernel.

Warning 4: Remember that you must be sure not to create security holes in the kernel with your code.

Warning 5: Remember that the kernel should not leak memory.

SOME HINTS (Read Carefully)

You should be using the C language whenever you alter or add to the Linux kernel.

You can't just make system calls directly from C. Instead, you need to use the syscall function and pass it the number of your new system call. This code fragment show you how to do that:

/*
* Set the features included by the linux libc to have the BSD extensions
*/

#ifndef _BSD_SOURCE
#define _BSD_SOURCE 1
#endif

#define _NR_execcounts someNumber

#include <unistd.h>
...
int ret = syscall(__NR_execcounts, ...);

Hint/Thought question: Why are we interested in knowing how many system calls are made by a program and its descendents? Why is this considerably more useful than knowing how many system calls were simply made by a program itself?

Recommended Procedure

I suggest you wade, rather than dive, into this. In particular, here's a suggested set of incremental steps:

If you've never compiled the kernel, go back through the lab information page or see the Linux Kernel HOWTO. It should not take longer than an hour and it will ensure that you are up to speed with VMware.

Now implement a parameterless system call, whose body is just a printk() call. Write a user-level routine that invokes it. Check to make sure it was invoked.

Now write the full implementation.

Part 3: Integrate the system call into the shell

Now that you have a working shell and an implementation of your new system call, it's time to integrate them; this should be very simple. Add a new internal command to your shell, called execcounts. This command will take as arguments, the name of a program to run and any command arguments it may have. e.g., To determine how many fork/exec/etc's are performed by the ls command, you could run: CSE451Shell% execcounts ls -la. The shell should invoke ls -la and display any output from this program (as usual). When ls terminates, the shell should print the number of calls to fork/vfork/exec/clone made by ls, by invoking the system call that you build in Part 2.

Part 4: Some additional questions

Answer these additional questions and include them with your write-up:

1) What is "asmlinkage" as it occurs in the Linux kernel source, what does it do (give a short description)?

2) gotos are generally considered bad programming style, but these are used frequently in the Linux kernel, why could this be? This is a thinking question, so justification is more important than your answer.

3) What is the difference between the "clone" and "fork" system calls?

4) How could you extend your shell to support multiple simultaneous processes (foreground and background...)?

5) How long does your new system call take (time it using gettimeofday and give an approximate answer)? Explain your timing methodology.

You should turn in the following:

The C source code of your shell.

The source code of the files you changed in the kernel.

The names of all of the Linux kernel source files that you modified, and a written description of what you did to them and why you needed to do it (i.e. why was it necessary to modify this particular file).

Documentation for the new system call (i.e., a miniature man page for the system call itself, not for the shell command). A text file is fine.

The complete source code of the routine that implements the new system call in the kernel (i.e., just the new code you wrote, not the source code that was already in the kernel that got control to your new routine).

A transcripthis printout. As always, do man script to find out how to use the command.)

To attempt to achieve some sort of uniformity in results for the new system call, hand in the results obtained from the following:

Invoke your shell

Run the following command exactly as given:
execcounts /usr/bin/find /etc -type f -exec touch t ;
Note: to try that command in most "real" shells, use the following:
/usr/bin/find /etc -type f -exec touch ~/t \;

A brief write-up (about a page) with the answers to the following questions.
Describe how you found the information needed to complete this project. Did it have the information you needed? Did you consult with any humans? If so, what did you try first and who did you consult with?

Explain the calling sequence that makes your system call work. First, a user program calls <.....>. Then, <.....> calls <.....>. ... and so on. You can explain this using either text or a rough (less than 15 minutes) diagram.

Why do you think the designers of Linux implemented system calls the way they did? What were they trying to achieve? What were they trying to avoid?

Give (in 1-2 sentences) an alternative idea for implementing system calls. State one way your idea would be better or worse than the way it is currently done.

Include all questions from part 4

Turn in your write-up electronically in a separate file along with your code.

Do not underestimate the importance of the write-up. Your project grade depends significantly on how well you understood what you were doing, and the write-up is the best way for you to demonstrate that understanding.

The grade on the project will be calculated as follows:

Shell: 20 points

System call: 20 points

Write-up: 20 points

Submission instructions: We will be using the turnin(1L) program.

Get all of your files onto one of the general-purpose instructional Linux machines (attu). scp(1) may be useful. You can also use the WinXP machines to access your instructional Linux home directory. See Lab Support's NT Dfs Page; you want to mount \\ntdfs\cs\unix\homes\iws\LOGIN.

Run turnin -c cse451 -p project1 [FILES]