CSE 451 Autumn 2000 - Project Assignment 1: Simple System Call
Out: 27 September, 2000
Due: 5 October, 2000
- Become familiar with the tools and skills needed to understand, modify, compile, install, and debug the Linux software
- Design and implement a simple system call that will provide information useful in a later assignment
- Become accustomed to the level of specificity of the project assignment write-ups
The Linux kernel allocates memory using a "buddy system." We'll discuss buddy system allocation later in the course. For now, it's enough to know that requests for memory (within the kernel) must be for a power-of-two number of pages.
The routine
struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order);
implements the allocation side of this memory management. The parameter order is the base-2 logarithm of the number of pages requested. So, if order is zero, the requester wants a single page returned, whereas if order is 4, the requester wants 16 (contiguous, in real memory) pages.
To motivate a later assignment, we want to instrument the kernel so that we can write a user-level program that will print histograms of the actual request sizes handled by __alloc_pages(); that is, I want to write a garden-variety C program that prints out (a) the total number of requests to __alloc_pages() that have been made since the system was booted, and (b) the fraction of that total that were requests for one page, for two pages, for four pages, etc.
To do this requires three things:
1. Modify __alloc_pages() to keep track of this information.
2. Design and implement a new system call that will get this data back to the user application.
3. Write the user application.
Warning 1: Remember that the Linux kernel deals in kernel addresses (which map directly onto real memory), while the calling application runs in some virtual address space (so any addresses it passes to the kernel are in that address space).
Warning 2: Remember that it's inconceivable that this problem has never before been confronted in the existing kernel.
Warning 3: Remember that the kernel must never, ever trust the application to know what it's talking about when it makes a request.
Warning 4: Remember that you must be sure not to create security holes in the kernel with your code.
The part of the user-level application you didn't learn in CSE 142 is this:
#include <sys/syscall.h>
#define __NR_orderhist something
#include <unistd.h>
...
int ret = syscall(__NR_orderhist, ...);
(One last detail: If we were really implementing a new system call, we'd put the #define above in <sys/syscall.h>. But, we're better off not monkeying with that file, as it's shared among all of us.)
I suggest you wade, rather than dive, into this. In particular, here's a suggested set of incremental steps:
1. Don't change any Linux code. Figure out how to do a make of a new boot image, what file to move where so that you can boot the image you just created, how to tell the loader (LILO) that your image exists, and then how to boot your image.
2. Now put a "printk()" somewhere in the code, and figure out how to find its output. (Hints: /var/log and "man dmesg".)
3. Now implement a parameterless system call, whose body is just a printk() call. Write a user-level routine that invokes it. Check to make sure it was invoked.
4. Now write the full implementation.
You should hand in a write-up of what you did. Generally, these write-ups will include only, at most, snippets of code. That's the case here. Tell us:
- Which files you modified
- What you did to each
- Why you did it (to the extent this is necessary)
- What your new system call interface is
- What alternative interfaces you considered. (What were the issues you had to confront in designing the interface?)
- The complete code of the routine implementing the system call in the kernel (the new routine you wrote, not all the existing infrastructure that gets control to your routine).
This is all due in section on 10/5/00.
The source is on the two gateway machines, greer and baughm, in /scratch/linux-2.2.4-test6IKD.tgz. This is a gzip'ed, tar'ed file. You'll need to extract the source to make a private copy. After you've done that, you'll work from your private copy.
Accounts
You'll have personal accounts on both the gateway machines and the test boxes.
Your gateway (greer/baugh) account is your normal instructional account.
You'll get your test box personal accounts, and passwords, in section on Thursday.
On Thursday you'll also get the root password (the password for account "root", also known as "superuser", which is the equivalent of the "administrator" account in Windows) for the test boxes. With the root password you can do anything - create or delete files anywhere in the file system, create accounts, reboot the machine of the guy next to you, you name it.
Why have personal accounts on the test machines? If you get to the point of doing significant work on those machines, e.g., debugging, you might want to have some personalized settings for things like the editor and the window system ("X Windows"). (On the other hand, for some kinds of debugging you might end up wanting to be logged in as root.) Personalized settings are kept in configuration files, usually with names prefixed with a period (so that they aren't shown when you issue a standard ls (Windows "dir") command), and usually stored in your home directory. If everyone logs in as root, that's only one user as far as Unix is concerned, and so you'll end up getting the customizations of whoever set them last. (This is like all of us sharing the same Windows profile.) So, you might want to be able to login and be you. Of course, if the machine is blown away by someone and rebuilt, well, your "dot files" may not live through that.
As root, you can change "system files." This will cause a long and vehement stream of "bloody hell"s to be emitted from the next user of that machine. For example, you can change the files in /usr/include, which is where things like <stdio.h> are kept. Don't.
Where to Keep Files
You should keep your private copy of the full source on greer or baughm. Do not keep the source in the home directory you'll have when you login to those machines. That directory is your standard (Unix) instructional home directory. If you attempt to extract that source into that directory you will immediately exceed your storage quota.
Instead, god willing, you'll find a directory set up for your use in /scratch on greer and/or baugh. That is where you should do your work. Note, though, that these files are not backed up. The systems are stable, but it would be a good idea to copy those source files you have changed to your instructional account file space from time to time. (Instructional account file space, e.g., your home directory on those machines, is backed up.)
Useful Unix Commands
There are a lot of them. Here are some places to begin:
- man
man is the name of the Unix help command (short for "manual"). So, "man man" will tell you how to use man. A useful variant is "man -k ...", which searches an index for a keyword (supplied as a parameter to this command).
- grep
grep searches a file for a regular expression, and prints out lines that contain matches to that expression. So, "grep pmd_t vmalloc.c" will print all lines containing "pmd_t" in file vmalloc.c. "grep -r" does the same thing, except that it recursively descends into all subdirectories included in the file list parameter; e.g., "grep -r pmd_t *" will search all files in the current directory, and all files in any subdirectories, and on and on.
- make
make is a very general purpose facility, but its primary use is to automate recompilation of only those files that have changed (and so need recompiling) to create a new executable of the latest code. (This is called "a make.") For us, all we need to know is that "make <target>" will do something, and which targets we need to make the kernel. (Documentation on targets is in the SPL page and the Running Linux book, as well as the README file in the kernel source main directory.)
- ssh / scp
Secure remote shell and remote copy, respectively. The first lets you login to a machine across the network; the second lets you copy files between two machines. Both use encrypted forms of communication.
- shutdown
Guess what it does. "shutdown -r now" reboots (now, killing whatever is running). "shutdown -h now" shuts down (although you may have to manually power down the machine, if that's what you're after). Try hard to avoid just power cycling; it tends (more than on Windows, in my experience) to cause problems in the file system when the machine comes back online.
- cat myfile | more
"cat" means "type (on my screen)", so "cat myfile" types the file named myfile on the screen. The "|" is called a pipe. It connects the output of the command on the left with the input of the command on the right. "more" is a program that pauses printing each time a screenful has shown up, waiting until you hit <cr> or space (or possibly other things). ("less" is a useful variant of "more.") The overall effect is that file "myfile" is dumped to your screen, pausing each time the screen is filled. [This description is a little inaccurate. See the man pages if things don't work as you expect.]- startx
If you're on a test box and X Windows isn't running and you want it, this is your command.
- gunzip / tar
These won't come up again, probably, but you need them to extract the source from linux-2.2.4-test6IKD.tgz.
Unix Shells
The Unix shell is the program that reads the commands you type and figures out what they mean. In Unix, it's easy to create new shells, and there are lots of them out there.
When you're running as root, you'll be using the bash shell. When you're running as you, you'll be using some shell, maybe bash, maybe tcsh, maybe something else. This means you might be talking to different command interpreters at different times. For simple interactions, you won't be able to tell the difference among the shells, and it won't matter. As you get more sophisticated, either you'll just automatically adjust to whichever shell you happen to have (your fingers will do the thinking), or else you'll figure out how to change your shell.
Unix Files
It's just like Windows (folders are called directories; files are called files). Unix does not rely on the filename (and in particular the file extension) to tell it what the type of a file is, for the most part. But, you should know that .c files are C source files, .h files are header files (just like in MSVC), and .o files are (essentially) machine instructions - the result of compiling something.
For what it's worth, moving around in the Linux file system is just like moving around in a Windows file system using a command line (DOS) window, more or less. cd <dir> changes the working directory to the named directory. ".." is the name of the parent of the current directory, so cd .. moves up one level in the file tree. "." is the name of the current directory. pwd prints the (fully qualified) name of the current directory. ls lists the names of the files in the current directory.
Editors
Vi and emacs. There's some information on these in your Running Linux book; there's also information online (man, for instance), and undoubtedly on the web. Plus, there's information about them sitting in front of screens in the various instructional labs.
Getting Along in Unix
The biggest distinction between Unix and Windows is attitude. The Unix attitude is "I won't give you a fork, because some people want 3 tine forks and some want 4, so I'll instead give you a handle and a box of tines and a way to connect them, and you build what you want." ("And, oh, by the way, maybe what I should really give you is some iron ore and a smelter and you can make a lot more cutlery than forks.") The Windows attitude is "You say food and the next thing you know a complete menu Microsoft has chosen for you based on your past preferences appears magically in your mouth. What could be more convenient?"
To tell you the truth, these days I prefer Windows when I actually want to get something done. But, there's a lot (a huge lot) to be said for the Linux way of doing things. It will seem crude at first, but once you get the hang of it, you'll find that there is amazing power in providing a (reasonably thought-out) set of basic building blocks that know how to talk to each other, and trusting the user to find ways to make new things out of them.