CSE326 Unix/Compiling Tutorial

Preamble

Shells
All material in this course will assume the bash shell. This seems to be the default on a cygwin install. It does not seem to be the default on the instructional Linux boxes. However, the shell there will work identically for the following.
To determine your default shell:
$ grep yourloginname /etc/passwd
The shell will be at the end of the line of nonsense (e.g., /bin/bash). If it is not the bash shell:

Type which bash to find the absolute path name of the bash shell.
Use the chsh command to change your login shell to bash.
Because it takes a while for that to take effect, invoke bash explicitly typing its full path name as a command (e.g., /bin/bash).

Paths
Your path is the list of the directories the shell searches when trying to run a command you've typed. It's stored in the PATH environment variable as a colon-seperated list of directories. You can view your current path by echo $PATH.
A difference between most Unix shells and the Windows command shell is that the current directory is not searched by default. While this is a good idea from a security point of view, this can be annoying if you have scripts or programs that you're working on. If the current directory is not in the path, you can always execute anything something there by using a fully qualified path: ./foo. The "." here specifies the current directory, and foo is the command there you want to execute.
If you see "." or "./" in your path, you're all set. Otherwise, you can put the current directory on your path as follows. The command to use depends on which shell you're in.
In bash, do export PATH=./:$PATH
(no space between the two PATH words)
In tcsh, do setenv PATH ./:$PATH
(the space between the first PATH and ./ is significant)
Both these command prepend ./ to your current path. The exercises below assume this has been done.

Exercises

Do hw0 from 303
This will give you a bit of practice with the command line.
Learn how to compile and run programs
Make a directory tut0 in your home directory, and copy all the files from /cse/courses/cse326/05wi/tut0/ there:

cd ~
(to get to your home directory if you're not already there)
mkdir tut0
cd tut0
cp /cse/courses/cse326/05wi/tut0/* .
("*" means all files in that directory, and "." means the current directory)

Look at the file insert.c This implements insertion sort. View it in emacs, or from the command line by less insert.c. q quits less, j/k scroll up/down, and h give you help. Remember you can also man less from the command line to get more help.
Now we're going to build insert. This is straightforward:
gcc -g -o insert insert.c
gcc is our compiler. The -g flag tells gcc to build with debugging symbols: this will be handy later, and using this flag is a good habit to get into. -o insert tells gcc to name the output executable insert (if you didn't use the -o flag the default is a.out, which isn't very descriptive). Finally, the insert.c argument tells gcc what to compile.
Now run insert:
insert
You should get a message complaining that you didn't give insert a filename to sort, as indeed we didn't. So let's make an input file. The files rand.pl and trans.pl should be in the directory as well (do ls to make sure they're there; if not, copy them from tut0 in the course directory). These programs may be used just like any other, but they're scripts (Perl scripts in fact) rather than binary executables. Look at them using emacs or less, or just cat them as they're short. For example:

cat rand.pl
will output:
#!/usr/bin/perl

$N=shift;
$N > 0 or $N=50;

@a = (1..$N);

for $i (0..$N-1) {
    $j = int(rand($N-$i))+$i;
        ($a[$i], $a[$j]) = ($a[$j], $a[$i]);
}

print join("\n",@a) . "\n";
This looks nasty, unless you know Perl. All that's important to notice here is that scripts are just text files, and that the #! line at the beginning tells your shell what program to use to execute the script (in this case, /usr/bin/perl).
Now let's produce some test files. rand.pl 10 will output a randomly permuted list of 1 through 10. That's nice, but we need to put that into a file so insert can use it. This can be done using redirection:
rand.pl 10 > r10
The > operator takes the output from rand.pl, and puts it in the file r10. You can double-check this by doing cat r10.
Now we can give this to insert:
insert r10
This should execute quickly without producing any output. Not very interesting. Let's give it a harder input:

rand.pl 50000 > r50k
insert r50k

This should take several seconds to execute. Time it on your watch.
Repeat this with inputs of 25000 and 75000 elements. Compare the difference in execution time. Kind of inconvenient to time these manually, isn't it? Fortunately, there's an easier way:
time insert r50k
real    0m3.904s
user    0m3.864s
sys     0m0.040s
(Your times will probably not be exactly the same, of course). The time command runs whatever you give it and keeps track of the time. Real time is total time (what you'd get by looking at the clock on the wall). User time is the time your program spent doing its own work. System time is how long your program spent waiting for the system to do things for it (like printing to the screen or reading from files).
Using time, make accurate measurements of how long insert takes to run. Run several different rand.pl outputs of the same size and verify that insert always takes about the same amount of time. Does it look like the time grows quadratically, as our analysis claims?
trans.pl works like rand.pl, but instead of producing random output, it starts with a sorted list of numbers, and then swaps a certain number of random pairs. For example, try trans.pl 20 2 and look at the output.
Now make a few big files using trans.pl with 5-10% transpositions. Compare the running times for insert on comparably sized files from rand.pl. The input distribution certainly matters!