In section, I showed two histograms displaying your distribution of ratios of column-major (j then i) runtime to row-major (i then j) runtime. I used ratios rather than absolute values both to make the results more comparable between your varied computers, and to better show you how much slower - and how much more variable in slowness - an apparently equivalent program can be thanks to the hardware.
Here are the same histograms below, modified to break out those ratios strictly less than 1 (indicating the column-major traversal was faster), and the ones between 30 and 35 (click to see larger PDF versions):
You shouldn't trust these numbers, though. Here are some reasons:
What did you find surprising? Here are just a few of the things we spotted in your solutions, or that we discussed in section:
We hoped that HW0 would show you that the hardware implementation can affect your program's performance and correctness in ways that are not obvious from your source code. From the wide variety of results you reported and your discussions of these results, it's clear that almost all of you got that point, though some of you were perhaps a bit too fixated on the idea of caches and hence missed the larger point.
If you run the original C code in the "related to that" point, you'd've found that gcc did make the original code run fast when optimizing (-O2), while after making the change we suggested to actually use the loop-computed value, the code ran slowly whether or not it was optimized. On the surface, this C program seems to exhibit the same optimization opportunity as the programs you timed earlier - a useless loop.
So why won't gcc or the Java VM optimize away the loop in the timed programs? This is a question that even we don't know the answer to. Here are some of your speculations:
For the case of gcc optimizing the hw0.c program, we speculated that
the structure of the program prevents gcc from realizing that the
array copy loop is useless. That is, we have defined the source and destination
arrays outside the main()
function, so these are visible to the
entire program. We think that the global accessibility of the arrays
forces gcc to assume that they may be modified by other threads (pieces
of the program executed simultaneously with main()
),
or even other programs. In this hypothesis, gcc simply doesn't realize
that, in fact, the src
and dst
arrays are modified
only by main, because it doesn't know when compiling the hw0.c
source code file that that file constitutes the entire program.
You've compiled and run C programs in homework 0, but probably you didn't do more than copy and paste the commands we gave you. Now it's time to look more closely at how to work with C on Linux.
In the past, there was a whole class that taught you everything you needed to know about Linux and C. Regrettably, that class no longer exists. Hence, it's likely you'll need to learn more about C and Linux than we teach you, either in section or as asides in lecture. If you'd like to learn more, I suggest reading the material for CSE 303, CSE 374, or the new seminar CSE 390-A.
The first program you ever wrote was probably the Hello world program. Here's a slightly expanded one in C:
/* hello.c */ #include <stdio.h> int main (void) { int n = 3; printf("Hello, world!\n"); printf("n = %d\n", n); n++; printf("Now n = %d\n", n); return (0); }
This main() function is not too far from the corresponding Java code for the
main() method, if you were to write it outside of a class. The main difference
is that instead of calling the println() method of the PrintStream referenced by
System.out, you call the printf() function, which is not stored in any object or
part of any class. Instead of getting printf() implicitly, you must ask for it
explicitly, by including the source code file stdio.h
where it is defined.
Also, you don't build up output by concatenating strings and
values, because C doesn't support catenating two strings together using built-in
operators, let alone a string and an int. Instead, you call printf() with
varying-length parameter lists, and strings with "format specifiers"
like %d
that say where to interpolate the extra parameters.
In general, everything that you can do with Java primitive values, you can do almost the same in C. (There are some differences in edge cases, such as arithmetic overflow, where Java precisely defines behavior that C leaves unspecified. However, in both C and Java the edge case behavior is almost never what you want, unless you prefer a program that produces the wrong answers to one that blows up.) But C does not have Java objects - C supports complex data structures called structs, but these cannot have methods or functions within them. Put in CSE 143 language, where Java combines state and behavior, C separates them.
Remember gcc? It's back! As you did in the homework, bring up the Linux terminal window, and then type:
$ emacs hello.c [type in the code, save and exit...] $ gcc -g -Wall hello.c $ ./a.out Hello, world! n = 3 Now n = 4 $
Some things to note here:
gcc
command takes a list of options marked by dashes,
and then the source code filenames. The -Wall options, which you saw in the
homework, means to generate most (not all) compiler warnings. The -g option
means to insert information in the output file to match certain sequences of
machine instructions to lines of code in the hello.c file. This is useful
when running the program under a debugger (next).gcc
writes the compiled program to a.out
. You can specify a
different program filename by giving the -o option, which is followed by the
output filename, for example: -o hello
.Like the programs you worked with in homework 0, this program does nothing useful. More interestingly, it is probably going to crash - it does on my machine:
/* crash.c */ #include <stdio.h> int main (void) { /* creates a 10-element array */ int ns[10]; int i; for (i = 0; i < 10; i--) { ns[i] = i; } return (0); }
Due to a slip of the fingers, we are decreasing the array subscript i, rather than increasing it. Unlike Java, C does not check that array subscripts are in bounds for the array; in fact, it is not even possible to compute how long an array is at runtime. We may never detect the error, but if we do, it will be Linux that detects the error and violently terminates your program, not C:
$ gcc -g -Wall crash.c -o crash $ ./crash Segmentation fault
Since we are too lazy to reread our code, we'll use the computer to help us
figure out what went wrong. This calls for a debugger - a tool that
lets you run the program one step at a time, handling crashes by keeping the
wounded program alive as long as you need to find out what killed it. Linux's
debugger is called gdb
. Let's run our program crash
under gdb:
$ gdb crash
GNU gdb (GDB) Fedora (7.1-34.fc13)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/crash...done.
(gdb)
Here are some things you can do with gdb (we'll see more later in the quarter):
You can run the list
command to show the source code that
corresponds to the original program.
(gdb) list
By default, list
will display the ten lines around the current
execution point in the program. In this case, we haven't started our program
yet, so we are at the beginning, in the main
function,
which is the program entry point.
1 #include <stdio.h> 2 3 int main (void) 4 { 5 /* creates a 10-element array */ 6 int ns[10]; 7 int i; 8 9 for (i = 0; i < 10; i--) { 10 ns[i] = i;
Let's start running the program, since we can't really inspect its state before we do that.
(gdb) start
Temporary breakpoint 1 at 0x804839a: file crash.c, line 9.
Starting program: /tmp/crash
Temporary breakpoint 1, main () at crash.c:9
9 for (i = 0; i < 10; i--) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12.2-1.i686
(gdb)
As you can see, gdb sets an automatic (temporary) breakpoint just before the
first executable line in the main
function. A breakpoint interrupts
normal program execution, freezing it in time, so you can inspect the internal
state of your program, dissecting it on your laboratory table (don't worry,
programs can't feel pain, and you put it all back together before anyone
notices!). Note that when your program reaches a breakpoint, gdb shows
the next line to be executed after resuming from the breakpoint.
If you don't need to stop before running
main
, then use run
instead of start
.
Let's step through the program line by line.
(gdb) step
This will execute the current line, then show the next executable line:
10 ns[i] = i;
When you call a function, step
will jump into that function's
source code. This is not so good when you end up looking at code you didn't
write, such as printf(). For that, you
can use next
, which treats function calls as a single operation.
Our program hasn't
crashed yet - use cont
to run it until it finishes normally or crashes:
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x080483a9 in main () at crash.c:10
10 ns[i] = i;
(gdb)
The program crashes just as before, but because it is now running under gdb,
Linux simply suspends the program and hands control back to gdb just as if
it had hit a breakpoint. Now you can diagnose what went wrong by examining
the crashing program's state using print
.
One of the most useful gdb commands is print
, which lets you
print out different expressions, including (but certainly not only!) source
code variables. In fact, print
can print an almost
arbitrary C expression.
Now that our program has crashed, let's use print
to find out why:
(gdb) print ns[i]
Cannot access memory at address 0xbf7ffffc
(gdb)
Well, that's no good - Linux won't let us access the array element ns[i]
.
(This is what Segmentation fault
means - Linux will not let you
access that point in memory.)
But why can't we access that region of memory? Let's take a look at our local variables
- maybe ns
or i
is funky. For that we use info locals
:
(gdb) info locals
ns = {0, -1208217120, 134513173, -1208206112, -1208209420, 134513616, 134513376, 134513627, -1208209420, 134513616}
i = -2096336
(gdb)
i
shouldn't be negative! Let's use list
again
to check that we didn't botch the update of i
somehow:
(gdb) list
5 /* creates a 10-element array */
6 int ns[10];
7 int i;
8
9 for (i = 0; i < 10; i--) {
10 ns[i] = i;
11 }
12
13 return (0);
14 }
(gdb)
Oops, I guess we did.
Print
has a lot of features. To see what they are and how to use them,
use help
:
(gdb) help print
Print value of expression EXP.
Variables accessible are those of the lexical environment of the selected
stack frame, plus all those whose scope is global or an entire file.
$NUM gets previous value number NUM. $ and $$ are the last two values.
$$NUM refers to NUM'th value back from the last one.
Names starting with $ refer to registers (with the values they would have
if the program were to return to the stack frame now selected, restoring
all registers saved by frames farther in) or else to debugger
"convenience" variables (any such name not a known register).
Use assignment expressions to give values to convenience variables.
{TYPE}ADREXP refers to a datum of data type TYPE, located at address ADREXP.
@ is a binary operator for treating consecutive data objects
anywhere in memory as an array. FOO@NUM gives an array whose first
element is FOO, whose second element is stored in the space following
where FOO is stored, etc. FOO must be an expression whose value
resides in memory.
EXP may be preceded with /FMT, where FMT is a format letter
but no count or size letter (see "x" command).
Typing help
by itself will show you a list of more general
topics you can use to navigate down to a specific command, if you don't know
exactly what you are looking for.
You can press the up and down arrow keys to go backwards and forwards through your command history. This may save you typing if you want to reuse or slightly modify a previous command.