CSE333 Notes for Monday, 10/31/11

I mentioned that we are basically done with our coverage of C++ and now we will be examining C for a few weeks. So instead of using the g++ compiler, we are going to start using the gcc compiler.

I began with the classic "hello world" program in C:

    #include <stdio.h>
    
    int main() {
        printf("hello world\n");
        return 0;
    }

The standard library headers in C are named with a ".h" suffix, unlike the suffix-free names we've gotten used to in C++ (such as <iostream>). The "stdio.h" header belongs to the standard I/O library in C and declares functions like printf.

I assume that people have a basic familiarity with C because of cse351 and because of our discussions of C++. I began with a simple swap function in C:

    void swap(int x, int y) {
        int temp = x;
        x = y;
        y = temp;
    }

We then wrote this client code:

        int x = 3;
        int y = 4;
        printf("x = %d, y = %d\n", x, y);
        swap(x, y);
        printf("x = %d, y = %d\n", x, y);

Not surprisingly, it didn't swap the values. More precisely, it swapped local copies of the values, but it had no effect on the variables we passed as parameters. That's because the parameter passing mechanism in C is a value parameter mechanism.

In C++, we could simply add an ampersand to the type to make these references to int variables. We can't do that in C. Instead we use pointers and the address-of operator. In effect, C++ does the pointer and address manipulation for us when it sets up a reference type. But in C we have to do it all ourselves. So our swap function became:

    void swap(int * x, int * y) {
        int temp = *x;
        *x = *y;
        *y = temp;
    }

and our call became:

    swap(&x, &y);

This properly swapped the values.
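
As a minimal sketch of the difference, here are the two versions side by side. The names swap_by_value, swap_by_pointer, and show_swaps are hypothetical (the lecture simply called both versions swap), and the driver is written as an ordinary function rather than main so that it can be dropped into an existing program.

```c
#include <stdio.h>

/* Receives copies of the caller's ints, so the caller never sees the swap. */
void swap_by_value(int x, int y) {
    int temp = x;
    x = y;
    y = temp;
}

/* Receives the addresses of the caller's ints and swaps through them. */
void swap_by_pointer(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Hypothetical driver: runs both versions and prints what the caller sees. */
void show_swaps(void) {
    int x = 3;
    int y = 4;

    swap_by_value(x, y);
    printf("after swap_by_value:   x = %d, y = %d\n", x, y);

    swap_by_pointer(&x, &y);
    printf("after swap_by_pointer: x = %d, y = %d\n", x, y);
}
```

Calling show_swaps() from main should report x = 3, y = 4 after the first call and x = 4, y = 3 after the second, because only the pointer version can reach the caller's variables.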

Then we looked at this short sample program that I have borrowed from Steve Gribble's 333 class:

    #include <stdio.h>
    
    int main(int argc, char **argv) {
        printf("sumTo(5) is: %d\n", sumTo(5));
        return 0;
    }
    
    // sum integers from 1 to max
    int sumTo(int max) {
        int i, sum = 0;
      
        for (i=1; i<=max; i++) {
            sum += i;
        }
        return sum;
    }

I asked people what was wrong with the program. Nobody was able to figure it out, perhaps because I have been careful to always define functions before they are used. This program calls the function sumTo in main before it is defined. Remember that the C compiler reads the file from beginning to end, so normally we would expect it to be upset to see a programmer referring to something that hasn't yet been defined. But the program compiled and ran without complaint.

We threw in two more calls on the function with different types of values:

    printf("sumTo(5) is: %d\n", sumTo(5));
    printf("sumTo('a') is: %d\n", sumTo('a'));
    printf("sumTo(384.5) is: %d\n", sumTo(384.5));

This still compiled without any warnings. It produced the following output:

    sumTo(5) is: 15
    sumTo('a') is: 4753
    sumTo(384.5) is: 0

The first call produces the expected result with an int. The second is treating the character constant as an int. The third is taking part of a double and interpreting it as an int, which led it to produce a zero result.

Then we changed the return type to double and it finally produced an error:

    double sumTo(int max) {

The compiler complained of conflicting types for sumTo and mentioned that it had used an implicit declaration. In C, if you call a function that has not yet been declared, the compiler will simply make up a declaration for you. It will assume that you meant the function to have a return type of int. That's why changing the return type to "double" is a conflict.

We can fix this by introducing a function prototype. So I changed the return type back to int and added a prototype before main:

    int sumTo(int max);

Oddly enough, this changed the behavior of the call that involved a double. Now it produced the following output:

    sumTo(384.5) is: 73920

So now that it knows that the first parameter is an int, it converts the double to an int by dropping the fractional part (giving 384). Before, it was simply grabbing part of the double and interpreting it as an int. In other words, without a prototype the compiler makes no assumptions about the types of the parameters.
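
Here is a sketch of the prototype version in one piece. The driver name show_sums is hypothetical and stands in for main. I pass the 384.5 through a double variable instead of a literal, which is my own choice to keep some compilers quieter, and the value 97 for 'a' assumes an ASCII system.

```c
#include <stdio.h>

/* Prototype: tells the compiler the parameter and return types up front. */
int sumTo(int max);

/* Hypothetical driver standing in for main in the lecture example. */
void show_sums(void) {
    double d = 384.5;

    printf("sumTo(5) is: %d\n", sumTo(5));
    printf("sumTo('a') is: %d\n", sumTo('a'));   /* 'a' is 97 in ASCII */
    printf("sumTo(384.5) is: %d\n", sumTo(d));   /* d is converted to 384 */
}

/* sum integers from 1 to max */
int sumTo(int max) {
    int i, sum = 0;

    for (i = 1; i <= max; i++) {
        sum += i;
    }
    return sum;
}
```

With the prototype in scope, the three calls should print 15, 4753, and 73920, which match the closed form max * (max + 1) / 2 for 5, 97, and 384.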

In fact, the situation with parameters is much, much worse. We looked at this function definition:

    int foo(int a, int b, int c) {
        int sum = a + b + c;
    }

As with the sumTo function, we declared it after main. We then wrote this code to call it and print the result:

    int n = foo(2, 4, 6);
    printf("n = %d\n", n);

We got an odd answer. It printed this:

    n = 1851726288

The reason is that we never included a return statement. And C didn't care. It simply grabbed whatever happened to be sitting in the place where a return value is expected and interpreted it as our return value. When we added a return:

    int foo(int a, int b, int c) {
        int sum = a + b + c;
        return sum;
    }

it printed the expected result:

    n = 12

But that didn't mean that we were done with odd things happening with this function. We added a line of code to print the sum before returning it:

    int foo(int a, int b, int c) {
        int sum = a + b + c;
        printf("sum = %d\n", sum);
        return sum;
    }

and made the following calls in main:

    foo(1, 2, 3, 4, 5);
    foo(1, 2);
    foo();

The C compiler was perfectly happy to let us pass the wrong number of parameters. It printed the following lines of output:

    sum = 6
    sum = 5
    sum = 1714930120

When we passed too many values, C simply ignored the extra values. And when we passed too few, it grabbed other parts of memory to use for those values. I asked why the final sum is such a large number and someone mentioned that it's probably a pointer.

The news wasn't all bad for the compiler. When we included the following prototype before main:

    int foo(int a, int b, int c);

the compiler produced error messages for our calls with the wrong number of parameters, which is good.
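
A sketch of that arrangement follows. The driver name call_foo is hypothetical and stands in for main. The bad calls are left in a comment because, with the prototype visible, the file would not compile if they were live code.

```c
#include <stdio.h>

/* Full prototype: exactly three int parameters. */
int foo(int a, int b, int c);

/* Hypothetical driver standing in for main. */
void call_foo(void) {
    int n = foo(2, 4, 6);   /* matches the prototype */
    printf("n = %d\n", n);

    /* With the prototype in scope, each of these is rejected at compile time:
     *   foo(1, 2, 3, 4, 5);    too many arguments
     *   foo(1, 2);             too few arguments
     *   foo();                 too few arguments
     */
}

int foo(int a, int b, int c) {
    int sum = a + b + c;
    return sum;
}
```

The only change from the earlier experiment is the one-line prototype at the top, and that alone is what lets the compiler compare each call against the parameter list.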

The moral of the story is that C can be very forgiving. I mentioned that I like a description I heard from a famous computer scientist named Brian Reid. He said that C is his favorite high-level assembly language. By that he meant that it is possible to do very low-level operations in C, but it is also possible to write for loops and if/else constructs and other things we expect from a high-level language.

The fact that C allows us to make so many mistakes led to a famous program called lint that looked for problematic lines of code in your C programs. The "lint" idea has since been used for many other programs.

C compilers have evolved over time and give better warnings and errors now than they used to. For the GNU compiler, you can include the following compiler flag to get extra warnings:

    gcc -Wall mon.c

We then spent some time reviewing a set of slides that Steve Gribble prepared for his offering of cse333 that have the exact pictures that I would hope students have in their head when they are thinking about the execution of a C program. This material was discussed in cse351, so I hope this is review. The slides can be found here.

Steve's slides include an example where we print the address of main. I asked people how we could figure out the contents of that particular memory location. Suppose, for example, that we wanted to see the first two bytes. We wrote this code to do so:

    unsigned char * p = (unsigned char *) &main;
    printf("p = %x\n", *p);
    p++;
    printf("p = %x\n", *p);

It produced the following output:

    p = 55
    p = 48

I asked what would happen if we tried to change one of these bytes, as in:

    *p *= 2;
    printf("ha ha\n");

When we ran that program, it crashed with a bus error. That's because we were trying to change a part of memory that is flagged as read-only.
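
The idea of reading memory one byte at a time through a character pointer works for any address we are allowed to read. Here is a small sketch of it as a reusable helper. The name format_bytes and its parameters are hypothetical, and it uses unsigned char so that bytes of 0x80 and above do not come out sign-extended when printed with %x.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: formats the first `count` bytes starting at `start`
 * into `out` as two-digit hex values separated by single spaces.
 * Returns the number of bytes formatted. It stops early (conservatively)
 * when fewer than 4 characters of space remain in `out`.
 */
size_t format_bytes(const void *start, size_t count, char *out, size_t out_size) {
    const unsigned char *p = (const unsigned char *) start;
    size_t used = 0;
    size_t i;

    if (out_size > 0) {
        out[0] = '\0';
    }
    for (i = 0; i < count; i++) {
        /* worst case per byte: a space, two hex digits, and the terminator */
        if (used + 4 > out_size) {
            break;
        }
        if (i > 0) {
            out[used++] = ' ';
        }
        used += (size_t) snprintf(out + used, out_size - used,
                                  "%02x", (unsigned int) p[i]);
    }
    return i;
}
```

Handing it a two-byte array holding 0x55 and 0x48 fills the buffer with "55 48". On typical desktop systems you could pass it the address of main to repeat the experiment above, though converting a function pointer to an object pointer is not something the C standard guarantees.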

I ended the lecture with a puzzle. I mentioned that our department was using C as the language for cse142 in the late 1990s. I think C is a particularly bad choice of language for an intro course and I wanted to discuss a specific example of something that could go wrong.

I gave a hypothetical situation. Suppose that you are a TA working for a version of cse142 taught in C. A student has told you that he figured out how to write his program without all of that confusing parameter passing and return values that the instructor was talking about. And he wants to know if he can use this simpler approach.

I asked people to consider what the student might be doing. For example, perhaps the student has all of his code in main. I said that's not the case. The program is broken up into several functions.

Someone asked if there were global variables and I asked what those are. The answer is no. Someone talked about curious manipulations of the stack, but the answer is no (sort of). Someone mentioned luck and I said that factors into it somewhat. I mentioned that a student could use an array and pass it as a parameter with several things in it, but even that isn't what's going on.

Here is the mystery program:

    #include <stdio.h>
    
    void f1() {
        int x;
        double y;
        x = 15;
        y = 38.9;
        printf("%d %f\n", x, y);
    }
    
    void f2() {
        int x;
        double y;
        printf("%d %f\n", x, y);
        x++;
        y *= 2;
        printf("%d %f\n", x, y);
    }
    
    void f3() {
        int a;
        double b;
        printf("%d %f\n", a, b);
    }
    
    int main() {
        f1();
        f2();
        f3();
        return 0;
    }

It produces the following output:

    15 38.900000
    15 38.900000
    16 77.800000
    16 77.800000

So this program manages to successfully set two variables in one function, then change them in another, and then see the changed values in a third. The trick is that the student always uses exactly the same local variables declared in the same order. So in effect, this student has introduced global variables that are stack-allocated. Each call brings back the old variables with the old values.

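Of course, this is very fragile. C makes no promise about the value of an uninitialized local variable, so the program works only by accident of how the stack happens to be laid out. If you change the order of the local variables or introduce a third local variable, it can all go wrong. And if you add another line of code to main, like a print statement, it also stops working, as in: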

    f1();
    printf("hello world\n");
    f2();
    f3();

Because printf is a function, calling it creates a stack frame that overwrites the old local variables from the f1 function, so the call on f2 sees different values.
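
For contrast, here is a sketch of the same three functions written with the "confusing" parameter passing the student was trying to avoid. The driver name run_all is hypothetical and stands in for main. Which parameters become pointers is my own choice: f1 and f2 change the values, so they take pointers, while f3 only reads them.

```c
#include <stdio.h>

/* Stores the starting values through the caller's pointers. */
void f1(int *x, double *y) {
    *x = 15;
    *y = 38.9;
    printf("%d %f\n", *x, *y);
}

/* Updates the caller's variables in place. */
void f2(int *x, double *y) {
    printf("%d %f\n", *x, *y);
    (*x)++;
    *y *= 2;
    printf("%d %f\n", *x, *y);
}

/* Only reads the values, so plain value parameters are enough. */
void f3(int a, double b) {
    printf("%d %f\n", a, b);
}

/* Hypothetical driver standing in for main. */
void run_all(void) {
    int x;
    double y;

    f1(&x, &y);
    printf("hello world\n");   /* extra calls no longer disturb anything */
    f2(&x, &y);
    f3(x, y);
}
```

This version prints the same four lines as the mystery program, with "hello world" after the first, and it keeps working no matter what other calls are added, because the variables live in one stack frame that stays alive the whole time.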

Welcome to the wonderful world of C programming.

Stuart Reges

Last modified: Sun Nov 13 18:26:33 PST 2011