CSE 451 - Recitation 1: Intro to C


Introduction:

TA information:
Aaron Kimball -- ak at cs dot washington dot edu -- office hours Monday 2:30 - 3:30 pm, CSE 218
Kurtis Heimerl -- kheimerl at cs -- office hours TBA

Course information:
Website: www.cs.washington.edu/education/courses/451/07wi/
Expectations: collaboration okay -- copying isn't. Use your best judgement, don't take notes at collaborative meetings with other students. Always let us know whom you're working with.

TO DO:

C vs. Java

This lesson assumes that you already know how to program in Java. Some proficiency with C/C++ might be useful too; I'm just going to highlight the key differences between C and Java

Motivation: Why C?

Why not C++? Still too much overhead; inherited classes and things like virtual dispatch make debugging hard.

ask: Who knows Java? C++? C? (assembly?)

Differences in compilation:
Java: .java files => .class files, which are run through a VM. Can be "zipped" together in a .jar, but still functionally a set of separate files.
C: .c files => .o files; header files used to share definitions. .o files are linked together into libraries and/or executables.

(Demonstrate syntax for gcc with one .c->exe, .c->.o and multiple .o files, libraries...)

Differences in libraries:
In Java, you had the Class Libraries (java.util, java.String, java.Math, etc..). You would import a namespace such as java.util.* into the active file. This let the VM know that names in this file referred to names from that namespace.
In C, you have a set of standard functions rolled together in "libc." There is no concept of namespaces. To use a function that isn't a part of the current file, you need to predefine it as extern. These definitions or prototypes are in a set of header files (discuss concept of header files) which you can include.

The linker will then match up all the library functions used with your uses of them in the final compilation step.
To include one file inside another, we use a preprocessor directive, of the form:

#include<filename.h>
(The file does not necessarily need the .h extension; it is by convention only.)
There is no preprocessor in Java. In C, the preprocessor reads the files, looking for lines beginning with '#', and then performs an appropriate action, effectively rewriting the file, before the compiler proper actually looks at it. An include directive causes the contents of the included-file to be inserted verbatim "on top" of the include directive. This can cause recursively nested file inclusion. The compiler will warn you if there's a loop.

The biggest difference: There are no classes in C!

(Explain syntax for declaring structures with the "struct" keyword.)

In Java, all objects are stored in the heap (explain difference between heap and stack). In C, structs may live in the heap, on the stack, or in global space. If you allocate a struct in the heap, you will receive a pointer to the struct. Effectively, this is the same as a reference in Java -- it contains the address of the struct, in memory.

In Java, code such as:


class Foo {
   public int x;
   public int y;
}

...

Foo f = new Foo();
f.x = 42;
f.y = 451;
meant: create a new Foo object in the heap, let 'f' hold the address of this object. the dot operator (".") meant dereference 'f' with a member, and assign a value there.

In C, we would write:


struct Foo {
   int x; //everything's already "public"; the keyword doesn't exist in C.
   int y;
}; //don't forget the ';' here.

...

struct Foo *f;
f = malloc(sizeof(Foo));
f->x = 42;
f->y = 451;
The * before f in the first line is important -- it means that f is a pointer to a struct Foo. To dereference with a member, we then use the arrow operator. (Explain malloc vs new, malloc/free/kmalloc/kfree, sizeof)

Alternately, we could allocate the struct right inside the current stack frame:


struct Foo f;
f.x = 42;
f.y = 451;
The dot operator means "use the member of this object" without dereferencing. So in C, you can manipulate the "object itself" or "the object through a pointer", and must use a different operator for each.

You can also declare structs globally; these are analogous to public static members of a class in Java, but a global "Foo f" means a Foo object itself -- not a reference to it. Use the dot operator, not an arrow to get at its members.

ask: What restrictions exist on stack-allocated structs? Can you return a stack allocated struct? What about a pointer to it?

Call-by-value vs. Call-by-reference
In Java, when you sent arguments into a function, integers and other scalars were passed "By value," whereas arrays and objects were passed "By reference." What does this mean?
Call-by-value means that the function receiving the value is receiving a copy of it. Consider:


int double(int x)
{
	x = x * 2; //is not modifying the 'x' below -- is modifying its own x.
	return x;
}

int foo()
{
	int x = 42;
	return double(double(x)) + x;	
}

Call-by-reference, on the other hand, means that you have received the address of the argument; any modifications you make to the argument are reflected back in the caller. Consider:


class Foo
{
	public int x;
}

int doubleAFoo(Foo f)
{
	f.x = f.x * 2; //actually modifies the underlying Foo 'f', below.
}

int bar()
{
	Foo f = new f();
	f.x = 42;
	doubleAFoo(f);
	doubleAFoo(f);
	return f.x;
}

In C, you can pass scalars (ints, chars, etc) as well as structs by value or by reference. You can pass a "struct" itself as an argument, in which case a copy of the entire struct is made on the stack and passed in to the callee (ask: are there performance issues associated with this?). You can also pass a pointer to a struct, as well as pointers to ints, etc.

You can declare an int pointer, by using the same '*' in the definition:
int *p;
This can be used in several ways.
To assign to the pointer: p = &x; <-- the & operator takes the address of x.
To assign to the pointed-to object: *p = 42;
To retrieve the value pointed to by p: printf("%i\n", *p);
If p was a pointer to a struct, you'd access its members with p->x

Thus, if you have a method that modifies an object (such as, say, a "queue" struct), it must receive a pointer to the queue. e.g., int dequeue(struct queue *q)).
You then call x = dequeue(&my_queue);, instead of x = my_queue.dequeue() like in Java.

Some final differences between structs and classes:

Types and Typedefs

You may want to create new scalar types to keep track of certain values. For example, within the kernel, each process is referred to by a process identifier, or "pid". Thus, we'd like a type to describe pids. We define this with:

typedef int pid_t;
This defines a type, pid_t, which is stored using the underlying type "int." A caveat: all this does in C's mind, is make an alias between pid_t and int. It won't complain loudly if you use a value declared as pid_t where you should've used a regular integer; similarly, any other integer value can be used in place of a pid_t.

Typedefs also allow you to make a small shorthand change:


struct foo_s {
    int x;
    int y;
};

typedef struct foo_s foo;

...

struct foo_s some_foo;
foo another_foo; //Don't need to keep retyping "struct" everywhere.

Strings

In Java, String was a type all of itself. C still understands "double quoted strings" like that, but thinks of them as arrays of characters. Thus, you must define a "char *" to hold a heap-allocated string, or a fixed-size array such as "char foo[64];" to hold one on the stack or in global memory. If you're immediately assigning a value to it, you do not need to describe the size of the array; char foo[] = "Hello!" works. All strings must be NULL-terminated! If they are not, buffer overflows may happen. (Improper string handling is one of the leading causes of security breaches in C programs.)
I won't really say much more about these, because we won't be dealing with strings much in this class... ask me in office hours if you're really interested in knowing more.

Some other C pitfalls:

Final thoughts: What's the same?

A quote from The Tao of Programming:

There was once a programmer who was attached to the court of the warlord of Wu. The warlord asked the programmer: ``Which is easier to design: an accounting package or an operating system?''

``An operating system,'' replied the programmer.

The warlord uttered an exclamation of disbelief. ``Surely an accounting package is trivial next to the complexity of an operating system,'' he said.

``Not so,'' said the programmer, ``when designing an accounting package, the programmer operates as a mediator between people having different ideas: how it must operate, how its reports must appear, and how it must conform to the tax laws. By contrast, an operating system is not limited by outside appearances. When designing an operating system, the programmer seeks the simplest harmony between machine and ideas. This is why an operating system is easier to design.''

The warlord of Wu nodded and smiled. ``That is all good and well, but which is easier to debug?''

The programmer made no reply.