CSE 451 - Recitation 1: Intro to C

Introduction:

TA information:
Aaron Kimball -- ak at cs dot washington dot edu -- office hour: Monday 1:30 - 2:30, CSE 216
William Rossiter -- rossiter at cs -- office hours: W 11:30-12:30, Th 1:30-2:30, CSE 022 (undergrad lab)

Course information:
Website: www.cs.washington.edu/education/courses/451/06au/
Grades will be assigned roughly as follows:

Projects: 40%
Quizzes 15% (lowest quiz score dropped)
Midterm: 20%
Final: 25%

Expectations: collaboration okay -- copying isn't. Use your best judgement, don't take notes at collaborative meetings with other students. Always let us know whom you're working with.

TO DO:

Get on the mailing list, if you haven't already done so. Go to mailman.cs.washington.edu and sign up for the "CSE451" list. Do this today.
Email me (ak at cs) with your project group partners -- teams of 3. Do this by Wednesday, or I'll randomly assign you partners.

C vs. Java

This lesson assumes that you already know how to program in Java. Some proficiency with C/C++ might be useful too; I'm just going to highlight the key differences between C and Java

Motivation: Why C?

Why couldn't we write an entire OS in, say, Java? Java is interpretted -- the .class files need to be run using the Java VM. What would the Java VM itself run on? This needs to be native code.
OS needs precise control -- programmer needs to be able to "run the OS in his head"
Things like garbage collection are unpredictable -- they can happen at any time. This is unacceptable inside the OS. Some tasks may run in modes which necessarily must be completed very quickly, such as when interrupts are disabled (you'll learn about this more, later).
Memory management inside the OS is much more constrained; you typically have multiple "pools" of memory from which to allocate objects -- a single simple "new" operator no longer cuts it.
ask: any other reasons?
Plus, since Linux is already written entirely in C (except for a few architecture-specific bits in assembly), any modifications we make to this must be in C as well.
C was invented at Bell Labs specifically for OS Programming (UNIX) in 1977. Its design goals were to be more portable and readable than assembly, but to compile very simply to assembly in a straightforward manner. (We're still working on the portability bit, even in 2006!)

Why not C++? Still too much overhead; inherited classes and things like virtual dispatch make debugging hard.

ask: Who knows Java? C++? C? (assembly?)

Differences in compilation:
Java: .java files => .class files, which are run through a VM. Can be "zipped" together in a .jar, but still functionally a set of separate files.
C: .c files => .o files; header files used to share definitions. .o files are linked together into libraries and/or executables.

(Demonstrate syntax for gcc with one .c->exe, .c->.o and multiple .o files, libraries...)

Differences in libraries:
In Java, you had the Class Libraries (java.util, java.String, java.Math, etc..). You would import a namespace such as java.util.* into the active file. This let the VM know that names in this file referred to names from that namespace.
In C, you have a set of standard functions rolled together in "libc." There is no concept of namespaces. To use a function that isn't a part of the current file, you need to predefine it as extern. These definitions or prototypes are in a set of header files (discuss concept of header files) which you can include.

The linker will then match up all the library functions used with your uses of them in the final compilation step.
To include one file inside another, we use a preprocessor directive, of the form:

#include<filename.h>

(The file does not necessarily need the .h extension; it is by convention only.)
There is no preprocessor in Java. In C, the preprocessor reads the files, looking for lines beginning with '#', and then performs an appropriate action, effectively rewriting the file, before the compiler proper actually looks at it. An include directive causes the contents of the included-file to be inserted verbatim "on top" of the include directive. This can cause recursively nested file inclusion. The compiler will warn you if there's a loop.

The biggest difference: There are no classes in C!

All methods are "static" in the Java sense -- but don't declare them with the static keyword, unless you know exactly what you're doing; it means something else in C.
data structures are stored in structs -- which are like classes with all members public, and no functions.
You must manipulate these with methods "outside" of the structure.

(Explain syntax for declaring structures with the "struct" keyword.)

In Java, all objects are stored in the heap (explain difference between heap and stack). In C, structs may live in the heap, on the stack, or in global space. If you allocate a struct in the heap, you will receive a pointer to the struct. Effectively, this is the same as a reference in Java -- it contains the address of the struct, in memory.

In Java, code such as:


class Foo {
   public int x;
   public int y;
}

...

Foo f = new Foo();
f.x = 42;
f.y = 451;

meant: create a new Foo object in the heap, let 'f' hold the address of this object. the dot operator (".") meant dereference 'f' with a member, and assign a value there.

In C, we would write:


struct Foo {
   int x; //everything's already "public"; the keyword doesn't exist in C.
   int y;
}; //don't forget the ';' here.

...

struct Foo *f;
f = malloc(sizeof(Foo));
f->x = 42;
f->y = 451;

The * before f in the first line is important -- it means that f is a pointer to a struct Foo. To dereference with a member, we then use the arrow operator. (Explain malloc vs new, malloc/free/kmalloc/kfree, sizeof)

Alternately, we could allocate the struct right inside the current stack frame:


struct Foo f;
f.x = 42;
f.y = 451;

The dot operator means "use the member of this object" without dereferencing. So in C, you can manipulate the "object itself" or "the object through a pointer", and must use a different operator for each.

You can also declare structs globally; these are analogous to public static members of a class in Java, but a global "Foo f" means a Foo object itself -- not a reference to it. Use the dot operator, not an arrow to get at its members.

ask: What restrictions exist on stack-allocated structs? Can you return a stack allocated struct? What about a pointer to it?

Call-by-value vs. Call-by-reference
In Java, when you sent arguments into a function, integers and other scalars were passed "By value," whereas arrays and objects were passed "By reference." What does this mean?
Call-by-value means that the function receiving the value is receiving a copy of it. Consider:


int double(int x)
{
	x = x * 2; //is not modifying the 'x' below -- is modifying its own x.
	return x;
}

int foo()
{
	int x = 42;
	return double(double(x)) + x;	
}

Call-by-reference, on the other hand, means that you have received the address of the argument; any modifications you make to the argument are reflected back in the caller. Consider:


class Foo
{
	public int x;
}

int doubleAFoo(Foo f)
{
	f.x = f.x * 2; //actually modifies the underlying Foo 'f', below.
}

int bar()
{
	Foo f = new f();
	f.x = 42;
	doubleAFoo(f);
	doubleAFoo(f);
	return f.x;
}

In C, you can pass scalars (ints, chars, etc) as well as structs by value or by reference. You can pass a "struct" itself as an argument, in which case a copy of the entire struct is made on the stack and passed in to the callee (ask: are there performance issues associated with this?). You can also pass a pointer to a struct, as well as pointers to ints, etc.

You can declare an int pointer, by using the same '*' in the definition:
int *p;
This can be used in several ways.
To assign to the pointer: p = &x; <-- the & operator takes the address of x.
To assign to the pointed-to object: *p = 42;
To retrieve the value pointed to by p: printf("%i\n", *p);
If p was a pointer to a struct, you'd access its members with p->x

Thus, if you have a method that modifies an object (such as, say, a "queue" struct), it must receive a pointer to the queue. e.g., int dequeue(struct queue *q)).
You then call x = dequeue(&my_queue);, instead of x = my_queue.dequeue() like in Java.

Some final differences between structs and classes:

There is no constructor
There is no default value -- they aren't automatically zeroed.
Therefore, you must always call some other function to initialize your structure, whenever you create one. Very often, we use memset() for this.

Types and Typedefs

You may want to create new scalar types to keep track of certain values. For example, within the kernel, each process is referred to by a process identifier, or "pid". Thus, we'd like a type to describe pids. We define this with:

typedef int pid_t;

This defines a type, pid_t, which is stored using the underlying type "int." A caveat: all this does in C's mind, is make an alias between pid_t and int. It won't complain loudly if you use a value declared as pid_t where you should've used a regular integer; similarly, any other integer value can be used in place of a pid_t.

Typedefs also allow you to make a small shorthand change:


struct foo_s {
    int x;
    int y;
};

typedef struct foo_s foo;

...

struct foo_s some_foo;
foo another_foo; //Don't need to keep retyping "struct" everywhere.

Strings

In Java, String was a type all of itself. C still understands "double quoted strings" like that, but thinks of them as arrays of characters. Thus, you must define a "char *" to hold a heap-allocated string, or a fixed-size array such as "char foo[64];" to hold one on the stack or in global memory. If you're immediately assigning a value to it, you do not need to describe the size of the array; char foo[] = "Hello!" works. All strings must be NULL-terminated! If they are not, buffer overflows may happen. (Improper string handling is one of the leading causes of security breaches in C programs.)
I won't really say much more about these, because we won't be dealing with strings much in this class... ask me in office hours if you're really interested in knowing more.

Some other C pitfalls:

Very loose typechecking! There's no concept of ClassCastException (or any exception for that matter).
If you try to dereference NULL pointers (note: NULL, not null), instead of getting a NullReferenceException, your program will just crash or act weird. (SIGSEGV)
Trying to use a stack* as a queue* will be just fine to the compiler; it may give you a warning, it may not -- code carefully!
The type void* means "anything". Pointers or other values which are "washed" through a void* can be of any type. So be sure to keep track yourself of what you're putting where.
Even the number of arguments you pass to a method is not very strictly checked; this is allowed to happen either silently, or possibly with a warning. Either way, your program might crash!

Final thoughts: What's the same?

ints
algebraic operations and functions purely operating on numbers.
most of the basic syntax