CSE 374, Lecture 13: The Heap

Structs for a linked list

Remember linked lists from our intro CS courses? A linked list is a way to store a list of data. Unlike an array, its elements are connected by "next" pointers rather than stored in one contiguous block of memory, so the list can grow when you want to store more data.

A linked list consists of a collection of NODES, each of which consists of some data and a pointer to the next node. You could draw a picture of a linked list called "list" that stores the values [1, 2, 3] as follows:

                 -------------        -------------        -------------
      ---       | data | next |      | data | next |      | data | next |
list | .-|--->  |  1   |   .--|--->  |  2   |   .--|--->  |  3   |   /  |
      ---        -------------        -------------        -------------

How would we build a linked list in C? What would a node look like? We can use structs to define a node! Let's do this for a node that stores an integer as data.

    struct IntListNode {
      int data;
      struct IntListNode* next;
    };

Notice how we have created a "recursive" struct: the IntListNode contains a field that is a pointer to another IntListNode.

(Side note: a struct can contain a POINTER to its own type, but it cannot contain a full element of its own type (in this case, "struct IntListNode next"). A struct is a contiguous block of memory whose size the compiler must know, and a struct that contained itself would need infinite size: each node would hold a whole node, which would hold another whole node, and so on. A pointer, by contrast, has a fixed size no matter what it points to. So recursive structures like list nodes always store POINTERS and not the raw type.)
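
To make the pointer-based struct concrete, here is a minimal sketch that links three nodes by hand and walks the "next" pointers. The helper names sumList and sumOfOneTwoThree are assumptions made up for illustration. The nodes are ordinary local variables, which is fine here only because they are used inside the same scope in which they are declared.

```c
#include <assert.h>
#include <stddef.h>

struct IntListNode {
  int data;
  struct IntListNode* next;
};

// Hypothetical helper: adds up the data in every node reachable from front.
int sumList(const struct IntListNode* front) {
  int total = 0;
  for (const struct IntListNode* cur = front; cur != NULL; cur = cur->next) {
    total += cur->data;
  }
  return total;
}

// Hypothetical demo: builds [1, 2, 3] from three local nodes and sums it,
// all within a single scope.
int sumOfOneTwoThree(void) {
  struct IntListNode n3 = {3, NULL};
  struct IntListNode n2 = {2, &n3};
  struct IntListNode n1 = {1, &n2};
  assert(n1.next->next == &n3);  // the chain is 1 -> 2 -> 3
  return sumList(&n1);           // the nodes are still alive here
}
```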

How would we use our struct to create a list that stores [1, 2, 3]? The easiest way to do this is to work backward, creating a node that stores 3, then creating a node that stores 2 and pointing its "next" pointer to the 3, then creating a node that stores 1 and pointing its "next" pointer to the 2. Here's our first attempt; draw out the nodes as the loop executes to see how it builds up the list.

    struct IntListNode* makeList() {
      struct IntListNode* front = NULL;
      for (int i = 3; i > 0; i--) {
        struct IntListNode n;
        n.data = i;
        n.next = front;
        front = &n;
      }
      return front;
    }

THIS IS WRONG! Why? Remember that a local variable exists only within its SCOPE, which is defined by the curly braces that enclose it. The node "n" is declared inside the loop body, so it is destroyed at the end of each iteration. After that, "front" points to something that no longer exists, and following that pointer is undefined behavior. For the same reason, we cannot return the pointer to the front of the list: the nodes themselves were destroyed when they went out of scope in the loop!

What we want is a way to allocate memory for structs and variables that doesn't go away when we exit the enclosing curly braces/scope. Enter the heap!

The heap

Local variables like the ones we have been using are allocated on the STACK, which stores the local data for each function call that is currently executing. The stack is limited in two ways:

The "heap" is a different region of the address space that we can use for data that doesn't have either of these limitations. We can call a function to reserve a chunk of memory, and that memory will be valid until we explicitly call another function to release it.

malloc: allocating heap space

To reserve a chunk of memory, we can use the function "malloc", which is a part of the C standard library (in <stdlib.h>). It takes a number and allocates at least that many bytes on the heap. It returns a pointer to the newly-reserved memory, or NULL if the memory could not be reserved for some reason (if you have run out of space in memory, for example).

    // Allocates "size" bytes of uninitialized storage.
    // If allocation succeeds, returns a pointer to the first byte.
    // If allocation fails, returns NULL pointer.
    void* malloc(size_t size);
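
Two parts of that contract are easy to forget: the storage is uninitialized, and the result may be NULL. Here is a minimal sketch that handles both. The names makeFilledArray and demoFilledArray are assumptions for illustration, and the demo releases the block with "free", which is described later in these notes.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical helper: returns a heap array of n ints, each set to value,
// or NULL if n is 0 or the allocation fails. The caller must free the result.
int* makeFilledArray(size_t n, int value) {
  if (n == 0) {
    return NULL;
  }
  int* arr = (int*) malloc(n * sizeof(int));
  if (arr == NULL) {
    return NULL;  // out of memory: report failure instead of dereferencing
  }
  for (size_t i = 0; i < n; i++) {
    arr[i] = value;  // malloc'd storage starts out uninitialized
  }
  return arr;
}

// Hypothetical demo: allocate, check, use, release.
int demoFilledArray(void) {
  int* arr = makeFilledArray(4, 7);
  assert(arr != NULL);  // a real program should handle failure gracefully
  int last = arr[3];
  free(arr);            // release the block once we are done with it
  return last;
}
```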

Gotchas:

Summary: Use malloc to allocate space for n elements of type T. This memory will be reserved until it is released.

    T* x = (T*) malloc(n * sizeof(T));
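
Applying that pattern to the broken makeList from earlier gives one possible fix, sketched below. Each node now lives on the heap, so it survives both the loop iteration and the return from the function. The cleanup on allocation failure is an added assumption about how errors should be handled, and it uses "free", which is covered in the next section.

```c
#include <stdlib.h>

struct IntListNode {
  int data;
  struct IntListNode* next;
};

// Builds the list [1, 2, 3] back to front using heap-allocated nodes.
// Returns the front node, or NULL if any allocation fails.
struct IntListNode* makeList(void) {
  struct IntListNode* front = NULL;
  for (int i = 3; i > 0; i--) {
    struct IntListNode* n =
        (struct IntListNode*) malloc(sizeof(struct IntListNode));
    if (n == NULL) {
      // Out of memory: release the nodes built so far and report failure.
      while (front != NULL) {
        struct IntListNode* next = front->next;
        free(front);
        front = next;
      }
      return NULL;
    }
    n->data = i;
    n->next = front;
    front = n;  // n points into the heap, so it outlives this iteration
  }
  return front;
}
```

The caller now owns three heap nodes and is responsible for releasing them eventually.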

free: deallocating heap space

We can now allocate memory of any size and have it live forever; we could create an array and use it indefinitely. Unfortunately, computers do NOT have infinite memory, so in a long-running program "living forever" could be a problem.

In Java, any object that is no longer referenced anywhere is "garbage collected" and deallocated automatically. In C, unfortunately, you have to explicitly deallocate all memory that you allocate with "malloc". In complex C programs, freeing memory correctly is VERY HARD, and this is one large disadvantage of low-level C programming as opposed to Java.

The way we release a chunk of memory that was allocated with malloc is with the standard library function "free", which accepts a pointer that was returned by malloc.

    int* p = (int*) malloc(sizeof(int));
    free(p); // the memory p pointed to is now released

If you lose the pointer to memory that was allocated with malloc, we say your program has a leak because you have no way to call free on it! This is bad and a bug in your program.

    int* p = (int*) malloc(sizeof(int));
    p = NULL; /* LEAK! */
    // What address was that int at? We have no way to know.
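
A common place to lose a pointer is when you replace an old block with a new one. Here is a minimal sketch of doing that without leaking: the old block is freed before the only pointer to it is dropped. The name growArray and its failure behavior are assumptions for illustration. The standard library function realloc covers the same case; the hand-written version is only meant to make the hand-off visible.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a new heap array of newCount ints that holds
// the first oldCount values of old, and frees old.
// Requires newCount > 0 and newCount >= oldCount.
// On bad arguments or allocation failure, returns NULL and leaves old
// untouched, so the caller still owns it.
int* growArray(int* old, size_t oldCount, size_t newCount) {
  if (newCount == 0 || newCount < oldCount) {
    return NULL;
  }
  int* bigger = (int*) malloc(newCount * sizeof(int));
  if (bigger == NULL) {
    return NULL;  // old is still valid and still owned by the caller
  }
  if (oldCount > 0) {
    memcpy(bigger, old, oldCount * sizeof(int));  // copy the existing values
  }
  free(old);      // release the old block before its pointer is dropped
  return bigger;  // elements past oldCount are uninitialized
}
```

A caller would write something like "int* b = growArray(a, 2, 4);" and, if b is not NULL, stop using a from then on.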

free must be called exactly once for each allocated block. Calling "free" twice on the same pointer is undefined behavior: your program may crash, or it may silently corrupt the heap.

    int* p = (int*) malloc(sizeof(int));
    free(p);
    free(p);  // Hope you crash! But who knows, maybe not
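
One defensive habit, sketched below, is to set a pointer to NULL right after freeing it. Calling free on NULL is defined to do nothing, so an accidental second call becomes harmless. The helper name freeAndClear is an assumption for illustration. Note that this only protects the one variable you clear; any other copies of the address are still dangling.

```c
#include <stdlib.h>

// Hypothetical helper: frees the int that *pp points to, then sets *pp to
// NULL so the caller's variable no longer holds a stale address.
void freeAndClear(int** pp) {
  if (pp == NULL) {
    return;
  }
  free(*pp);   // free(NULL) is defined to do nothing
  *pp = NULL;  // a later freeAndClear(&p) is now a harmless no-op
}
```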

If you try to use a pointer that has already been freed, it is just like using a "dangling pointer" to a local variable - your program may crash but it may also do bad things to memory without crashing.

    int* p = (int*) malloc(sizeof(int));
    free(p);
    int* r = (int*) malloc(sizeof(int));
    *r = 19;
    *p = 17;  // Hope you crash, but maybe *r==17 ?!

Problems with free when you are using functions:

Rules for malloc and free

  1. For every run-time call to malloc there should be exactly one run-time call to free.
  2. If you "lose all pointers" to an object, you can't ever call free (a leak)!
  3. If you "use an object after it's freed" (or free it twice), you "used a dangling pointer"!
  4. It's possible, but rare, to use up too much memory without creating "leaks via no more pointers to an object" (for example, by holding on to pointers to objects you no longer need).
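
Here is a minimal sketch of how these rules play out when releasing a whole linked list. The function name freeList and its return value are assumptions for illustration. The key detail is reading each node's "next" pointer BEFORE freeing the node; reading it afterward would be a use of a dangling pointer (rule 3).

```c
#include <stdlib.h>

struct IntListNode {
  int data;
  struct IntListNode* next;
};

// Hypothetical helper: frees every node reachable from front and returns
// how many nodes were freed. Every node must have come from malloc.
// After this call, the caller's pointers into the list are dangling.
int freeList(struct IntListNode* front) {
  int freed = 0;
  while (front != NULL) {
    struct IntListNode* next = front->next;  // save next before freeing
    free(front);                             // exactly one free per node
    front = next;
    freed++;
  }
  return freed;
}
```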

Valgrind

Ideally your program never has any memory leaks or dangling pointers, but in reality it is important to verify that your program is correct. We have a tool called "valgrind" which will find pointer errors and memory leaks during execution of the program.

To run valgrind on the command line, give it the path of the program that you want to run (along with any arguments that you want to provide to the program). Compile with -g so that valgrind can report source line numbers, and use the option --leak-check=full for a more complete summary of where any leaks occurred.

    $ gcc -g -o myprogram myprogram.c
    $ valgrind --leak-check=full ./myprogram firstarg secondarg

C vs Java

In Java, when you create a new object with "new T()", Java does a bunch of things all at once:

As we have seen, in C, these steps are almost all separated and you have to do them manually.
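
One way to picture the manual version is a constructor-style helper that performs each step explicitly. This is only a sketch: the name newNode and the particular breakdown into steps are assumptions for illustration, not the course's implementation.

```c
#include <stdlib.h>

struct IntListNode {
  int data;
  struct IntListNode* next;
};

// Hypothetical constructor-style helper for a list node.
// Returns NULL if memory could not be reserved. The caller must free the node.
struct IntListNode* newNode(int data, struct IntListNode* next) {
  // Reserve heap space for one node.
  struct IntListNode* n =
      (struct IntListNode*) malloc(sizeof(struct IntListNode));
  // Check for failure ourselves; C does not throw an exception.
  if (n == NULL) {
    return NULL;
  }
  // Initialize every field by hand; malloc does not zero the memory.
  n->data = data;
  n->next = next;
  // Hand the pointer back to the caller, who now owns the node.
  return n;
}
```

There is no garbage collector behind this helper, so every node it returns must eventually be passed to free.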

Linked List

We built up our linked list example using what we've learned about malloc and free. Our implementation (linked from the course web page) has the following features: