CSE 341 -- Storage Management

It is useful to understand how storage is managed in different programming languages and for different kinds of data. Three important cases are:


Static Storage Allocation

Static storage allocation is appropriate when the storage requirements are known at compile time. For a compiled, linked language, the compiler can include the specific memory address for the variable or constant in the code it generates. (This may be adjusted by an offset at link time.) Examples:


Stack-Based Storage Allocation

Stack-based storage allocation is appropriate when the storage requirements are not known at compile time, but the requests obey a last-in, first-out discipline. Examples:

Stack-based allocation is normally used in C/C++, Ada, Algol, and Pascal for local variables in a procedure and for procedure call information. It allows for recursive procedures, and also allocates data only when the procedure or function has been called -- but is reasonably efficient at the same time.

Typically a pointer to the base of the current stack frame is held in a register, say R0. A reference to a local scalar variable can be compiled as a load of the contents of R0 plus a fixed offset. Note that this relies on the data having known size. To compile a reference to a dynamically sized object, e.g. an array, use indirection. The stack contains an array descriptor, of fixed size, at a known offset from the base of the current stack frame. The descriptor then contains the actual address of the array, in addition to bounds information.

References to non-local variables can be handled by several techniques -- the most common is using static links. This is beyond the scope of what we'll cover in 341 this year. Most variable references are either to local variables or global variables, so often compilers will handle global variable references more efficiently than references to arbitrary non-local variables. Scalar local variables (especially parameters) can be handled efficiently as they are often passed through registers.

There are two important limitations to pure stack-based storage allocation. First, the only way to return data from a procedure or function is to copy it -- if you try to simply return a reference to it, the storage for the date will have vanished after the procedure or function returns. This isn't an issue for scalar data (integers, floats, booleans), but is an issue for large objects such as arrays. For this reason you can't for example return a loclly declared array from a function in C. Second, you can't return a procedure or function as a value, or assign a procedure or function to a global variable (procedures or function values aren't first class citizens). The reason for this restriction is that procedure-valued variables need to be closures -- that is, they need to include a reference to the procedure body along with a pointer to the environment of definition of the procedure. With a stack-based storage allocation scheme, if we tried to assign a procedure to a global variable, environment of definition may have disappeared when we try to call the procedure.


Heap-Based Storage Allocation

The most flexible allocation scheme is heap-based allocation. Here, storage can be allocated and deallocated dynamically at arbitrary times during program execution. This will be more expensive than either static or stack-based allocation. Heap-based allocation is used ubiquitously in languages such as Lisp/Scheme and Smalltalk.

Issue: when is storage allocated and deallocated?

Allocation is easy. In C, malloc (a standard library function) allocates fresh storage. In Lisp/Scheme, a new cons cell is allocated when the cons function is called, array storage can be allocated using make-array, and so forth. In Smalltalk, new storage is allocated when someone sends the new message to a class.

Deallocation is harder. There are two approaches: programmer-controlled and automatic. In languages such as C, the programmer is in charge of deciding when heap storage can be freed (in C using the free function). This is efficient but can lead to problems if the programmer makes a mistake -- either storage is not freed even though it is no longer needed (memory leak), or is freed but referred to later (dangling pointer). Dangling pointers can lead to type insecurities or other errors -- one can refer to storage that has been re-allocated for another purpose with data of a different type. Some studies estimate that over 30% of development time on large C/C++ projects is related to storage management issues.

Lisp/Scheme and Smalltalk, as well as various other languages, use automatic storage management. There is no explicit deallocate function; rather, storage is automatically reclaimed some time after it is no longer accessible. We'll look at some of the most common techniques for garbage collection below:

Mark-Sweep

When memory is exhausted, traverse and mark all reachable memory. Then sweep through memory, freeing everything not marked. What is reachable? We define the concept of a root set as being the set of objects that are immediately reachable at the time of collection -- usually this includes objects referenced globally or in the call stack. Every object pointed to by a reachable object is reachable. Example:
(define fish '(shark bass pike))
(define foo '((7 8) (9 10)))
(define c '(1))
(set-cdr! c c)
Draw heap memory. Now evaluate the following:
(set-car! fish 'perch)
(set! c foo)
(set! foo (cdr foo))
Now perform the marking operation.

Issues: Every object must have a mark bit. We need to keep a little extra memory available for performing the collection. The runtime of mark-sweep is linear in the heap size. Why? Memory becomes fragmented.

Copying

Divide memory into old and new regions. When we are out of space in old, we traverse all reachable memory, and copy it into new. Then we swap old and new.

When we find an object in old, we copy it into the next slot in new. We then have to replace the object (in old) with a forwarding pointer. Why?

Using the example above, run the copying collector.

Issues: No mark bits. Copying can be slow for very large objects. Allocation is generally faster than mark-sweep, because free memory is contiguous, and the next free block is immediately available. The runtime is proportional to the amount of reachable memory. Need to allocate 2N memory for a heap of size N. Compaction is free.

Some memory-managers take a hybrid approach: placing large objects in a mark-sweep heap, and small ones in a copying-collected heap.

Generational

Observation: Some memory is used very briefly, other is used for a long time. Divide memory into regions: youngest, young, ..., old, older, old. Perform collections more frequenly in the younger regions of memory. If an object survives multiple collections, promote it to an older region.

Again, hybrids are often used: young generations might be collected using a copy-collector; old generations might be collected using a mark-sweep.

Incremental

A big problem with GC is that it interrupts the program. This pause could result in embarassing (during a demo) or disastrous (space shuttle) behavior. Most modern (eg. java) collectors are incremental -- they run as a separate thread (possibly on a separate processor) and collect memory "in the background." This strategy works well for I/O bound programs. Why?

Reference Counting

Reference counting adds a "count" field to every object. When an object is created, its count is set to 1. Every time a new pointer references an object, its count is incremented. Every time a pointer unreferences an object, its count is decremented. When the count reaches 0, an object can be reclaimed. Show reference-counting in action for the above example.

Issues: Reference counting adds overhead to every pointer operation. It is, howerver, inherently incremental. It cannot collect cyclic structures.

Activation Records in Scheme/Smalltalk

How are activation records (stack frames) allocated in languages such as Smalltalk? Since blocks hold onto their environment of creation and can be bound to global or instance variables, in general activation records can't be allocated on a stack. In addition, activation records are themselves full-fledged objects -- this capability is used for example by the debugger, and can be used to create new kinds of control regimes (e.g. coroutines). However, most activation records can be allocated on a stack (and don't need to be real objects) -- this is much faster than making an object for every stack frame and storing it in the heap. So efficient implementations allocate activations on the stack and only make them into objects when needed.

A Garbage Collector for C/C++

It's actually possible to write a garbage collector for C/C++ and people have done so. The main issue is that these collectors have to be "conservative." That is, they might encounter values (such as integers) that look like pointers and must treat them as such, for fear of corrupting the program state. This could result in leaks over time, but in practice, this rarely happens. Finally, such a collector must be mark-sweep in nature. Why?