Previously, we have studied a subset of ML that is purely functional --- i.e. not having assignments, updatable data structures, or other side effects (a "side effect" is anything that is not just evaluation). This functional subset is computationally complete --- any computation can be expressed in it. So why would we even want to add mutable (updatable) data?
At first, one might respond: Well, to model certain processes accurately, we need side effects. The world, for example, changes over time. However, this is something of a fallacy --- we can always model changes over time using a function that takes the old state, and produces the new state; i.e., that has type:
World -> World
where World is some data type that represents the entire state of the world.
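As a tiny sketch of this style (the World type and the tick function below are invented for illustration, not part of these notes), a change to the world is just a function that consumes the old world and builds a new one:

type World = {time : int, inventory : string list};

(* tick : World -> World.  Nothing is updated in place; a whole new record is built. *)
fun tick ({time, inventory} : World) : World =
  {time = time + 1, inventory = inventory};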
So why would we want to introduce side effects into our language? Here's a short list, with elaboration below:
The upshot of the explanations that follow is that side effects (including mutable data) are currently a necessary evil in a practical language.
Logically, most functional programs make many copies of data. You're always constructing "the new world", e.g. the new list value that's returned from reverse.
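For concreteness, here is the standard accumulator-based way to write reverse; every recursive step allocates a fresh cons cell for the result, and none of the input list's cells are reused:

fun reverse xs =
  let
    (* Each x :: acc allocates a new cell; the cells of xs become garbage
       once no one else points to them. *)
    fun loop ([], acc) = acc
      | loop (x :: rest, acc) = loop (rest, x :: acc)
  in
    loop (xs, [])
  end;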
Therefore, a purely functional program typically does far more allocation and copying than a comparable imperative program, which simply updates data in place.

Sophisticated compilers and type systems can sometimes eliminate this copying. For example, a linear type system can guarantee that there is only one pointer to a given value, which lets the compiler reuse that value's storage. Consider the reverse function: if the incoming argument is known to be the only pointer to the list that is to be reversed, then we can reuse the cells of the input list for the output list --- because nobody else will ever use the old list again. (Purely functional data is immutable anyway --- compare a Java class with only final fields, or a C++ class with only const fields.) The caveats are that such compilers are hard to build, and such type systems place an extra burden on the programmer.

Note that the caveats on sophisticated compilers and type systems apply to most high-level language features, such as the uniform reference model (all objects referred to by pointer) and, to a lesser extent, garbage collection. I am a student/researcher in languages and compilers, and I am therefore a big fan of high-level languages, smart compilers, and sophisticated type systems. Ultimately, I believe the gains in programmer productivity from using high-level languages outweigh the performance costs, as well as the costs of implementing smart compilers and teaching programmers to use clever type systems. However, these caveats and costs must still be kept in mind when evaluating the importance or value of high-level languages.
Certain data structures, including (but not limited to) cyclic data structures, are inherently hard to express in purely functional languages.
Consider a doubly linked list:
datatype 'a DList = DEmpty | DNode of {elem:'a, prev:'a DList, next:'a DList};
It is obvious how to construct an empty linked list, or a linked list with one node:
val empty_dlist = DEmpty;
val single_dlist = DNode {elem=25, prev=DEmpty, next=DEmpty};
But how do we prepend a node onto this list? OK, the empty case is easy, but what about the node case?
fun prepend x DEmpty = DNode {elem=x, prev=DEmpty, next=DEmpty}
  | prepend x (DNode {elem, prev, next}) =
      DNode {elem=x, prev=DEmpty,
             next=(DNode {elem=elem, prev=(XXX?), next=(YYY?)})};
What will we fill in for the (XXX?)? The prev pointer of the second node must point to the first node, which we are currently constructing (i.e., the node whose elem is x). But we have no way of referring to this node until it's constructed.

And what will we fill in for (YYY?)? We must recursively reconstruct the next node --- after all, we must update its prev pointer to point to the second node --- but what will we pass to it to use as the prev value?
Pure functional languages have solutions to this problem, but they're complicated and arguably less natural than simply allowing the pointer to be updated after the node is constructed.
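As a rough sketch of that alternative (the RDList datatype and rprepend below are invented names, and this is only one possible design), ML's ref cells let us build the new node first and patch the old node's prev pointer afterwards:

datatype 'a RDList =
    REmpty
  | RNode of {elem : 'a, prev : 'a RDList ref, next : 'a RDList ref};

fun rprepend x REmpty =
      RNode {elem = x, prev = ref REmpty, next = ref REmpty}
  | rprepend x (old as RNode {prev, ...}) =
      let
        val new = RNode {elem = x, prev = ref REmpty, next = ref old}
      in
        prev := new;   (* backpatch: the old head's prev now points at new *)
        new
      end;

The assignment prev := new is precisely the step that the purely functional version had no way to express.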
A similar issue arises with arrays whose contents are produced by some computation: again, it is arguably more natural to allow the programmer to allocate the array first, and then update its elements as the computation runs.
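For instance, using the standard Basis operations Array.array and Array.update, the allocate-then-fill style might look like this (squaresTable is an invented example):

fun squaresTable n =
  let
    val arr = Array.array (n, 0)           (* allocate n slots, all 0 *)
    fun fill i =
      if i >= n then ()
      else (Array.update (arr, i, i * i);  (* imperative update *)
            fill (i + 1))
  in
    fill 0;
    arr
  end;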
Permissiveness is arguably just the flip side of expressiveness --- if a language is too permissive, then it is because it does not allow the programmer to express some restriction that (s)he wishes to express.
In this case, we would like to express the following simple constraint: Don't copy the world. If we're representing the world as a data structure, then arguably we should not permit more than one copy of this world to be "alive" in the program at once. But in an ordinary functional language, it is easy to copy the world:
fun copy (w:World) = (w, w);
This makes a pair of worlds. In a language where the world is represented as mutable state (i.e., the current contents of memory, which can be updated), the "world" is the implicitly unique thing that can't be copied.
When the world can be duplicated, it can be surprisingly hard to make sure this never happens --- or that, if it does happen, you're always using the same world.
There are fancy type systems --- e.g., the linear type systems we alluded to above, in the section on efficiency --- that prevent the user from keeping more than one pointer to a given value. These systems can prevent "copying the world", but the usual caveats apply.
The notion of state update as a function that constructs a new world is all fine and well, but the fact is that most of the rest of the world is not defined this way. To interact with the outside world, programs need to be aware of side effects, somehow.
Input and output (I/O) inherently depends on a changing world. For example, consider a network card buffer: this is a specific chunk of memory in the computer where your hardware sticks incoming or outgoing network data. Imagine processing network buffer data by calling some function getBuffer() that returns the buffer's current contents. If getBuffer() were a truly pure function, then every call to it would have to return the same value, and the compiler could legitimately evaluate it once and reuse that result forever --- even as new packets arrive. (Notice that this is a perfectly fine compiler optimization for a truly pure function.)

One solution to this problem is to program those lower layers in C, or some other impure language. The runtime system of your programming language might then present you with a purely functional interface to this buffer. For example, perhaps the only way to process network buffers is to implement a function of type Buffer -> Buffer and register it with the system.
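To make the shape of that interface concrete, here is an invented stand-in (Buffer, registerHandler, and deliver are not a real API --- they merely sketch the idea):

type Buffer = Word8Vector.vector

(* Stand-in for the runtime system: it remembers the registered handler
   and applies it to each incoming buffer. *)
val handlerRef : (Buffer -> Buffer) ref = ref (fn b => b)
fun registerHandler (h : Buffer -> Buffer) = handlerRef := h
fun deliver (incoming : Buffer) : Buffer = (!handlerRef) incoming

(* The user-level handler itself is a pure function from buffer to buffer. *)
val () = registerHandler (fn buf => buf);

Notice that even this stand-in for the runtime uses a ref: the impurity has not disappeared, it has just been pushed down a layer.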
But this is unsatisfactory --- you've simply admitted that your language is not expressive enough to capture this kind of I/O operation.
Input and output has always been a rather vexing problem for functional languages. Languages like ML simply punt and accept impurity (side effects). Other languages have dared to be pure, but until Haskell none of them had a really satisfactory solution to I/O. Haskell, a pure functional language, copes with I/O using a special sequencing construct called a monad.
We won't cover monads in this class, but Haskell fans claim that monads have many nice properties, and indeed large Haskell programs using I/O have been written. However, performing I/O (and simulating other side effects with monads) nevertheless suffers from a problem similar to the "threading the world" problem described below.
When you have to model side effects using an explicit world argument and return value, then every function that may have side effects must take the world as an argument, and return the new world as a result.
This doesn't sound so bad, until you realize that this also means that any time a function has a side effect, every function that calls it must take and return the world. And every function that calls those functions, and so on, up the chain.
This results in a vexing software engineering problem:

1. Suppose your program contains a deeply nested function f, which is a pure function.
2. Later, you decide that f should have a side effect. (This is more common than you think: for example, while debugging you might want to insert a statement that changes "the world" to display a debugging message on the standard output.)
3. Now you must thread the world down through every function on every call chain leading to f, and back up. This could involve modifying dozens or hundreds of functions.

This is the software evolution argument for side effects. Another argument is the abstraction argument: when side effects require "threading the world" in this manner, then all clients have to know about a function's side effects in order to use it. This prevents the function from hiding that aspect of its implementation from clients --- the "side-effect-ness" of a function cannot be hidden.
Some people claim that side effects should not be hidable, but I disagree. A function may present an abstraction that is purely functional, but its implementation may use side effects in some "harmless" and correct way.
For example, a factorial function may internally use an updatable data structure to store previous answers for later use, i.e. a cache (this optimization is called memoizing). This abstraction is not "clean" if the caller must pass the function a representation of its "world" (the cache of previously computed factorials), and remember to pass the resulting world again next time --- which is the only way that the programmer can make memoization happen in a purely functional language.
(Purists would reply that compilers can perform memoization automatically in a functional language, and indeed can often do so more correctly and consistently than human programmers can. But there are clearly cases when a programmer's hand-coded memoization would be superior, because programmers can know more about the function in question than the compiler can.)
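A minimal sketch of such a memoized factorial (the cache representation and the name memoFact are ours; a real implementation might use a growable array or hash table instead of an association list):

local
  val cache : (int * int) list ref = ref []     (* (argument, answer) pairs *)
in
  fun memoFact n =
    case List.find (fn (k, _) => k = n) (!cache) of
        SOME (_, v) => v                         (* answer already computed *)
      | NONE =>
          let
            val v = if n <= 1 then 1 else n * memoFact (n - 1)
          in
            cache := (n, v) :: !cache;           (* remember it for next time *)
            v
          end
end;

Callers see an ordinary function of type int -> int; the cache, and the fact that it is updated, stay hidden inside the implementation.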
Exercise: Define a doubly linked list datatype whose prev and next fields are updatable (use ML's ref type), and write a prepend function for such a data type. Write a copy function.

Exercise: Implement the Array.fromList function.