Purely Functional Implementation of State
1 Essentials of Store-Passing
Here we explore the implementation of state (in the form of boxes) via purely functional mechanisms. The key idea is to follow a protocol of store-passing, where an explicit representation of the "mutable storage" (from now on called the store) is passed in to each function, and returned as part of its result.
Like the environment, the store is a map that lets us look up values of things. However, whereas the environment tracks the lexical (static) structure of program expressions, changes to the store are observable by anything that happens afterward in the (dynamic) execution of the program. Neither one is suited to perform the other’s job, and we need both.
We define several new data types to help model the store and our usage of it:
; We use numbers to represent storage locations, but we define
; a type alias to avoid confusing locations with other numbers.
(define-type-alias Location number)

; A storage cell has a location and a value.
(define-type Cell
  [cell (loc : Location) (value : Value)])

; A store contains a collection of cells and tracks the next
; unused location.
(define-type Store
  [store (next-loc : Location) (cells : (listof Cell))])

; An operation that changes the state of the store does so by
; returning a new store as part of its result.
(define-type (Result 'a)
  ; The contents of the result are a value and a store.
  [v*s (value : 'a) (store : Store)])
We also extend the Expr and Value data types to model the new concepts in our language.
(define-type Expr
  ...                         ; Other stuff.
  [boxE (content : Expr)]     ; Creates a new box holding content.
  [unboxE (box : Expr)]       ; Evaluates box and retrieves the content.
  [setboxE (box : Expr)       ; Evaluates box and updates it ...
           (content : Expr)]  ; ... to contain the value of content.
  [seqE (expr1 : Expr)        ; First evaluates expr1;
        (expr2 : Expr)])      ; then evaluates expr2.

(define-type Value
  [numV (value : number)]                          ; A numeric value.
  [funV (var : symbol) (body : Expr) (env : Env)]  ; A closure.
  [boxV (loc : Location)])  ; A box value; note that it only has
                            ; meaning in the context of a store.
And we define some helper functions for manipulating the store:
; Allocates a new cell in the store to hold value; returns the new
; location.
(define (alloc [value : Value] [sto : Store]) : (Result Location)
  (type-case Store sto
    [store (next-loc cells)
      ; Return a result with the location and an updated store.
      (v*s next-loc
           (store (add1 next-loc)
                  (cons (cell next-loc value) cells)))]))

; Fetches the value of loc in the store.
(define (fetch [loc : Location] [sto : Store]) : Value
  ...) ; Left as exercise.

; Updates store so that loc contains the new value.
(define (update [loc : Location] [value : Value] [sto : Store]) : Store
  ...) ; Left as exercise.
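To experiment with these helpers outside plai-typed, here is a minimal sketch in untyped #lang racket. It is only one possible answer to the fetch and update exercises, not the authors' code. It uses structs in place of define-type, match in place of type-case, and plain Racket numbers instead of Value variants. Note that update builds a new list of cells rather than mutating anything, so older stores stay valid.

```racket
#lang racket
;; Untyped sketch of the store helpers (assumption: plain #lang racket,
;; not plai-typed). fetch/update are one possible exercise solution.

(struct cell (loc value) #:transparent)
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)

(define empty-store (store 0 '()))

;; Allocates a new cell holding value; returns the location and new store.
(define (alloc value sto)
  (match-define (store next-loc cells) sto)
  (v*s next-loc
       (store (add1 next-loc) (cons (cell next-loc value) cells))))

;; Looks up loc; raises an error if it was never allocated.
(define (fetch loc sto)
  (match (findf (λ (c) (= (cell-loc c) loc)) (store-cells sto))
    [(cell _ value) value]
    [#f (error 'fetch "unallocated location: ~a" loc)]))

;; Returns a NEW store in which loc holds value; sto itself is untouched.
(define (update loc value sto)
  (match-define (store next-loc cells) sto)
  (unless (findf (λ (c) (= (cell-loc c) loc)) cells)
    (error 'update "unallocated location: ~a" loc))
  (store next-loc
         (map (λ (c) (if (= (cell-loc c) loc) (cell loc value) c))
              cells)))
```

Replacing the cell (rather than consing a second cell for the same location onto the front) keeps exactly one cell per location, so fetch never has to decide between duplicates.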
Finally, we update our interpreter to consume and produce a store in each call.
(define (interp [expr : Expr] [env : Env] [sto : Store]) : (Result Value)
  (type-case Expr expr
    ... ; Other cases omitted.
    [seqE (expr1 expr2)
      ; Note: the value of expr1 is intentionally ignored.
      (let ([sto1 (v*s-store (interp expr1 env sto))])
        (interp expr2 env sto1))]
    [boxE (content)
      (type-case (Result Value) (interp content env sto)
        [v*s (val sto1)
          (type-case (Result Location) (alloc val sto1)
            [v*s (loc sto2) (v*s (boxV loc) sto2)])])]
    [unboxE (box-expr)
      (type-case (Result Value) (interp box-expr env sto)
        [v*s (box-val sto1)
          (v*s (fetch (boxV-loc box-val) sto1) sto1)])]
    [setboxE (box-expr content-expr)
      (type-case (Result Value) (interp box-expr env sto)
        [v*s (box-val sto1)
          (type-case (Result Value) (interp content-expr env sto1)
            [v*s (content-val sto2)
              (v*s box-val ; Return the box (so we don't need a voidV).
                   (update (boxV-loc box-val) ; box-val better be a boxV!
                           content-val
                           sto2))])])]))
In order to interpret a program, we call interp with an initial environment and store.
(define empty-store (store 0 empty))

(define expr1
  '(with (b (box 3))
     (with (f (fun (x)
                (seq (setbox b (+ x (unbox b)))
                     (unbox b))))
       (+ (f 10) (f 16)))))

(define expr2
  '(with (b (box 3))
     (seq (setbox b (+ (unbox b) 5))
          b)))

> (interp (parse expr1) base-env empty-store)
- (Result Value)
(v*s (numV 42) (store 1 (cell 0 (numV 29))))
> (interp (parse expr2) base-env empty-store)
- (Result Value)
(v*s (boxV 0) (store 1 (cell 0 (numV 8))))
Note that the value of expr2 is just a location, (boxV 0); to see the box’s contents, we have to look up location 0 in the store.
This store-passing implementation strategy works because each store instance is passed to only one store-modifying function. The function returns an updated store, and we pass that to the next step that needs a store. Hence, we talk about "the store" as if it were a single persistent entity, even though there are actually a number of "snapshots" of it from various points in the execution.
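The snapshot idea can be made concrete with a short sketch in untyped Racket. To keep it brief, it assumes a simplified store: a struct holding the next free location and an immutable hash from locations to values, rather than a list of cells. The helper names mirror the ones above but are redefined here so the block is self-contained.

```racket
#lang racket
;; Simplified store (an assumption for this sketch): next free location
;; plus an immutable hash from locations to values.
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)

(define empty-store (store 0 (hash)))

(define (alloc value sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))

(define (fetch loc sto) (hash-ref (store-cells sto) loc))

(define (update loc value sto)
  (store (store-next-loc sto) (hash-set (store-cells sto) loc value)))

;; Thread the store by hand: each snapshot is used exactly once,
;; as the input to the next step.
(define sto0 empty-store)
(define r1 (alloc 3 sto0))            ; a box at location 0 holding 3
(define loc (v*s-value r1))
(define sto1 (v*s-store r1))
(define sto2 (update loc (+ 5 (fetch loc sto1)) sto1))  ; 3 + 5

;; Every snapshot still exists, unchanged:
(fetch loc sto1)   ; => 3
(fetch loc sto2)   ; => 8
```

Nothing is ever overwritten; "the store" at any moment is simply whichever snapshot was most recently produced and handed on.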
This linear "threading" of the store through the program forces a sequential execution order on it: the different versions of the store determine which expressions can see which other expressions’ side effects. Unlike the previous interpreters that we wrote (which were for purely functional languages), this one cannot easily be parallelized without fundamentally changing the semantics.
2 Abstracting the Pattern
Doing this store-passing by hand can be tedious, and it requires care on the part of the programmer to avoid accidentally reusing a store across two control flow paths. This is unfortunate, so let’s try to look for patterns that we can factor out into abstractions that will prevent us from repeating ourselves. This will also have the benefit of presenting fewer opportunities for mistakes.
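The following sketch shows what accidental reuse looks like. It assumes the same kind of simplified hash-based store (for brevity) and replays the two updates performed by expr1’s calls (f 10) and (f 16). In the buggy version, the original snapshot is passed to the second update, which silently drops the first update.

```racket
#lang racket
;; Simplified hash-based store (an assumption for this sketch).
(struct store (next-loc cells) #:transparent)

(define (fetch loc sto) (hash-ref (store-cells sto) loc))
(define (update loc value sto)
  (store (store-next-loc sto) (hash-set (store-cells sto) loc value)))

;; A store in which location 0 already holds 3.
(define sto0 (store 1 (hash 0 3)))

;; Correct threading: add 10, then add 16 to the UPDATED value.
(define good1 (update 0 (+ 10 (fetch 0 sto0)) sto0))
(define good2 (update 0 (+ 16 (fetch 0 good1)) good1))

;; Bug: the second step reuses sto0 instead of bad1, so the
;; first update is lost.
(define bad1 (update 0 (+ 10 (fetch 0 sto0)) sto0))
(define bad2 (update 0 (+ 16 (fetch 0 sto0)) sto0))

(fetch 0 good2)   ; => 29
(fetch 0 bad2)    ; => 19
```

Both versions type-check and run without complaint; only the results differ. This is exactly the kind of mistake that hiding the store behind combinators is meant to rule out.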
The first thing to note is that every function that operates on the store consumes a store as an argument. We’ve consistently made it the final argument, so we can apply a currying transformation to separate it from the rest of the arguments.
(define (alloc [value : Value]) : (Store -> (Result Location))
  (λ (sto) ...))

(define (fetch [loc : Location]) : (Store -> (Result Value))
  (λ (sto) ...))

(define (update [loc : Location] [value : Value]) : (Store -> (Result void))
  (λ (sto) ...))

(define (interp [expr : Expr] [env : Env]) : (Store -> (Result Value))
  (λ (sto) ...))
Note that, aside from their return types, the signatures of these functions look just as they would in a world where state was available implicitly (for example, if we allowed ourselves to use Racket’s boxes to implement state in the interpreter).
The type (Store -> (Result 'a)) is so common that we’ll define an alias for it. We’ll call it ST, an abbreviation for "store transformer".
(define-type-alias (ST 'a) (Store -> (Result 'a)))
Also noteworthy is the fact that store-manipulating operations are always "stitched together" by following the same pattern, which looks essentially like this:
(type-case (Result 'a) (do-first-thing sto)
  [v*s (value new-sto) (do-second-thing value new-sto)])
We can factor this out by abstracting over the two do-something functions:
; Compose two store-transforming operations into a bigger one.
; The new operation runs the first sub-operation, then runs the
; second one with the first one's result as its argument.
(define (st-seq [step1 : (ST 'a)] [step2 : ('a -> (ST 'b))]) : (ST 'b)
  (λ (sto)
    (type-case (Result 'a) (step1 sto)
      [v*s (value sto1) ((step2 value) sto1)])))
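A minimal untyped sketch of the curried helpers together with st-seq follows. It relies on Racket’s curried define shorthand, (define ((f a) b) ...), and on match instead of type-case. The hash-based store and the use of (void) as update’s result value are assumptions of this sketch.

```racket
#lang racket
;; Simplified hash-based store (an assumption for this sketch).
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)
(define empty-store (store 0 (hash)))

;; Curried helpers: each call returns a store transformer (Store -> v*s).
(define ((alloc value) sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))

(define ((fetch loc) sto)
  (v*s (hash-ref (store-cells sto) loc) sto))

(define ((update loc value) sto)
  (v*s (void)
       (store (store-next-loc sto) (hash-set (store-cells sto) loc value))))

;; Run step1, then pass its value and the resulting store on to step2.
(define ((st-seq step1 step2) sto)
  (match (step1 sto)
    [(v*s value sto1) ((step2 value) sto1)]))
```

For example, (st-seq (alloc 3) fetch) is a store transformer that allocates a box holding 3 and then reads it back: alloc’s result (the new location) is exactly the argument fetch expects. Applying it to empty-store yields a result whose value is 3.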
Let’s try re-writing the interpreter using the refactored store helpers and st-seq. The seqE and unboxE cases work without a hitch.
(define (interp [expr : Expr] [env : Env]) : (ST Value)
  (type-case Expr expr
    ...
    [seqE (expr1 expr2)
      (st-seq (interp expr1 env)
              (λ (_) (interp expr2 env)))]
    [unboxE (box-expr)
      (st-seq (interp box-expr env)
              (compose fetch boxV-loc))]
    ...))
However, when we get to the boxE case, there’s a problem:
(define (interp [expr : Expr] [env : Env]) : (ST Value)
  (type-case Expr expr
    ...
    [boxE (content)
      (st-seq (interp content env)
              (λ (val)
                (st-seq (alloc val)
                        (λ (loc) (boxV loc)))))]  ; <- Problem here!
    ...))
When we get to the body of the innermost λ, (boxV loc), we have a Value (specifically, a boxV) that we’d like to return, but what we actually need is an (ST Value): something that, given a store, produces a (Result Value). Unfortunately, in rewriting everything in terms of store-passing combinators, we made the store implicit, so we no longer have one available to construct a Result from. In order to complete this case, we need one more thing: a way to take a plain value like this and make it into a store transformer. That store transformer doesn’t actually do anything to the store: it just bundles the store with the value we gave it to create a Result. We’ll call this new function st-return, since it essentially just lets us return a value when we’re operating in the store-passing framework.
(define (st-return [value : 'a]) : (ST 'a)
  (λ (sto) (v*s value sto)))
Now we can finish the implementation of the other two cases:
(define (interp [expr : Expr] [env : Env]) : (ST Value)
  (type-case Expr expr
    ...
    [boxE (content)
      (st-seq (interp content env)
              (λ (val)
                (st-seq (alloc val)
                        (compose st-return boxV))))]
    [setboxE (box-expr content-expr)
      (st-seq (interp box-expr env)
              (λ (box-val)
                (st-seq (interp content-expr env)
                        (λ (content-val)
                          (st-seq (update (boxV-loc box-val) content-val)
                                  (λ (_) (st-return box-val)))))))]))
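To see the refactored cases run end to end, here is a small sketch of the monadic interpreter in untyped Racket. It is not the course’s interpreter. The language is cut down to numbers, +, with, identifiers, and the box and seq forms. Expressions are built directly from hypothetical structs (numE, plusE, idE, withE, and so on) instead of being parsed, and both the environment and the store are immutable hashes. The box cases are transcribed from the plai-typed code above.

```racket
#lang racket
;; Assumptions: reduced language, hypothetical expression structs,
;; immutable hashes for both the environment and the store.
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)
(define empty-store (store 0 (hash)))

;; Values
(struct numV (n) #:transparent)
(struct boxV (loc) #:transparent)

;; Expressions (hypothetical names for this sketch)
(struct numE (n))
(struct plusE (l r))
(struct idE (name))
(struct withE (name named body))
(struct boxE (content))
(struct unboxE (box))
(struct setboxE (box content))
(struct seqE (e1 e2))

;; Store-transformer combinators, as in the text.
(define ((st-return value) sto) (v*s value sto))
(define ((st-seq step1 step2) sto)
  (match (step1 sto)
    [(v*s value sto1) ((step2 value) sto1)]))

;; Curried store operations.
(define ((alloc value) sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))
(define ((fetch loc) sto)
  (v*s (hash-ref (store-cells sto) loc) sto))
(define ((update loc value) sto)
  (v*s (void)
       (store (store-next-loc sto) (hash-set (store-cells sto) loc value))))

;; interp : Expr Env -> (ST Value); no store variable appears anywhere.
(define (interp expr env)
  (match expr
    [(numE n) (st-return (numV n))]
    [(idE name) (st-return (hash-ref env name))]
    [(plusE l r)
     (st-seq (interp l env)
             (λ (lv)
               (st-seq (interp r env)
                       (λ (rv)
                         (st-return (numV (+ (numV-n lv) (numV-n rv))))))))]
    [(withE name named body)
     (st-seq (interp named env)
             (λ (v) (interp body (hash-set env name v))))]
    [(seqE e1 e2)
     (st-seq (interp e1 env) (λ (_) (interp e2 env)))]
    [(boxE content)
     (st-seq (interp content env)
             (λ (val) (st-seq (alloc val) (compose st-return boxV))))]
    [(unboxE box-expr)
     (st-seq (interp box-expr env) (compose fetch boxV-loc))]
    [(setboxE box-expr content-expr)
     (st-seq (interp box-expr env)
             (λ (box-val)
               (st-seq (interp content-expr env)
                       (λ (content-val)
                         (st-seq (update (boxV-loc box-val) content-val)
                                 (λ (_) (st-return box-val)))))))]))
```

For example, expr2 from the text can be encoded as (withE 'b (boxE (numE 3)) (seqE (setboxE (idE 'b) (plusE (unboxE (idE 'b)) (numE 5))) (idE 'b))). Running ((interp expr2 (hash)) empty-store) produces a result whose value is (boxV 0) and whose store maps location 0 to (numV 8), which matches the plai-typed transcript.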
Notice that the store has disappeared completely from this version of the interpreter. This is much cleaner than our initial implementation of store-passing, and much less prone to bugs: there is no longer a store variable that could be passed to the wrong place.
We still end up making many calls to st-seq in each case. It’s tempting to put all of our store-transforming operations into a list and have something like an st-begin that folds st-seq over them. In an untyped variant of Racket, we could actually write such a function, which would allow us to write:
(define (interp [expr : Expr] [env : Env]) : (ST Value)
  (type-case Expr expr
    ...
    [boxE (content)
      (st-begin (interp content env)
                alloc
                (compose st-return boxV))]
    [setboxE (box-expr content-expr)
      ; NB. We can't define st-begin as a function in plai-typed!
      ; (Also, box-val is not really in scope in the last two steps:
      ; each λ sees only the previous step's result.)
      (st-begin (interp box-expr env)
                (λ (box-val) (interp content-expr env))
                (λ (content-val) (update (boxV-loc box-val) content-val))
                (λ (_) (st-return box-val)))]))
In plai-typed, we can’t write this because (1) we can’t write functions that take an arbitrary number of arguments and (2) the steps are represented by functions with different types, so we can’t put them all together in a list. However, we’ll see later how we can use Racket’s macro system to define a syntactic extension that will let us write code that looks like the following (and can still be statically type-checked):
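For comparison, this is roughly how st-begin could be written as an ordinary variadic function in untyped Racket: a left fold of st-seq over the steps. Because st-seq is associative, folding from the left gives the same result as nesting from the right. This is a sketch; the simplified hash-based store and the boxV struct are defined locally as assumptions.

```racket
#lang racket
;; Simplified hash-based store and boxV (assumptions for this sketch).
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)
(struct boxV (loc) #:transparent)
(define empty-store (store 0 (hash)))

(define ((st-return value) sto) (v*s value sto))
(define ((st-seq step1 step2) sto)
  (match (step1 sto)
    [(v*s value sto1) ((step2 value) sto1)]))
(define ((alloc value) sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))

;; step0 : a store transformer; each later step : value -> store transformer.
(define (st-begin step0 . steps)
  (for/fold ([acc step0]) ([step (in-list steps)])
    (st-seq acc step)))

;; The boxE case from the text, with the content already evaluated to 3:
(define box-three
  (st-begin (st-return 3) alloc (compose st-return boxV)))
```

Because each step receives only the previous step’s value, a later step cannot refer to a variable bound by an earlier one (such as box-val in the setboxE case). A binding form like the <- syntax in the next example addresses exactly that.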
(define (interp [expr : Expr] [env : Env]) : (ST Value)
  (type-case Expr expr
    ...
    [setboxE (box-expr content-expr)
      (st-begin
        box-val <- (interp box-expr env)
        content-val <- (interp content-expr env)
        (update (boxV-loc box-val) content-val)
        (st-return box-val))]))
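One way such a macro might look is sketched below using syntax-rules in untyped Racket. This is an assumption for illustration, not the macro developed later in the course, and unlike that one it is not statically type-checked. A step of the form x <- step binds x in every later step, a bare step’s value is discarded, and the last step supplies the overall result.

```racket
#lang racket
;; Simplified hash-based store and boxV (assumptions for this sketch).
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)
(struct boxV (loc) #:transparent)

(define ((st-return value) sto) (v*s value sto))
(define ((st-seq step1 step2) sto)
  (match (step1 sto)
    [(v*s value sto1) ((step2 value) sto1)]))
(define ((alloc value) sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))
(define ((fetch loc) sto)
  (v*s (hash-ref (store-cells sto) loc) sto))
(define ((update loc value) sto)
  (v*s (void)
       (store (store-next-loc sto) (hash-set (store-cells sto) loc value))))

;; Each clause expands into nested st-seq calls, so a name bound with <-
;; is in scope for all of the steps that follow it.
(define-syntax st-begin
  (syntax-rules (<-)
    [(_ step) step]
    [(_ x <- step rest ...) (st-seq step (λ (x) (st-begin rest ...)))]
    [(_ step rest ...) (st-seq step (λ (ignored) (st-begin rest ...)))]))

;; The setboxE steps from the text, abstracted over the two sub-steps.
(define (set-and-return box-step content-step)
  (st-begin
   box-val <- box-step
   content-val <- content-step
   (update (boxV-loc box-val) content-val)
   (st-return box-val)))
```

Because the expansion nests the λs, box-val remains visible in the last two steps, which is exactly what the function-based st-begin could not provide.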
3 Applications and Observations
After defining abstractions that help combine state transformers, we’re able to write code in store-passing style that looks almost like "ordinary" (non-store-passing) code. However, it still seems like an awkward exercise to go through, and the resulting code, which creates a new store structure for every allocation and mutation, seems like it must be wildly inefficient. You might be wondering whether there’s any reason that anyone would use store-passing style (aside from trying to understand how it works). In fact, there are some interesting applications of the store-passing model, and some observations about it:
The type constructor ST, together with the combining operators that we defined (st-seq and st-return), forms a monad. Monads are a general framework for the linear composition of computations that involve effects. State is one example, but we’ll soon see another one (control), and there are others that we may or may not have time to talk about (e.g., non-determinism, errors).
In a lazy language (such as Haskell), store-passing style is the way to enforce a linear ordering between side effects. Because this technique is so important, Haskell provides special syntax for programming with monads (do-notation, similar to the st-begin macro shown above). In Haskell, monads are often built from abstract types (so, for example, the client program has no way to get at the underlying store), and the linear composition of store transformers is enforced by the type system. As a result, it is actually safe for the underlying libraries to implement state with in-place mutation (in contrast with the functional update strategy we followed), so the implementation of state can be efficient, even in a language that’s lazy and "pure".
The pattern of creating structures to model side-effects has applications even outside of purely functional languages. For example, when implementing transactions (either in-memory or with an external store), passing around a store (or, more likely, an update log), rather than actually mutating shared structures, prevents concurrent transactions from interfering with each other until one of them actually commits. It also exposes the mutations in a way that other programs can use, for example, to resolve conflicts in some cases without aborting and re-running overlapping transactions from scratch.
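The monad claim in the first observation can be spot-checked. The sketch below uses untyped Racket, a simplified hash-based store, and arbitrarily chosen sample operations. It runs both sides of each of the three monad laws (left identity, right identity, associativity) on the same store and compares the results. Checking a few inputs illustrates the laws; it does not prove them.

```racket
#lang racket
;; Simplified hash-based store (an assumption for this sketch).
(struct store (next-loc cells) #:transparent)
(struct v*s (value store) #:transparent)

(define ((st-return value) sto) (v*s value sto))
(define ((st-seq step1 step2) sto)
  (match (step1 sto)
    [(v*s value sto1) ((step2 value) sto1)]))
(define ((alloc value) sto)
  (define loc (store-next-loc sto))
  (v*s loc (store (add1 loc) (hash-set (store-cells sto) loc value))))
(define ((fetch loc) sto)
  (v*s (hash-ref (store-cells sto) loc) sto))

;; Two store transformers agree on sto if they produce equal results.
(define (same-on? m1 m2 sto) (equal? (m1 sto) (m2 sto)))

(define sto0 (store 1 (hash 0 10)))        ; location 0 holds 10
(define m (alloc 7))                       ; a sample store transformer
(define f fetch)                           ; location -> transformer
(define g (λ (v) (alloc (* 2 v))))         ; value -> transformer

;; Left identity:  (st-seq (st-return a) f)  ==  (f a)
(same-on? (st-seq (st-return 0) f) (f 0) sto0)
;; Right identity: (st-seq m st-return)  ==  m
(same-on? (st-seq m st-return) m sto0)
;; Associativity:  (st-seq (st-seq m f) g)  ==  (st-seq m (λ (x) (st-seq (f x) g)))
(same-on? (st-seq (st-seq m f) g)
          (st-seq m (λ (x) (st-seq (f x) g)))
          sto0)
```

Associativity is what makes it legitimate for an st-begin form to flatten a chain of steps without caring how the st-seq calls are grouped.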