CSE 341: ADTs, OO design, and collections

Data abstraction

Abstract data types

An abstract data type is a programmer-defined type whose internal representation is hidden, and which can only be accessed via the operations that the programmer provides. For example, consider the following Stack structure in ML:

structure Stack :> sig
    type 'a stack
    exception EmptyStack
    val create : 'a stack
    val isEmpty : 'a stack -> bool
    val push : 'a -> 'a stack -> 'a stack
    val pop : 'a stack -> 'a stack
    val peek : 'a stack -> 'a
end = struct
    type 'a stack = 'a list
    exception EmptyStack
    val create = nil
    fun isEmpty nil = true
      | isEmpty (x::xs) = false
    fun push v (s:'a stack) = v::s
    fun pop nil     = raise EmptyStack
      | pop (x::xs) = xs
    fun peek nil     = raise EmptyStack
      | peek (x::xs) = x
end

The use of an opaquely ascribed signature that omits the representation of 'a stack makes this type abstract. The programmer can only access the stack via the operations that are defined on it.

Pure abstract data types first appeared as a language-level feature in CLU, designed by Barbara Liskov et al. in the mid-1970s.

Simulating ADTs in languages without representation hiding

In languages that do not provide any mechanisms for hiding data representation, we can still program in an ADT-like style, relying on programmer discipline to "hide" representation. For example, consider the following stack abstraction in Scheme (we assume the presence of an exception library like the one we defined in our notes on call/cc):

(define empty-stack '())

(define (push v a-stack) (cons v a-stack))

(define (empty? a-stack) (null? a-stack))

(define (pop a-stack)
  (if (null? a-stack)
    (raise '(empty-stack "Trying to pop empty stack"))
    (cdr a-stack)))

(define (peek a-stack)
  (if (null? a-stack)
    (raise '(empty-stack "Trying to peek top of empty stack"))
    (car a-stack)))

Using this implementation strategy, we cannot prevent the programmer from accessing the private data representation of a stack. It is only a "friendly agreement" between the implementor and client.
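
To make the risk concrete, here is a minimal sketch of a client bypassing the stack operations. It repeats two of the definitions above so that it runs on its own; the names s, bottom, and size are ours, chosen only for illustration.

```scheme
;; Minimal copies of the definitions above, so the sketch runs on its own.
(define empty-stack '())
(define (push v a-stack) (cons v a-stack))

;; Build a stack with 3 on top and 1 at the bottom.
(define s (push 3 (push 2 (push 1 empty-stack))))

;; Nothing stops a client from treating the stack as a plain list:
(define bottom (car (reverse s)))   ; reaches the bottom element directly => 1
(define size (length s))            ; depends on the list representation => 3
```

Both bottom and size are computed without pop or peek, so this client code would stop working if the implementor later changed the representation to something other than a list.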

This "exposed representation" style of implementation of ADTs is often used in purely procedural languages like C and Pascal --- these languages have neither ADTs, nor objects, nor first-class lexically scoped functions (the last of these, as we shall see below, can be used to hide representation).

Exposed representations are also common in Lisp, because Lisp programmers frequently want to expose the representation --- typically a list, which can therefore be manipulated using the full suite of list functions (filter, foldr, etc.). The power of "lists as the universal interface" is compelling enough that Lisp programmers are usually willing to live with the risk that some client will accidentally access and break the representation.

Procedural data abstraction

Procedural data abstraction (PDA, a term due to William Cook) is a different way of simulating some of the features of ADTs in a language without ADT features. Actually, we've already seen an example of PDA --- the point "object" from previous notes. Here's a Scheme implementation of procedural data abstraction of stacks:

(define (empty-stack)
   (lambda (method . args)
      (case method
         ((empty?) #t)
         ((push)   (non-empty-stack (car args) (empty-stack)))
         ((pop)    (raise '(empty "Tried to pop empty stack")))
         ((peek)   (raise '(empty "Tried to peek top of empty stack"))))))

(define (non-empty-stack top rest)
   (lambda (method . args)
      (case method
         ((empty?) #f)
         ((push)   (non-empty-stack (car args) (non-empty-stack top rest)))
         ((pop)    rest)
         ((peek)   top))))

In this formulation, we use functions (procedures) to perform the abstraction. We rely on the fact that the only legal operation on a function is to call it with some arguments. Because the function decides which arguments (here, which method names) it accepts, it controls all access to its representation.
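
One consequence can be sketched as follows, under some assumptions: because clients see only the message protocol, a stack with a different hidden representation can be substituted freely. The constructor list-stack and the client stack-size are hypothetical names, and errors are signalled with the standard error procedure instead of the exception library assumed in these notes.

```scheme
;; Hypothetical alternative implementation: one closure over a hidden list.
(define (list-stack items)
  (lambda (method . args)
    (case method
      ((empty?) (null? items))
      ((push)   (list-stack (cons (car args) items)))
      ((pop)    (if (null? items)
                    (error "Tried to pop empty stack")
                    (list-stack (cdr items))))
      ((peek)   (if (null? items)
                    (error "Tried to peek top of empty stack")
                    (car items)))
      (else     (error "Unknown stack message" method)))))

;; A client that uses only messages; it never sees the list.
(define (stack-size a-stack)
  (if (a-stack 'empty?)
      0
      (+ 1 (stack-size (a-stack 'pop)))))

(define s (((list-stack '()) 'push 1) 'push 2))
(s 'peek)        ; => 2
(stack-size s)   ; => 2
```

Since stack-size sends only empty? and pop, it should work unchanged on any object that answers those two messages, whatever its representation.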

Object-oriented programming

OOP can be viewed as a relative of PDA. The key feature that distinguishes ADT-style or PDA-style programming from OOP is inheritance. This adds greater opportunities for code reuse, and adds an important step to the design process.
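
As a rough illustration of how inheritance extends the PDA style, the sketch below layers a "subclass" on a closure-based stack by delegation. The names make-stack and counted-stack are hypothetical, checks for empty stacks are omitted, and this simple delegation does not model everything that inheritance in a real OO language provides (in particular, the parent never calls back into the child).

```scheme
;; Base "class": a stack closed over a hidden list of items.
(define (make-stack items)
  (lambda (method . args)
    (case method
      ((empty?) (null? items))
      ((push)   (make-stack (cons (car args) items)))
      ((pop)    (make-stack (cdr items)))
      ((peek)   (car items))
      (else     (error "Unknown stack message" method)))))

;; "Subclass": adds size, overrides push and pop to maintain the count,
;; and inherits every other message by forwarding it to the parent.
(define (counted-stack parent count)
  (lambda (method . args)
    (case method
      ((size) count)
      ((push) (counted-stack (parent 'push (car args)) (+ count 1)))
      ((pop)  (counted-stack (parent 'pop) (- count 1)))
      (else   (apply parent method args)))))

(define cs (((counted-stack (make-stack '()) 0) 'push 'a) 'push 'b))
(cs 'size)     ; => 2
(cs 'peek)     ; => b   (inherited from the parent)
(cs 'empty?)   ; => #f  (inherited from the parent)
```

Only size, push, and pop are written for counted-stack; the remaining behavior is reused from make-stack, which is the kind of code reuse the text attributes to inheritance.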

When programming in abstract data types (or PDAs), the design process follows roughly the following steps:

The OOP design process is as follows:

Superclass factoring can occur at many points in the process. Usually, one recognizes common functionality after doing some design, and one recognizes even more after doing some implementation.

Two common forms of superclass factoring are refactoring and framework definition:

When to use inheritance

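There are two basic flavors of inheritance: inheritance that organizes classes around a common interface, and inheritance purely for code reuse.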

Generally, organizing for common interfaces is a better idea in the long run than inheritance purely for code reuse. (Although the Smalltalk libraries do have plenty of examples of the latter; even Smalltalk programmers aren't perfect.)

Concrete vs. abstract classes; leaves vs. internal classes

A concrete class is one that is intended to be instantiated directly; an abstract class is one that is intended to be used purely for inheritance.

Generally, one should make only "leaf" classes concrete. In other words, one should not inherit from concrete classes if it is possible to avoid it. Instead, create an abstract class that is the parent of the concrete class, and inherit from that. (This is sometimes not possible --- for example, most languages do not permit you to change the superclass of a class without altering its source code, so if you don't have access to the source then you can't do this.)
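
The guideline can be sketched in the closure-based object style used earlier in these notes. Everything here is an assumption made for illustration: the shape example itself, and the names abstract-shape, make-square, and make-rectangle. The abstract parent is never used on its own; it only makes sense once a leaf supplies a name and an area procedure.

```scheme
;; "Abstract class": shared behavior, parameterized by what the leaves supply.
(define (abstract-shape name area-proc)
  (lambda (method . args)
    (case method
      ((name)    name)
      ((area)    (area-proc))
      ((larger?) (> (area-proc) ((car args) 'area)))
      (else      (error "Unknown shape message" method)))))

;; Concrete leaves: each supplies only what differs.
(define (make-square side)
  (abstract-shape 'square (lambda () (* side side))))

(define (make-rectangle width height)
  (abstract-shape 'rectangle (lambda () (* width height))))

(define sq (make-square 3))
(define rect (make-rectangle 2 4))
(sq 'area)           ; => 9
(rect 'area)         ; => 8
(sq 'larger? rect)   ; => #t
```

Note that make-square is not built on make-rectangle: both leaves share the abstract parent, so a later change to one concrete shape cannot accidentally alter the other.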

Collections

The Smalltalk collections are heavily factored in the OOP style. Consider the following concrete collections:

All collections support certain methods, including iteration via do:, which takes a single-argument block and applies it to each element in the collection.
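
The idea behind do: can be sketched in Scheme, again in the closure-based object style, with a one-argument procedure standing in for the Smalltalk block. This is only an analogy under our own assumptions, not Smalltalk's implementation; the names list-collection, vector-collection, and collection-sum are hypothetical.

```scheme
;; List-backed collection.
(define (list-collection items)
  (lambda (method . args)
    (case method
      ((do) (for-each (car args) items))
      (else (error "Unknown collection message" method)))))

;; Vector-backed collection: same message, different representation.
(define (vector-collection vec)
  (lambda (method . args)
    (case method
      ((do) (let loop ((i 0))
              (if (< i (vector-length vec))
                  (begin
                    ((car args) (vector-ref vec i))
                    (loop (+ i 1)))
                  'done)))
      (else (error "Unknown collection message" method)))))

;; Client code written once, against the do message only.
(define (collection-sum coll)
  (let ((total 0))
    (coll 'do (lambda (x) (set! total (+ total x))))
    total))

(collection-sum (list-collection '(1 2 3)))           ; => 6
(collection-sum (vector-collection (vector 10 20)))   ; => 30
```

Because collection-sum relies only on do, it works for both representations without knowing which one it was given.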

Smalltalk factors these classes into an inheritance hierarchy. Here are some of the abstract collections Smalltalk defines:

Smalltalk organizes these in a particular way that trades off some "purity" of interface for code reuse. If we were factoring these collections ourselves, in the "cleanest" way possible, how would we do it?