CSE 341: ADTs, OO design, and collections

Data abstraction

Abstract data types

An abstract data type is a programmer-defined type whose internal representation is hidden, and which can only be accessed via the operations that the programmer provides. For example, consider the following Stack structure in ML:

structure Stack :> sig
    type 'a stack
    exception EmptyStack
    val create : 'a stack
    val isEmpty : 'a stack -> bool
    val push : 'a -> 'a stack -> 'a stack
    val pop : 'a stack -> 'a stack
    val peek : 'a stack -> 'a
end = struct
    type 'a stack = 'a list
    exception EmptyStack
    val create = nil
    fun isEmpty nil = true
      | isEmpty (x::xs) = false
    fun push v (s:'a stack) = v::s
    fun pop nil     = raise EmptyStack
      | pop (x::xs) = xs
    fun peek nil     = raise EmptyStack
      | peek (x::xs) = x
end

The use of an opaquely ascribed signature that omits the representation of 'a stack makes this type abstract. The programmer can only access the stack via the operations that are defined on it.

Pure abstract data types first appeared at the language level in CLU, a language designed by Barbara Liskov et al. in the mid-1970s.

Simulating ADTs in languages without representation hiding

In languages that do not provide any mechanisms for hiding data representation, we can still program in an ADT-like style, relying on programmer discipline to "hide" representation. For example, consider the following stack abstraction in Scheme (we assume the presence of an exception library like the one we defined in our notes on call/cc):

(define empty-stack '())

(define (push v a-stack) (cons v a-stack))

(define (empty? a-stack) (null? a-stack))

(define (pop a-stack)
  (if (null? a-stack)
    (raise '(empty-stack "Trying to pop empty stack"))
    (cdr a-stack)))

(define (peek a-stack)
  (if (null? a-stack)
    (raise '(empty-stack "Trying to peek top of empty stack"))
    (car a-stack)))

Using this implementation strategy, we cannot prevent the programmer from accessing the private data representation of a stack. The abstraction is only a "friendly agreement" between the implementor and client.
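To make the risk concrete, here is a minimal sketch of the same exposed-representation style, written in Python rather than Scheme. The names (empty_stack, push, is_empty and so on) simply mirror the Scheme functions above; the choice of a Python list with the top at index 0 is an assumption made for illustration.

```python
# Exposed-representation stack: a stack is just a Python list, top at index 0.
empty_stack = []

def push(v, a_stack):
    # Build a new list, leaving the old stack untouched (like cons).
    return [v] + a_stack

def is_empty(a_stack):
    return len(a_stack) == 0

def pop(a_stack):
    if not a_stack:
        raise IndexError("Trying to pop empty stack")
    return a_stack[1:]

def peek(a_stack):
    if not a_stack:
        raise IndexError("Trying to peek top of empty stack")
    return a_stack[0]

s = push(3, push(2, push(1, empty_stack)))

# A well-behaved client uses only the stack operations.
top_before = peek(s)

# An ill-behaved client reaches into the representation with a list method.
# Nothing stops it, and the stack is now upside down.
s.reverse()
top_after = peek(s)
```

Nothing in the language distinguishes the second client from the first; only convention does.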

This "exposed representation" style of implementation of ADTs is often used in purely procedural languages like C and Pascal --- these languages have neither ADTs, nor objects, nor first-class lexically scoped functions (the last of these, as we shall see below, can be used to hide representation).

Exposed representations are also common in Lisp, where programmers frequently want to expose the representation --- typically a list, which can therefore be manipulated using the full suite of list functions (filter, foldr, etc.). The power of "lists as the universal interface" is compelling enough that Lisp programmers are often willing to live with the risk that some client will accidentally access and break the representation.

Procedural data abstraction

Procedural data abstraction (a term due to William Cook) is a different way of simulating some of the features of ADTs in a language without ADT features. Actually, we've already seen an example of PDA --- the point "object" from previous notes. Here's a Scheme implementation of procedural data abstraction of stacks:

(define (empty-stack)
   (lambda (method . args)
      (case method
         ((empty?) #t)
         ((push)   (non-empty-stack (car args) (empty-stack)))
         ((pop)    (raise '(empty "Tried to pop empty stack")))
         ((peek)   (raise '(empty "Tried to peek top of empty stack"))))))

(define (non-empty-stack top rest)
   (lambda (method . args)
      (case method
         ((empty?) #f)
         ((push)   (non-empty-stack (car args) (non-empty-stack top rest)))
         ((pop)    rest)
         ((peek)   top))))

In this formulation, we use functions (procedures) to perform the abstraction. We are relying on the fact that the only legal operation on a function is to call it with some arguments. Since the function controls what arguments it may accept, it can therefore control access to its representation.
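The same idea can be sketched with closures in Python (used here as a stand-in for Scheme). The method names are passed as strings, which is an assumption of this sketch; the structure otherwise follows the two Scheme constructors above. One caveat: Python lets a determined client inspect a closure through its `__closure__` attribute, so the hiding is weaker than in the argument above.

```python
def empty_stack():
    def dispatch(method, *args):
        if method == "empty?":
            return True
        if method == "push":
            # The new stack's "rest" is this very stack.
            return non_empty_stack(args[0], dispatch)
        if method in ("pop", "peek"):
            raise IndexError(f"Tried to {method} empty stack")
        raise ValueError(f"unknown method: {method}")
    return dispatch

def non_empty_stack(top, rest):
    # top and rest are captured by the closure; clients never see them directly.
    def dispatch(method, *args):
        if method == "empty?":
            return False
        if method == "push":
            return non_empty_stack(args[0], dispatch)
        if method == "pop":
            return rest
        if method == "peek":
            return top
        raise ValueError(f"unknown method: {method}")
    return dispatch

# A stack is just a function; the only thing a client can do is call it.
s = empty_stack()("push", 1)("push", 2)
```

Here `s("peek")` answers with the top element and `s("pop")` answers with another stack function, so every access goes through the dispatch procedure.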

Object-oriented programming

OOP can be viewed as a relative of PDA. The key feature that distinguishes OOP from ADT-style or PDA-style programming is inheritance, which adds greater opportunities for code reuse and adds an important step to the design process.

When programming with abstract data types (or PDAs), the design process follows roughly these steps:

1. Identify the abstractions in the program.
2. Identify the operations on those abstractions.

The OOP design process is as follows:

1. Identify the abstractions in the program.
2. Identify the operations on those abstractions.
3. Factor common functionality into superclasses.

Superclass factoring can occur at many points in the process. Usually, one recognizes common functionality after doing some design, and one recognizes even more after doing some implementation.

Two common forms of superclass factoring are refactoring and framework definition:

Refactoring: A programmer is implementing a series of classes, recognizes duplicate code, and factors that code into a superclass.
Frameworks: A library implementor thinks hard about what many clients will need in order to build a certain sort of system, and then provides many "base classes" intended for extension.
- Examples: graphical user interface frameworks, web application server frameworks, etc.
- Effective frameworks are only built by people with experience in building the kind of system for which the framework is intended.
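As a small illustration of the refactoring case, suppose two classes each carried their own copy of a describe method. The sketch below, in Python, shows the result after factoring that method into a superclass. The classes Shape, Circle and Square are hypothetical and chosen only for this example.

```python
import math

class Shape:
    """Superclass holding the code that used to be duplicated in each subclass."""
    name = "shape"

    def area(self):
        raise NotImplementedError("subclasses must define area")

    def describe(self):
        # Formerly copied into both Circle and Square; now written once.
        return f"{self.name} with area {self.area():.2f}"

class Circle(Shape):
    name = "circle"

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square(Shape):
    name = "square"

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

descriptions = [shape.describe() for shape in (Circle(1), Square(2))]
```

Each subclass now supplies only what differs (its name and its area computation), and a later change to describe happens in one place.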

When to use inheritance

There are two basic flavors of inheritance:

Inheritance to express that one class is a special kind of another class. (Inheritance for common interfaces.) For example, you might define an IconButton as a subclass of Button because an icon button is a special kind of button.
Inheritance to reuse code. (Inheritance for "implementation".) For example, you might choose to have Stack inherit from Array because you can reuse the implementation of the array to hold the elements.

Generally, organizing for common interfaces is a better idea in the long run than inheritance purely for code reuse. (Although the Smalltalk libraries do have plenty of examples of the latter; even Smalltalk programmers aren't perfect.)
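The two flavors can be contrasted in a short Python sketch (Python standing in for Smalltalk, with the built-in list standing in for Array). The method names click, push and peek, and the helper press, are assumptions made for illustration.

```python
class Button:
    def __init__(self, label):
        self.label = label

    def click(self):
        return f"clicked {self.label}"

# Inheritance for interface: an IconButton can be used wherever a Button can.
class IconButton(Button):
    def __init__(self, label, icon):
        super().__init__(label)
        self.icon = icon

def press(button):
    # Written against Button; works unchanged for IconButton.
    return button.click()

# Inheritance for implementation: Stack reuses list's storage.
class Stack(list):
    def push(self, v):
        self.append(v)

    def peek(self):
        return self[-1]

pressed = press(IconButton("save", "disk.png"))

s = Stack()
s.push(1)
s.push(2)
# Stack also inherits every list operation, including ones that
# make no sense for a stack, such as inserting at the bottom.
s.insert(0, 99)
contents = list(s)
```

The IconButton case gives clients a guarantee they can rely on. The Stack case saves some code but also hands clients operations like insert that bypass the stack discipline, which is one reason interface-driven inheritance tends to age better.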

Concrete vs. abstract classes; leaves vs. internal classes

A concrete class is one that is intended to be instantiated directly; an abstract class is one that is intended to be used purely for inheritance.

Generally, one should make only "leaf" classes concrete. In other words, one should not inherit from concrete classes if it is possible to avoid it. Instead, create an abstract class that is the parent of the concrete class, and inherit from that. (This is sometimes not possible --- for example, most languages do not permit you to change the superclass of a class without altering its source code, so if you don't have access to the source then you can't do this.)
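Here is one way the guideline might look in Python, using the standard abc module to mark the parent as abstract. Instead of having IconButton inherit from a concrete button class, both concrete classes are leaves under a shared abstract parent. The class names AbstractButton and TextButton and the render method are hypothetical.

```python
from abc import ABC, abstractmethod

class AbstractButton(ABC):
    """Internal node of the hierarchy: exists only to be inherited from."""

    def __init__(self, label):
        self.label = label

    @abstractmethod
    def render(self):
        """Each concrete leaf decides how it is drawn."""

    def click(self):
        # Shared behavior lives in the abstract parent.
        return f"clicked {self.label}"

class TextButton(AbstractButton):
    def render(self):
        return f"[{self.label}]"

class IconButton(AbstractButton):
    def __init__(self, label, icon):
        super().__init__(label)
        self.icon = icon

    def render(self):
        return f"[{self.icon} {self.label}]"

rendered = [b.render() for b in (TextButton("ok"), IconButton("save", "disk"))]
```

Attempting to instantiate AbstractButton directly raises a TypeError, so only the leaves can be created.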

Collections

The Smalltalk collections are heavily factored in the OOP style. Consider the following concrete collections:

Array: holds arbitrary objects, supports indexing by integer.
String: holds characters, and supports indexing by integer.
LinkedList: holds arbitrary objects, and supports (O(n)) indexing by integer, as well as head and tail operations.
Set: holds arbitrary objects, eliminating duplicates.
Bag: holds arbitrary objects, keeping track of the count of duplicates.
Dictionary: holds a mapping from unique keys to values.
Interval: holds a sequence of consecutive integers. Supports indexing by integer.

All collections support certain methods, including iteration via do:, which takes a single-argument block and applies it to each element in the collection.
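To show the shape of this protocol, here is a rough Python sketch (not Smalltalk, and not how the Smalltalk library is actually implemented). The base class defines do once in terms of a hypothetical elements method, and each collection supplies only its own way of enumerating elements. A Python callable plays the role of the single-argument block.

```python
class Collection:
    def elements(self):
        raise NotImplementedError("subclasses must enumerate their elements")

    def do(self, block):
        # Apply the one-argument block to each element in turn.
        for element in self.elements():
            block(element)

class Interval(Collection):
    """Consecutive integers from start to stop, inclusive."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def elements(self):
        return range(self.start, self.stop + 1)

class Bag(Collection):
    """Keeps a count for each distinct element."""

    def __init__(self):
        self.counts = {}

    def add(self, item):
        self.counts[item] = self.counts.get(item, 0) + 1

    def elements(self):
        # Yield each item as many times as it was added.
        for item, count in self.counts.items():
            for _ in range(count):
                yield item

from_interval = []
Interval(1, 4).do(from_interval.append)

bag = Bag()
for word in ("a", "b", "a"):
    bag.add(word)
from_bag = []
bag.do(from_bag.append)
```

Client code that calls do works the same way on either collection, even though the two store their contents very differently.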

Smalltalk factors these classes into an inheritance hierarchy. Here are some of the abstract collections Smalltalk defines:

Collection: base class of all collections.
ArrayedCollection: a collection indexable by integer.
OrderedCollection: a collection with some total order defined over it.
SortedCollection: a collection ordered by some property of elements.

Smalltalk organizes these in a particular way that trades off some "purity" of interface for code reuse. If we were factoring these collections ourselves, in the "cleanest" way possible, how would we do it?