CSE 341 (Winter 2004): Smalltalk intro

History of Smalltalk and OOP

Fig. 1 shows an abbreviated view of the history of object-oriented programming languages. OOP languages are shown inside the rounded box.

[Fig. 1: Family DAG of Smalltalk and related languages, with timeline. Object-oriented languages are shown in the box.]

A few highlights of OOP history:

1964: While working at Univac, Kristen Nygaard and Ole-Johan Dahl develop an extension of Algol-60 based on "activities" and "processes", which prefigure "classes" and "objects".
Sept. 1966: Nygaard and Dahl publish classic paper "SIMULA -- an ALGOL-Based Simulation Language" in CACM 9(9) (ed. D. E. Knuth).
1967: Nygaard and Dahl develop Simula-67, the first full-fledged object-oriented programming language, and possibly the first language with multithreaded concurrency in the modern sense.
1966-1967: Alan Kay learns OOP by reading 80 feet of printed Simula-67 code with no prior acquaintance with the language.
1972: Alan Kay, Dan Ingalls, et al. invent Smalltalk-72 at Xerox PARC.
1980: Release of Smalltalk-80, which (among other things) has a full metaclass hierarchy. Essentially defines "modern Smalltalk", as it is used today.
1987: Dave Ungar and Randall B. Smith develop Self, the first purely prototype-based object-oriented programming language.
1994: James Gosling et al. develop Java at Sun Microsystems.
1995: Kay and colleagues at Disney develop Squeak, an open-source dialect of Smalltalk implemented in itself. (Squeak is the dialect we'll be using in this class.)
1997: Martin Odersky and Phil Wadler invent the Pizza extension to Java, which incorporates several features from functional languages.
2003: Sun releases a beta version of Java 1.5, which incorporates the generics (bounded polymorphism) proposal of Pizza.

An interesting excerpt from The Early History of Smalltalk (SIGPLAN Notices, Kay 1993):

The second bet had even more surprising results. I had expected that the new Smalltalk would be an iconic language and would take at least two years to invent, but fate intervened. One day, in a typical PARC hallway bullsession, Ted Kaehler, Dan Ingalls, and I were standing around talking about programming languages. The subject of power came up and the two of them wondered how large a language one would have to make to get great power. With as much panache as I could muster, I asserted that you could define "the most powerful language in the world" in "a page of code". They said, "Put up or shut up".

Alan Kay won the bet. Shortly thereafter, Dan Ingalls implemented the first Smalltalk interpreter.

Smalltalk expression syntax

The following describes the syntax of "core Smalltalk", which includes all important expression forms but excludes the syntax for defining methods and classes:

expr ::= atom | binding
       | unarySend | infixSend | keywordSend
       | ( expr )

atom ::= ID | literal | block
literal ::= INTEGER | STRING | ...
block ::= '[' [ (:ID)+ '|' ] stmt* ']'

binding     ::= ID := expr | ID _ expr

unarySend   ::= expr ID
infixSend   ::= expr OPERATOR expr
keywordSend ::= expr [ID: expr]+

Parenthesized expressions simply enforce order of evaluation, which leaves atoms, bindings, and the various flavors of sends as "interesting" operations.

Literal values

Smalltalk's literal syntax is as follows:

3         "The number 3"
$3        "The character 3"
'3'       "The string containing one character, 3"
#3        "The symbol 3"
#(1 2 3)  "The array of numbers containing 1, 2, and 3"

Globally defined names

Smalltalk has some "globally defined names" that are not strictly literals, but refer to objects that might be built-in literals in other languages. For example:

nil       "The sole instance of UndefinedObject."
true      "The sole instance of True."
false     "The sole instance of False."

Assorted syntax quirks

Strings are written using single quotes, and comments are written using double quotes.
Unary messages have highest precedence; infix messages have next highest precedence; and keyword messages have lowest precedence. Therefore,
```
20 negated + 4 printOn: Transcript base: 16
```
is equivalent to
```
((20 negated) + 4) printOn: Transcript base: 16
```
Unary and infix message sends associate left to right. Therefore, infix arithmetic operations do not follow "normal" precedence rules:
```
3 + 4 * 5     "Parses as (3 + 4) * 5; evaluates to 35."
```
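To make the left-to-right rule concrete, here is a minimal Python model of how a chain of infix sends is evaluated: it folds over the operators strictly from left to right, with no precedence table at all. The function name `eval_infix_chain` and the whitespace-separated input format are assumptions for illustration; only integer operands and four operators are handled.

```python
import operator

# Hypothetical helper: evaluates a whitespace-separated chain of infix sends,
# such as "3 + 4 * 5", the way Smalltalk does: strictly left to right.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "//": operator.floordiv,  # Smalltalk's // also rounds toward negative infinity
}


def eval_infix_chain(source: str) -> int:
    tokens = source.split()
    result = int(tokens[0])  # the first receiver
    # Each (operator, argument) pair is one infix send to the running result;
    # there is no precedence table, so no operator binds tighter than another.
    for op, arg in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, int(arg))
    return result


print(eval_infix_chain("3 + 4 * 5"))  # 35, i.e. (3 + 4) * 5
print(eval_infix_chain("2 * 3 + 4"))  # 10, i.e. (2 * 3) + 4
```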
Periods separate statements. Semicolons separate multiple messages sent to the same receiver:
```
x := 'a4'.
x at: 1 put: $3; asNumber.    "Evaluates to 34."
```
The above code binds x to a String object with the initial contents 'a4'. It then sends the at:put: message to the string, which imperatively replaces $a with $3 (so x now refers to a string containing '34'). Then, because of the semicolon, the unary message asNumber is sent to the same string object, which returns a fresh number object with the value 34.
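The cascade rule can be modeled in a few lines of Python: the receiver is evaluated once, every message in the cascade goes to that same object, and the cascade's value is the value of the last message. The helpers `cascade`, `at_put`, and `as_number` are hypothetical, and a Python list of characters stands in for a mutable Smalltalk String.

```python
# Hypothetical model of a Smalltalk cascade:  receiver msg1; msg2; ...; msgN
def cascade(receiver, *messages):
    result = receiver
    for message in messages:
        result = message(receiver)  # always the original receiver, never result
    return result


def at_put(index, char):
    def send(receiver):
        receiver[index - 1] = char  # Smalltalk indices start at 1, Python's at 0
        return char                 # at:put: answers the stored argument
    return send


def as_number(receiver):
    return int("".join(receiver))


x = list("a4")                      # stands in for the String 'a4'
value = cascade(x, at_put(1, "3"), as_number)
print(value)                        # 34
print("".join(x))                   # 34
```

Without the semicolon, asNumber would instead be sent to whatever at:put: answers (the character $3), not to the string.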

Smalltalk expression semantics

Smalltalk semantics is usually stated as follows:

Everything is an object.
Objects communicate by sending and receiving messages.
Objects have their own state.
Every object is an instance of a class.
A class provides the behavior for all its instances.

However, this is a relatively informal description (it does not fit the dynamic-semantics formalism we've been using all quarter), so we must be more explicit. In this section, we will focus on the first two of these bullets.

Objects and messages

All Smalltalk values are objects. Expressions evaluate to objects through the application of message sends. Message sends come in three syntactic varieties (unary, infix, and keyword sends), but have the same semantics. Here are some examples of message sends:

5 negated.     "A unary send.   => -5"
5 + 6.         "An infix send.  => 11"
'hi' at: 1.    "A keyword send. => $h"

All operations (except assignment; see below) are message sends. This differs from languages like Java or C++, where some values are not objects (consider int in Java or C++) and some expressions or function calls do not have standard message-send semantics.

To evaluate a message send, one does the following:

Evaluate the receiver expression to an object value, and (for infix or keyword sends) evaluate any other argument expressions to object values.
Find the method of the object's class corresponding to the message.
Evaluate the method's body in the environment with self bound to the receiver value, the receiver's field names bound to their field values, and any other argument values bound to their respective names.
Return the method's value to the sender of the message.
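The four steps can be sketched as a small Python interpreter fragment. This is only a model under simplifying assumptions: objects and classes are plain dicts, method bodies are Python functions of an environment dict, and there is no inheritance yet. All names (`send`, `Rect`, `Square`, `scaledArea:`) are hypothetical.

```python
def send(receiver, selector, *args):
    # Step 1: Python has already evaluated the receiver and argument expressions.
    # Step 2: find the method in the receiver's class.
    method = receiver["class"]["methods"][selector]
    # Step 3: bind self, the receiver's fields, and the parameters.
    env = {"self": receiver}
    env.update(receiver["fields"])
    env.update(zip(method["params"], args))
    # Step 4: evaluate the body and return its value to the sender.
    return method["body"](env)


Rect = {"methods": {
    "area": {"params": [], "body": lambda env: env["w"] * env["h"]},
    # A method whose body itself sends a message to self.
    "scaledArea:": {"params": ["k"],
                    "body": lambda env: send(env["self"], "area") * env["k"]},
}}
Square = {"methods": {
    "area": {"params": [], "body": lambda env: env["side"] * env["side"]},
}}

r = {"class": Rect, "fields": {"w": 2, "h": 3}}
s = {"class": Square, "fields": {"side": 4}}
print(send(r, "area"))             # 6
print(send(s, "area"))             # 16: same message, different method body
print(send(r, "scaledArea:", 10))  # 60
```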

Notice the similarity between function calls in Scheme or ML and Smalltalk message sends. The major differences are that

The receiver argument has privileged status.
A different function body is evaluated depending on the receiver's class --- i.e., Smalltalk has dynamic dispatch, where the message is "dispatched" to a different method depending on the runtime value of the receiver.

Terminological aside: Sometimes message send is called virtual function call (as in C++), method call (as the Java community often calls it), or virtual method call (usually used in Java/C++ only to distinguish virtual calls from static or other non-virtual calls).

Instantiating objects: the `new` message

Classes themselves are objects (everything is an object), and the simplest way of instantiating an object of a class is by sending the new message to the class:

Array new.   "Evaluates to #()"

Notice that, unlike some other languages, new is not a "special" operation --- it's an ordinary message send to a class object.

The default new implementation does not initialize any of an object's fields; they all start out as nil. (The Array class happens to override the new message to initialize its fields.) We'll discuss better ways to instantiate things later.

Assignment/binding

The only "interesting" expression form in Smalltalk that is neither a value nor a message send is the assignment expression. Assignment is denoted with the := or ← symbols (in Squeak, the left-arrow character is typed with the underscore _ key, and in some fonts will be rendered as an underscore).

In Smalltalk, as in Scheme, all bindings are implicitly mutable (i.e., all bindings are to "refs", and ref dereference is implicit).

The assignment expression as a whole evaluates to the assigned value:

(x := 5) negated.   "=> -5"

A note on declaring names: In the Workspace, assignment creates a fresh binding when the bound name is undefined, and mutates the existing binding if the name is already defined. In methods, variable names must be declared before use (methods can see several kinds of names; see the sections below on defining methods and on the method environment).

Smalltalk blocks are lambdas

Smalltalk has lexically scoped closures (a.k.a. lambdas, a.k.a. anonymous functions), which are called blocks. No-argument blocks are denoted by the syntax:

[ stmt ]

i.e., a statement enclosed in square brackets. One form of statement is an expression (see the section on method bodies, below). For blocks with arguments, formal parameters may be specified as colon-preceded names before a vertical bar at the start of the closure:

[ :param1 ... :paramN | stmt ]

Everything in Smalltalk is an object, so a block is also an object (belonging, in Squeak, to the BlockContext class). Blocks are evaluated by sending the various value messages:

"Smalltalk"               "Rough ML equivalent"
[ 3 ].                    "fn () => 3;"
[ 3 ] value.              "(fn () => 3)();"
[ :x :y | x + y ].        "fn (x, y) => x + y;"
a := [ :x :y | x + y ].   "val a = fn (x, y) => x + y;"
a value: 1 value: 2.      "a(1, 2)"

Blocks that take up to four arguments are evaluated by sending a message with the corresponding number of value: keywords:

seal := [ :a :b :c :d | a + b * c + d ].
seal value: 1 value: 2 value: 3 value: 4.

For argument lists longer than that, or if you just don't feel like typing value: too many times, you can use the valueWithArguments: message, which takes an array:

walrus := [ :a :b :c :d :e | a + b * c + d * e ].
walrus valueWithArguments: #( 10 20 30 40 50 ).  "Note #() syntax"
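As a worked check, recall that infix sends associate left to right, so a + b * c + d means ((a + b) * c) + d. The first send above should therefore answer ((1 + 2) * 3) + 4 = 13, and the walrus send should answer (((10 + 20) * 30) + 40) * 50 = 47000. The Python lambdas below are only analogues of the two blocks; the explicit parentheses reproduce Smalltalk's evaluation order, and `*args` unpacking plays the role of valueWithArguments:.

```python
# Analogues of the seal and walrus blocks; parentheses spell out the
# left-to-right order that Smalltalk applies without them.
seal = lambda a, b, c, d: ((a + b) * c) + d
walrus = lambda a, b, c, d, e: (((a + b) * c) + d) * e

print(seal(1, 2, 3, 4))        # 13

args = [10, 20, 30, 40, 50]    # like the literal array #( 10 20 30 40 50 )
print(walrus(*args))           # 47000
```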

Closures and environments

Closures are lexically scoped, but they may have arbitrary side effects, including the effect of changing bindings in enclosing environments:

"Executing this code..."   "Yields this value for i"
i := 5.                    "5"
[ i := 7 ] value.          "7"
[ :i | i := 9 ] value: 2.  "2, then 9 in local scope; 7 in outer scope"
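The same two effects can be reproduced with Python closures. One difference is worth noting: Python needs an explicit `nonlocal` declaration before a nested function may rebind an enclosing variable, whereas a Smalltalk block assigns to the enclosing binding automatically. The function names here are hypothetical.

```python
def closure_demo():
    i = 5

    def set_outer():        # like [ i := 7 ] value
        nonlocal i          # rebind the enclosing i, not a new local
        i = 7

    def shadowing(i):       # like [ :i | i := 9 ] value: 2
        i = 9               # assigns only the parameter, which shadows the outer i
        return i

    set_outer()
    inner = shadowing(2)
    return i, inner


print(closure_demo())       # (7, 9): the outer i is 7, the block's own i ended as 9
```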

Control structures using blocks

ML and Scheme use a combination of special expression forms (e.g., if/then/else and cond) and higher-order functions for control structures (consider map). Smalltalk eschews all "special forms" (aside from assignment, which isn't really a control structure). Instead, Smalltalk uses higher-order functions exclusively.

Recall that the key property of Scheme special forms was control over evaluation: it was crucial that, for example, only one branch of the if expression be evaluated. Closures are suitable for implementing control structures precisely because the body of a closure is not necessarily evaluated --- it is evaluated only if it is applied (in Smalltalk, this means sending one of the value messages).

Transcript open.  "Open a Transcript window"

"The timesRepeat: method of the Integer class evaluates
 its argument N times"
5 timesRepeat: [
    Transcript show: 'hi'; cr.
].

"The two Boolean classes, True and False, each implement ifTrue:ifFalse:
 so that it evaluates exactly one of the two blocks passed to it."
x = 0 ifTrue:  [ Transcript show: 'Cannot divide by zero' ]
      ifFalse: [ Transcript show: (1.0 / x) asString. ].

"The whileTrue: method of BlockContext repeatedly evaluates the
 receiver block, then the argument block, as long as the receiver
 evaluates to true."
i := 0.
[ i < 10 ] whileTrue: [ i := i + 1. ].
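The following Python sketch models these control structures as ordinary methods and functions that receive thunks. The classes `STrue` and `SFalse` and the helpers `less_than`, `times_repeat`, and `while_true` are hypothetical stand-ins for True, False, Integer>><, timesRepeat:, and whileTrue:. Loops are written recursively so that the only branching happens through dynamic dispatch on the two Boolean objects; `less_than` uses Python's own comparison as a primitive.

```python
class STrue:
    def if_true_if_false(self, true_block, false_block):
        return true_block()        # only the first block is evaluated


class SFalse:
    def if_true_if_false(self, true_block, false_block):
        return false_block()       # only the second block is evaluated


TRUE, FALSE = STrue(), SFalse()


def less_than(a, b):
    # Primitive comparison that answers one of the two Boolean objects.
    return TRUE if a < b else FALSE


def times_repeat(n, block):
    less_than(0, n).if_true_if_false(
        lambda: (block(), times_repeat(n - 1, block)),
        lambda: None)


def while_true(condition_block, body_block):
    # The condition is a block, so it can be re-evaluated on every iteration.
    condition_block().if_true_if_false(
        lambda: (body_block(), while_true(condition_block, body_block)),
        lambda: None)


output = []
times_repeat(5, lambda: output.append("hi"))

state = {"i": 0}


def increment():
    state["i"] += 1


while_true(lambda: less_than(state["i"], 10), increment)
print(len(output), state["i"])     # 5 10
```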

Thought question: why is the receiver of whileTrue: a block, and not a boolean?

Aside: So if control structures can be implemented purely using higher-order functions, why do ML and Scheme have special forms? Instead of ML's if/then/else, why not define a function

fun ifThenElse true  thenFn _      = thenFn()
  | ifThenElse false _      elseFn = elseFn();

The answer appears to be (mostly) syntactic convenience:

Compared to Smalltalk, ML and Scheme use a relatively verbose syntax for no-argument anonymous functions.
Because ML/Scheme arguments are not labeled (unlike the arguments of Smalltalk keyword sends), uses of functions that take several anonymous-function arguments can be hard to read.

Consider how a client would use our ML ifThenElse function:

val x = ifThenElse y (fn () => "hi") (fn () => "bye")

This rather clumsily performs the equivalent of:

val x = if y then "hi" else "bye"
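The reason the ML version needs `fn () => ...` wrappers at all is that ML, like Python, evaluates function arguments before the call. The Python sketch below (all names hypothetical) contrasts an eager if-function, where both branches run, with a thunk-taking version corresponding to the ML ifThenElse and to Smalltalk's blocks, where only the chosen branch runs.

```python
calls = []


def then_branch():
    calls.append("then")
    return "hi"


def else_branch():
    calls.append("else")
    return "bye"


def if_then_else_eager(condition, then_value, else_value):
    # Python evaluated both argument expressions before this body runs.
    return then_value if condition else else_value


def if_then_else(condition, then_fn, else_fn):
    # Like the ML ifThenElse above: only the chosen thunk is applied.
    return then_fn() if condition else else_fn()


result = if_then_else_eager(True, then_branch(), else_branch())
print(result, calls)   # hi ['then', 'else']  (both branches ran)

calls.clear()
result = if_then_else(True, then_branch, else_branch)
print(result, calls)   # hi ['then']  (only one branch ran)
```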

Programmers have always been prone to engaging in huge debates about syntax. Usually the answers to these debates are highly subjective and inconsequential, compared to arguments about semantics or programming paradigms. However, Smalltalk demonstrates that a compact syntax for anonymous functions is unambiguously and objectively a big win.

Smalltalk classes, methods, and inheritance

So far we have described the expression forms in Smalltalk. We have seen that essentially all evaluation forms are message sends. Classes and methods are how the programmer defines the handling of message sends. (Another way of saying this is that the previous section describes expressions, and the following section describes declarations.)

Defining classes

Every Smalltalk object has a class, which defines the shared behavior of its instances. Classes are constructed by sending a message named

subclass:instanceVariableNames:classVariableNames:poolDictionaries:category:

to a class. (Classes are themselves objects, and they all inherit the ability to define a subclass from the Object class.) For example:

Object subclass: #Point
    instanceVariableNames: 'x y'
    classVariableNames: 'Origin'
    poolDictionaries: ''
    category: 'CSE341-Examples'

Point subclass: #ColoredPoint
    instanceVariableNames: 'color'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'CSE341-Examples'

For the purposes of this class, we'll ignore the poolDictionaries: parameter. The other arguments are as follows:

subclass: takes a Symbol object that names the new subclass.
instanceVariableNames: takes a string containing a whitespace-separated list of instance variable names (i.e., field names) for the new class.
classVariableNames: takes a string containing a whitespace-separated list of class variable names (i.e., variables that are shared by all instances of this class, much like static fields in Java). Squeak requires that class variables begin with a capital letter.
category: takes a string naming a "class category". By convention, the category has two parts separated by a dash, although this is not necessary. The class category has no semantic significance --- it is purely a tool that helps organize the presentation of code to the user. In particular, class categories do not define separate namespaces.
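Because class creation is just a message send, it can be mimicked in Python by an ordinary function that builds a class at runtime with the built-in `type(name, bases, namespace)`. The `subclass` function below is a hypothetical analogue: it parses the whitespace-separated variable-name strings, enforces the capitalized class-variable rule, and gives every instance fresh fields initialized to None (standing in for nil). It omits category: and poolDictionaries:, and Python class attributes only approximate Smalltalk class variables (assigning through a subclass creates a new attribute instead of updating the shared one).

```python
def subclass(superclass, name, instance_variable_names="", class_variable_names=""):
    ivars = instance_variable_names.split()
    cvars = class_variable_names.split()
    for cvar in cvars:
        if not cvar[0].isupper():
            raise ValueError(f"class variable {cvar!r} must begin with a capital letter")
    inherited = getattr(superclass, "instance_variable_names", [])
    namespace = {cvar: None for cvar in cvars}  # one class-level slot per class variable
    namespace["instance_variable_names"] = list(inherited) + ivars

    def __init__(self):
        # Every instance gets fresh fields, all initially None (Smalltalk's nil).
        for ivar in type(self).instance_variable_names:
            setattr(self, ivar, None)

    namespace["__init__"] = __init__
    # Create the class at runtime, much as the subclass:... message does.
    return type(name, (superclass,), namespace)


Point = subclass(object, "Point", "x y", "Origin")
ColoredPoint = subclass(Point, "ColoredPoint", "color")

p = ColoredPoint()
print(ColoredPoint.instance_variable_names)  # ['x', 'y', 'color']
print(p.x, p.y, p.color)                     # None None None
```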

Defining methods

Once a class has been defined, you can use any of the various environment browsing tools (e.g., the Class Browser or the Package Browser) to edit the class. In particular, you can add, change, or remove methods. (See the Guzdial text or the various Squeak tutorials for information on how to navigate in these browsers.)

A method definition consists of three parts:

Header line
Local variable declarations
Method body

Header line: The header line defines the method name and (optionally) arguments. The syntax differs only slightly for the various kinds of messages. For unary messages, this simply contains the message name. For infix messages, this line contains the operator followed by the argument name:

= aString
    "The definition of the = infix method for String"
    ...

For keyword messages, this line contains the keywords, where each keyword is followed by a colon and argument name:

at: index put: aCharacter
    "The definition of the at:put: keyword method for String"
    ...

As the above examples show, by convention the header line is followed by a comment that documents the method. (This convention is, alas, observed only intermittently in the actual Squeak sources.)

Local declarations: The local declarations are a whitespace-separated sequence of names in between vertical bars:

fooMethod: aValue
    "A method that declares some locals, but does nothing."
    | localA localB
      x y z |

If any local is not used, Squeak's browsers will helpfully ask you whether you want to remove the declaration, but this is not required. Unlike in the Workspace, all names must be declared before use (see below).

Method body: The method body contains a statement. Statements have the following form:

stmt ::= evalStmt | returnStmt | stmtSequence

evalStmt     ::= expr
returnStmt   ::= ^ expr
stmtSequence ::= stmt [. stmt]*

That is, a statement either evaluates an expression, returns the value of an expression to the sender of the message, or executes a sequence of period-separated statements in order. Evaluation and sequences are straightforward and have "obvious" semantics that should be familiar to you. Explicit return statements, however, are novel: recall that ML functions implicitly return the value of the body expression, and Scheme functions return the value of the last body expression, so these languages have no need for explicit return statements.

Returns actually have a fairly interesting semantics in Smalltalk, and merit further discussion.

Syntax quirk: In Squeak, some fonts render the return symbol ^ as an up-arrow.

Blocks, methods, and lexical returns

Consider the following method in some hypothetical class:

ifZero: aNumber get: aValue
    "Returns aValue if aNumber is zero."
     aNumber = 0 ifTrue: [ ^ aValue ].
     ^ nil

Consider the return statement ^ aValue. This code is lexically nested inside a block, so one might naively suppose that it returns a value from that block. But then aValue would merely become the value of the block (and of the ifTrue: send), and execution would continue on to ^ nil, which is not the semantics we want. Instead, we want to return this value from the ifZero:get: method, not from any block inside it.

Return statements therefore return from the lexically (textually) enclosing method, not the nearest lexically enclosing block. The ^ return is therefore sometimes called a "non-local return".
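One way to model non-local return in a language without it is to unwind the stack with an exception that is tagged with the method activation it belongs to. The Python sketch below uses that technique (exceptions, not call/cc); the decorator `method`, the helper `if_true`, and the `ret` parameter that plays the role of ^ are all hypothetical. For contrast, `if_zero_get_naive` shows the "return from the block only" semantics, in which the block's value is simply discarded.

```python
class NonLocalReturn(Exception):
    """Carries a ^ value up to the method activation it belongs to."""

    def __init__(self, home, value):
        super().__init__()
        self.home = home
        self.value = value


def method(body):
    """Hypothetical decorator: body receives a `ret` function that behaves like ^."""

    def activation(*args):
        home = object()  # identifies this particular method activation

        def ret(value):
            raise NonLocalReturn(home, value)

        try:
            return body(ret, *args)
        except NonLocalReturn as nlr:
            if nlr.home is home:
                return nlr.value
            raise  # belongs to some outer activation; keep unwinding

    return activation


def if_true(condition, block):
    """Stand-in for Boolean>>ifTrue: (answers the block's value, or None)."""
    return block() if condition else None


@method
def if_zero_get(ret, a_number, a_value):
    if_true(a_number == 0, lambda: ret(a_value))  # like [ ^ aValue ]
    ret(None)                                     # like ^ nil


def if_zero_get_naive(a_number, a_value):
    # Here the "return" only leaves the block, so its value is discarded.
    if_true(a_number == 0, lambda: a_value)
    return None


print(if_zero_get(0, "found"))        # found
print(if_zero_get(1, "found"))        # None
print(if_zero_get_naive(0, "found"))  # None
```

In this sketch, invoking a block after its home method has already returned would surface as an uncaught NonLocalReturn; Smalltalk likewise treats that situation as an error.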

Thought question: How could a programmer implement a non-local return using Scheme's call/cc?

The method environment

The variable names accessible in a method belong to the following categories (names in categories earlier in this list shadow names in categories later in this list):

Declared locals and argument names (including the implicit self argument, which is bound to the receiver)
Instance variables
Class variables
Global names (like true)

Using any name that cannot resolve to any of the above will be flagged as an error at compile time (in Squeak, compilation occurs when you "accept" a method).
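This layered lookup maps neatly onto Python's `collections.ChainMap`, which searches a sequence of dicts in order and returns the first match. The sketch below is a model of the rule stated above, not of Squeak's compiler; the helper `method_environment` and the sample bindings are hypothetical. A missing name raises KeyError here, which plays the role of the compile-time error.

```python
from collections import ChainMap


def method_environment(locals_and_args, instance_vars, class_vars, globals_):
    # Earlier maps shadow later ones, mirroring the list above.
    return ChainMap(locals_and_args, instance_vars, class_vars, globals_)


globals_ = {"true": True, "Transcript": "<the Transcript>"}
class_vars = {"Origin": (0, 0)}
instance_vars = {"x": 3, "y": 4}
locals_and_args = {"self": "<aPoint>", "x": 100}  # this x shadows the field x

env = method_environment(locals_and_args, instance_vars, class_vars, globals_)
print(env["x"], env["y"], env["Origin"], env["true"])  # 100 4 (0, 0) True
print("noSuchName" in env)                             # False: would be an error
```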

Inheritance and dynamic dispatch

So far, we've discussed objects, classes, and messages without even mentioning inheritance. It is commonly held that the essence of OOP is inheritance. Inheritance in Smalltalk is fairly straightforward:

A subclass inherits all its superclass's instance variables and class variables. A class variable is shared by the defining class, all its subclasses, and all instances thereof. Instance variables are constructed fresh for each instance of every class that defines or inherits the variable.
A subclass inherits all its superclass's methods. Message sends begin lookup with the class of the receiver, and proceed up the superclass chain until they either
- Lookup reaches a class that defines a method for that message; or,
- Lookup reaches the top of the class hierarchy (roughly, the superclass of Object) without finding an appropriate method. Then, a MessageNotUnderstood exception is thrown.
This lookup process is called dynamic dispatch.
A subclass may override one of the superclass methods by defining a method to handle the same message. In this case, the subclass method is invoked, rather than the superclass method.
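The lookup procedure can be sketched directly in Python. In this model (all names hypothetical), each class holds a dict from selectors to functions and a superclass pointer; `lookup` walks the chain starting at the receiver's runtime class and signals a stand-in MessageNotUnderstood error when it runs off the top.

```python
class MessageNotUnderstood(Exception):
    """Stand-in for Smalltalk's MessageNotUnderstood."""


class SClass:
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = dict(methods or {})

    def lookup(self, selector):
        cls = self
        while cls is not None:                 # walk up the superclass chain
            if selector in cls.methods:
                return cls.methods[selector]   # first (most specific) match wins
            cls = cls.superclass
        raise MessageNotUnderstood(f"{self.name} does not understand #{selector}")


class SObject:
    def __init__(self, cls):
        self.cls = cls


def send(receiver, selector, *args):
    # Lookup always starts at the receiver's runtime class.
    return receiver.cls.lookup(selector)(receiver, *args)


Object = SClass("Object", methods={"describe": lambda self: "an object",
                                   "isPoint": lambda self: False})
Point = SClass("Point", Object, {"isPoint": lambda self: True})
ColoredPoint = SClass("ColoredPoint", Point,
                      {"describe": lambda self: "a colored point"})

cp = SObject(ColoredPoint)
print(send(cp, "describe"))  # a colored point (overridden in ColoredPoint)
print(send(cp, "isPoint"))   # True (inherited from Point)
try:
    send(cp, "fly")
except MessageNotUnderstood as error:
    print(error)             # ColoredPoint does not understand #fly
```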

Sends to `self`

As previously noted, every message send has a receiver object, and during evaluation of a method the receiver is bound to self (self is similar to this in Java or C++).

Sends to self are evaluated just like sends to any other expression: lookup begins in the class of the receiver object, and the method found there is evaluated. Notice that, at runtime, self can be an instance of the current method's class or of any of its subclasses. Consider the following method of an abstract Point class:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((x * x) + (y * y)) sqrt.

This accesses the x and y fields directly. But suppose we wished to allow subclasses of Point to compute the values of x and y, rather than only looking up field values. In this case, we could replace the field references with message sends:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((self x * self x) + (self y * self y)) sqrt.

Now, subclasses of Point can freely redefine the x and y methods. For example, one could define either a PolarPoint subclass or a RectPoint subclass:

Point subclass: #PolarPoint
    instanceVariableNames: 'rho theta'
    ...

Point subclass: #RectPoint
    instanceVariableNames: 'x y'
    ...

The x and y methods of PolarPoint could compute the answers to the x and y messages, whereas RectPoint could simply look them up in the fields:

"In RectPoint..."
x
    ^ x

y
    ^ y

"In PolarPoint..."
x
    ^ rho * theta cos

y
    ^ rho * theta sin

Self sends are one of the key features of object-oriented languages.
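Python method calls on `self` are dynamically dispatched in the same way, so the Point design above translates almost directly. In this sketch (Python names chosen to mirror the Smalltalk ones; the underscore-prefixed fields are an assumption to avoid clashing with the x and y methods), `distance_from_origin` is written once in the abstract class and works for both representations.

```python
import math


class Point:
    """Abstract: subclasses decide how x and y are obtained."""

    def distance_from_origin(self):
        # Self sends: dispatched on the runtime class of self.
        return math.sqrt(self.x() * self.x() + self.y() * self.y())


class RectPoint(Point):
    def __init__(self, x, y):
        self._x, self._y = x, y

    def x(self):
        return self._x          # simply read the field

    def y(self):
        return self._y


class PolarPoint(Point):
    def __init__(self, rho, theta):
        self.rho, self.theta = rho, theta

    def x(self):
        return self.rho * math.cos(self.theta)  # computed, not stored

    def y(self):
        return self.rho * math.sin(self.theta)


print(RectPoint(3, 4).distance_from_origin())             # 5.0
print(PolarPoint(2, math.pi / 3).distance_from_origin())  # approximately 2.0
```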

Sends to `super` (resends)

Another key feature of object-oriented languages is the ability to begin method lookup in a superclass. Consider a subclass of RectPoint that implements 3-dimensional points:

RectPoint subclass: #RectPoint3D
    instanceVariableNames: 'z'
    ...

Suppose RectPoint and RectPoint3D both implement a scaleBy: method:

"In RectPoint..."
scaleBy: factor
    x := self x * factor.
    y := self y * factor

"In RectPoint3D..."
z
    ^ z

scaleBy: factor
    x := self x * factor.
    y := self y * factor.
    z := self z * factor

Notice the redundancy between the two scaleBy: methods. We'd like to reuse the code in RectPoint for its subclasses. But ordinary sends of scaleBy: to a RectPoint3D instance will always invoke the subclass's version, not the superclass's version, so an ordinary send gives no way to reach it. The solution is a special kind of send that temporarily changes the rules of method lookup, called a resend:

"In RectPoint3D..."
scaleBy: factor
    super scaleBy: factor.
    z := z * factor

The super send is exactly like a send to self, except that lookup begins in the superclass of the class containing the lexically enclosing method.
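Python's zero-argument `super()` behaves the same way for single inheritance: lookup starts just above the class in which the calling method is defined, while `self` still refers to the original receiver. The sketch below mirrors the RectPoint/RectPoint3D example; the underscore-prefixed field names are an assumption of the translation.

```python
class RectPoint:
    def __init__(self, x, y):
        self._x, self._y = x, y

    def x(self):
        return self._x

    def y(self):
        return self._y

    def scale_by(self, factor):
        self._x = self.x() * factor
        self._y = self.y() * factor


class RectPoint3D(RectPoint):
    def __init__(self, x, y, z):
        super().__init__(x, y)
        self._z = z

    def z(self):
        return self._z

    def scale_by(self, factor):
        super().scale_by(factor)   # resend: lookup starts in RectPoint
        self._z = self._z * factor


p = RectPoint3D(1, 2, 3)
p.scale_by(10)
print(p.x(), p.y(), p.z())         # 10 20 30
```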

super is a special expression that can only appear in the receiver position of a message send. The use of super only affects the current send, not any future sends --- so, for example, if the superclass method performs a self send, then lookup proceeds normally, from the receiver's actual class, not the superclass of the lexically enclosing method.

Therefore, consider the following code:

Object subclass: #Foo instanceVariableNames: '' ...
Foo subclass: #Bar instanceVariableNames: '' ...

"In Foo..."
baz
    ^ self bif

bif
    ^ 'hi'

"In Bar..."
baz
    ^ super baz

bif
    ^ 'bye'

Evaluating the send (Bar new) baz will return 'bye', not 'hi'. The super send in Bar's baz invokes Foo's baz, but the self bif in Foo's baz will begin lookup normally, which means beginning with the receiver's actual runtime class. Since the receiver is a Bar, this self send will invoke Bar's bif, not Foo's bif.
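The same Foo/Bar example can be checked in Python, whose `super()` follows the same rule: the resend changes where lookup starts for that one call only, and the later `self.bif()` inside Foo's `baz` is dispatched on the receiver's actual class.

```python
class Foo:
    def baz(self):
        return self.bif()      # ordinary self send

    def bif(self):
        return "hi"


class Bar(Foo):
    def baz(self):
        return super().baz()   # runs Foo.baz, but self is still a Bar

    def bif(self):
        return "bye"


print(Bar().baz())   # bye
print(Foo().baz())   # hi
```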

Access protection, or lack thereof

Smalltalk classes have no access protection mechanisms for methods --- anyone can send any message to any object.

However, only methods of an object have access to the object's instance variables, because these variables are only added to the environment when evaluating that object's methods.

Since classes inherit their superclasses' instance variables, subclass instances may access variables defined in a superclass.

In Java/C++ terms, all instance variables are protected, and all methods (member functions) are public.

Hence, in Squeak, where everything is implemented in Smalltalk, you can freely change everything about the world, up to and including the implementation of message sending, subclassing, and closure evaluation. Doing such things will break literally everything in your environment, of course.