CSE 341: Smalltalk intro

History of Smalltalk and OOP

Fig. 1 shows an abbreviated view of the history of object-oriented programming languages. OOP languages are shown inside the rounded box.

[Family DAG of Smalltalk, and related languages, with timeline]
Fig. 1: Family DAG of Smalltalk, and related languages (Object-oriented languages in box.)

A few highlights of OOP history:

An interesting excerpt from The Early History of Smalltalk (SIGPLAN Notices, Kay 1993):

The second bet had even more surprising results. I had expected that the new Smalltalk would be an iconic language and would take at least two years to invent, but fate intervened. One day, in a typical PARC hallway bullsession, Ted Kaehler, Dan Ingalls, and I were standing around talking about programming languages. The subject of power came up and the two of them wondered how large a language one would have to make to get great power. With as much panache as I could muster, I asserted that you could define "the most powerful language in the world" in "a page of code". They said, "Put up or shut up".

Alan Kay won the bet. Shortly thereafter, Dan Ingalls implemented the first Smalltalk interpreter.

Smalltalk expression syntax

The following describes the syntax of "core Smalltalk", which includes all important expression forms but excludes the syntax for defining methods and classes:

expr ::= atom | binding
       | unarySend | infixSend | keywordSend
       | ( expr )

atom ::= ID | literal | block
literal ::= INTEGER | STRING | ...
block ::= [ [:ID* |] stmt* ]

binding     ::= name := expr | name _ expr

unarySend   ::= expr ID
infixSend   ::= expr OPERATOR expr
keywordSend ::= expr [ID: expr]+

Parenthesized expressions simply enforce order of evaluation, which leaves atoms, bindings, and the various flavors of sends as "interesting" operations.

Literal values

Smalltalk syntaxes for literals are as follows:

3         "The number 3"
$3        "The character 3"
'3'       "The string containing one character, 3"
#3        "The symbol 3"
#(1 2 3)  "The array of numbers containing 1, 2, and 3"

Globally defined names

Smalltalk has some "globally defined names" that are not strictly literals, but refer to objects that might be built-in literals in other languages. For example:

nil       "The UndefinedObject."
true      "The True object."
false     "The False object."

Assorted syntax quirks

Smalltalk expression semantics

Smalltalk semantics is usually stated as follows:

However, this is a relatively informal description (it does not fit the dynamic semantics formula we've been using all quarter), so we must be more explicit. In this section, we will focus on the first two of these bullets.

Objects and messages

All Smalltalk values are objects. Expressions evaluate to objects through the application of message sends. Message sends come in three syntactic varieties (unary, infix, and keyword sends), but have the same semantics. Here are some examples of message sends:

5 negated.     "A unary send.   => -5"
5 + 6.         "An infix send.  => 11"
'hi' at: 1.    "A keyword send. => $h"

All operations (except assignment; see below) are message sends. This differs from languages like Java or C++, where some values are not objects (consider int in Java or C++) and some expressions or function calls do not have standard message-send semantics.

To evaluate a message send, one does the following:

  1. Evaluate the receiver expression to an object value, and (for infix or keyword sends) evaluate any other argument expressions to object values.
  2. Find the method of the object's class corresponding to the message.
  3. Evaluate the method's body in the environment with self bound to the receiver value, the receiver's field names bound to their field values, and any other argument values bound to their respective names.
  4. Return the method's value to the sender of the message.
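The steps above can be made visible in a Workspace. The following is a small sketch, not part of the original notes: it uses the standard perform: family of messages to name each message explicitly, and respondsTo: to ask whether step 2 would find a method. The selector #frobnicate is a made-up name, chosen only because no class is expected to define it.

```smalltalk
"Each send below names its message explicitly with perform:."
5 perform: #negated.            "same as: 5 negated    => -5"
5 perform: #+ with: 6.          "same as: 5 + 6        => 11"
'hi' perform: #at: with: 1.     "same as: 'hi' at: 1   => $h"

"Step 2 looks for a method in the receiver's class."
5 class.                        "=> SmallInteger"
5 respondsTo: #negated.         "=> true"
5 respondsTo: #frobnicate.      "=> false (hypothetical selector)"
```

All three syntactic varieties go through the same lookup; only the spelling of the send differs.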

Notice the similarity between function call in Scheme or ML, and Smalltalk message send. The major differences are that

Terminological aside: Sometimes message send is called virtual function call (as in C++), method call (as the Java community often calls it), or virtual method call (usually used in Java/C++ only to distinguish virtual calls from static or other non-virtual calls).

Instantiating objects: the new message

Classes themselves are objects (everything is an object), and the simplest way of instantiating an object of a class is by sending the new message to the class:

Array new.   "Evaluates to #()"

Notice that, unlike some other languages, new is not a "special" operation --- it's an ordinary message send to a class object.

The default new implementation doesn't initialize any of the fields of an object (the Array class happens to override the new message to initialize its fields). We'll discuss better ways to instantiate things later.
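Because a class is an ordinary object, it can be stored in a variable and sent messages like any other value. The Workspace sketch below illustrates this; the variable name maker is made up for the example, and new: (which takes a size argument) is a standard message that these notes have not introduced yet.

```smalltalk
Array respondsTo: #new.     "=> true; the class itself understands new"
maker := Array.             "a class can be bound to a name like any object"
maker new size.             "=> 0"
(maker new: 3) size.        "=> 3; new: is also just a message send"
```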

Assignment/binding

The only "interesting" expression form in Smalltalk that is neither a value nor a message send is the assignment expression. Assignment is denoted with the := or <- symbols (in Squeak, the left-arrow character is typed with the underscore _ key, and in some fonts will be rendered as an underscore).

In Smalltalk, as in Scheme, all bindings are implicitly mutable (i.e., all bindings are to "refs", and ref dereference is implicit).

The assignment expression as a whole evaluates to the assigned value:

(x := 5) negated.   "=> -5"

A note on declaring names: In the Workspace, assignment creates a fresh binding when the bound name is undefined, and mutates an existing binding if the name exists. In method contexts, variable names must be pre-declared (there are a variety of contexts available in methods; see the section below on defining methods).

Smalltalk blocks are lambdas

Smalltalk has lexically scoped closures (a.k.a. lambdas, a.k.a. anonymous functions), which are called blocks. No-argument blocks are denoted by the syntax:

[ stmt ]

i.e., a statement enclosed in square brackets. One form of statement is an expression (see the section on method bodies, below). For blocks with arguments, formal parameters may be specified as colon-preceded names before a vertical bar at the start of the closure:

[ :param1 ... :paramN | stmt ]

Everything in Smalltalk is an object, so a block is also an object (belonging, in Squeak, to the BlockContext class; newer Squeak releases use BlockClosure instead). Blocks are evaluated by sending the various value messages:

"Smalltalk"               "Rough ML equivalent"
[ 3 ].                    "fn () => 3;"
[ 3 ] value.              "(fn () => 3)();"
[ :x :y | x + y ].        "fn (x, y) => x + y;"
a := [ :x :y | x + y ].   "val a = fn (x, y) => x + y;"
a value: 1 value: 2.      "a(1, 2)"

Closures with many arguments are evaluated using up to 4 value: keywords:

seal := [ :a :b :c :d | a + b * c + d ].
seal value: 1 value: 2 value: 3 value: 4.

For argument lists longer than that, or if you just don't feel like typing value: too many times, you can use the valueWithArguments: message, which takes an array:

walrus := [ :a :b :c :d :e | a + b * c + d * e ].
walrus valueWithArguments: #( 10 20 30 40 50 ).  "Note #() syntax"
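To see what these sends evaluate to, here is a short Workspace sketch that re-creates the seal block. Note that Smalltalk evaluates infix (binary) messages strictly left to right, with no arithmetic precedence, so a + b * c + d means ((a + b) * c) + d. The numArgs message, which reports how many arguments a block expects, is not mentioned in the notes above.

```smalltalk
seal := [ :a :b :c :d | a + b * c + d ].
seal numArgs.                                "=> 4"
seal value: 1 value: 2 value: 3 value: 4.    "=> 13, i.e. ((1 + 2) * 3) + 4"
seal valueWithArguments: #(1 2 3 4).         "=> 13, same block, array of arguments"
```

Add parentheses, as in a + (b * c) + d, when the conventional arithmetic reading is wanted.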

Closures and environments

Closures are lexically scoped, but they may have arbitrary side effects, including the effect of changing bindings in enclosing environments:

"Executing this code..."   "Yields this value for i"
i := 5.                    "5"
[ i := 7 ] value.          "7"
[ :i | i := 9 ] value: 2.  "2, then 9 in local scope; 7 in outer scope"
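A more everyday use of this ability is accumulating a result from inside a block. The sketch below assumes Workspace variables named total and bump (both made up for this example) and uses the standard do: iteration message on arrays.

```smalltalk
total := 0.
#(1 2 3) do: [ :each | total := total + each ].
total.                              "=> 6"

"The block keeps referring to the same binding each time it runs."
bump := [ total := total + 10 ].
bump value.                         "=> 16"
bump value.                         "=> 26"
total.                              "=> 26"
```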

Control structures using blocks

ML and Scheme use a combination of special expression forms (e.g., if/then/else and cond) and higher-order functions for control structures (consider map). Smalltalk eschews all "special forms" (aside from assignment, which isn't really a control structure). Instead, Smalltalk uses higher-order functions exclusively.

Recall that the key property of Scheme special forms was control over evaluation: it was crucial that, for example, only one branch of the if expression be evaluated. Closures are suitable for implementing control structures precisely because the body of a closure is not necessarily evaluated --- it is evaluated only if it is applied (in Smalltalk, this means sending one of the value messages).

Transcript open.  "Open a Transcript window"

"The timesRepeat: method of the Integer class evaluates
 its argument N times"
5 timesRepeat: [
    Transcript show: 'hi'; cr.
].

"The ifTrue:ifFalse: methods of the two Boolean classes evaluate
 one of their arguments, depending on the Boolean value."
x = 0 ifTrue:  [ Transcript show: 'Cannot divide by zero' ]
      ifFalse: [ Transcript show: (1.0 / x) asString. ].

"The whileTrue: method of the Block class repeatedly evaluates the
receiver, then the argument, as long as the receiver evaluates to
true."
i := 0.
[ i < 10 ] whileTrue: [ i := i + 1. ].
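The same idea gives short-circuit logic without any special form. In this sketch (the variable name divisor is made up for the example), the standard and: and or: messages take a block argument and evaluate it only when the receiver does not already decide the answer, so the division by zero inside the block never happens.

```smalltalk
divisor := 0.
(divisor ~= 0) and: [ (10 / divisor) > 1 ].   "=> false; the block is never evaluated"
(divisor = 0) or: [ (10 / divisor) > 1 ].     "=> true; the block is never evaluated"
[ 10 / divisor ].                             "no error: creating a block does not run its body"
```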

Thought question: why is the receiver of whileTrue: a block, and not a Boolean?

Aside: So if control structures can be implemented purely using higher-order functions, why do ML and Scheme have special forms? Instead of ML's if/then/else, why not define a function

fun ifThenElse true  thenFn _      = thenFn()
  | ifThenElse false _      elseFn = elseFn();

The answer appears to be (mostly) syntactic convenience:

Consider how a client would use our ML ifThenElse function:

val x = ifThenElse y (fn () => "hi") (fn () => "bye")

This rather clumsily performs the equivalent of:

val x = if y then "hi" else "bye"

Programmers have always been prone to engaging in huge debates about syntax. Usually the answers to these debates are highly subjective and inconsequential, compared to arguments about semantics or programming paradigms. However, Smalltalk demonstrates that a compact syntax for anonymous functions is unambiguously and objectively a big win.
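For comparison, here is a Workspace sketch of the Smalltalk counterpart of the ML client code above, reusing the names x and y from that example. Because ifTrue:ifFalse: is a message send, it evaluates to a value and can sit on the right-hand side of an assignment.

```smalltalk
y := true.
x := y ifTrue: [ 'hi' ] ifFalse: [ 'bye' ].    "=> 'hi'"
```

The square brackets play the role of the two fn () => ... wrappers, at a much lower syntactic cost.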

Smalltalk classes, methods, and inheritance

So far we have described the expression forms in Smalltalk. We have seen that essentially all evaluation forms are message sends. Classes and methods are how the programmer defines the handling of message sends. (Another way of saying this is that the previous section describes expressions, and the following section describes declarations.)

Defining classes

Every Smalltalk object has a class, which defines the shared behavior of its instances. Classes are constructed by sending a message named

subclass:instanceVariableNames:classVariableNames:poolDictionaries:category:

to a class. (Classes are themselves objects, and they all inherit the ability to define a subclass from the Object class.) For example:

Object subclass: #Point
    instanceVariableNames: 'x y'
    classVariableNames: 'origin'
    poolDictionaries: ''
    category: 'CSE341-Examples'

Point subclass: #ColoredPoint
    instanceVariableNames: 'color'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'CSE341-Examples'

For the purposes of this class, we'll ignore the poolDictionaries: parameter. The other arguments are as follows:

Defining methods

Once a class has been defined, you can use any of the various environment browsing tools (e.g., the Class Browser or the Package Browser) to edit the class. In particular, you can add, change, or remove methods. (See the Guzdial text or the various Squeak tutorials for information on how to navigate in these browsers.)

A method definition consists of three parts:

Header line: The header line defines the method name and (optionally) arguments. The syntax differs only slightly for the various kinds of messages. For unary messages, this simply contains the message name. For infix messages, this line contains the operator followed by the argument name:

= aString
    "The definition of the = infix method for String"
    ...

For keyword messages, this line contains the keywords, where each keyword is followed by a colon and argument name:

at: index put: aCharacter
    "The definition of at:put: keyword method for String"
    ...

As the above examples show, by convention the header line is followed by a comment that documents the method. (This convention is, alas, observed only intermittently in the actual Squeak sources.)

Local declarations: The local declarations are a whitespace-separated sequence of names in between vertical bars:

fooMethod: aValue
    "A method that declares some locals, but does nothing."
    | localA localB
      x y z |

If any local is not used, Squeak's browsers will helpfully ask you whether you want to remove the declaration, but this is not required. Unlike in the Workspace, all names must be declared before use (see below).

Method body: The method body contains a statement. Statements have the following form:

stmt ::= evalStmt | returnStmt | stmtSequence

evalStmt     ::= expr
returnStmt   ::= ^ expr
stmtSequence ::= stmt*\.

That is, a statement either evaluates an expression, returns the value of an expression to the sender of the message, or executes a sequence of period-separated statements in order. Evaluation and sequences are straightforward and have "obvious" semantics that should be familiar to you. Explicit return statements, however, are novel: recall that ML functions implicitly return the value of the body expression, and Scheme functions return the value of the last body expression, so these languages have no need for explicit return statements.
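Putting the three parts together, a complete method might look like the sketch below. The method name sumTo: and the local name total are made up for illustration; the loop uses the standard to:do: message on integers.

```smalltalk
sumTo: n
    "Hypothetical method: answer the sum of the integers from 1 to n."
    | total |
    total := 0.
    1 to: n do: [ :i | total := total + i ].
    ^ total
```

If this method were installed in some class, sending sumTo: 4 to an instance would answer 10. A method that finishes without executing an explicit ^ answers the receiver (self).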

Returns actually have a fairly interesting semantics in Smalltalk, and merit further discussion.

Syntax quirk: In Squeak, some fonts render the return symbol ^ as an up-arrow.

Blocks, methods, and lexical returns

Consider the following method in some hypothetical class:

ifZero: aNumber get: aValue
    "Returns aValue if aNumber is zero."
    aNumber = 0 ifTrue: [ ^ aValue ].
    ^ nil

Consider the return statement, ^ aValue. This code is lexically nested inside a closure, so one might naively suppose that the statement returns a value from that closure. But that semantics would merely hand the value back to the context that evaluated the block, which is not the semantics we want. Instead, we want to return this value from the ifZero:get: method, not from any block inside it.

Return statements therefore return from the lexically (textually) enclosing method, not the nearest lexically enclosing block. The ^ return is therefore sometimes called a "non-local return".
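A common use of this rule is leaving an iteration early. In the sketch below (the method name firstNegativeIn: is made up for illustration), the ^ inside the inner block ends the whole method as soon as a match is found, even though it is nested two blocks deep.

```smalltalk
firstNegativeIn: aCollection
    "Hypothetical method: answer the first negative element, or nil if there is none."
    aCollection do: [ :each |
        each < 0 ifTrue: [ ^ each ] ].
    ^ nil
```

If this method were installed in some class, sending firstNegativeIn: #(3 -1 4 -5) to an instance would answer -1 without ever visiting 4 or -5.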

Thought question: How could a programmer implement a non-local return using Scheme's call/cc?

The method environment

The variable names accessible in a method belong to the following categories (names in categories earlier in this list shadow names in categories later in this list):

Using any name that cannot resolve to any of the above will be flagged as an error at compile time (in Squeak, compilation occurs when you "accept" a method).

Inheritance and dynamic dispatch

So far, we've discussed objects, classes, and messages without even mentioning inheritance. It is commonly held that the essence of OOP is inheritance. Inheritance in Smalltalk is fairly straightforward:

Sends to self

As previously noted, every message send has a receiver object, and during evaluation of a method the receiver is bound to self (self is similar to this in Java or C++).

Sends to self are evaluated just like sends to any other expression: by looking up the method in the value's class that handles that message, and evaluating it. Notice that self can, at runtime, be an instance of any subclass of the current method's class. Consider the following method of an abstract Point class:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((x * x) + (y * y)) sqrt.

This accesses the x and y fields directly. But suppose we wished to allow subclasses of Point to compute the values of x and y, rather than only looking up field values. In this case, we could replace the field references with message sends:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((self x * self x) + (self y * self y)) sqrt.

Now, subclasses of Point can freely redefine the x and y methods. For example, one could define either a PolarPoint subclass or a RectPoint subclass:

Point subclass: #PolarPoint
    instanceVariableNames: 'theta rho'
    ...

Point subclass: #RectPoint
    instanceVariableNames: 'x y'
    ...

The x and y methods of PolarPoint could compute the answers to the x and y messages, whereas RectPoint could simply look them up in the fields:

"In RectPoint..."
x
    ^ x

y
    ^ y

"In PolarPoint..."
x
    ^ rho * theta cos

y
    ^ rho * theta sin

Self sends are one of the key features of object-oriented languages.

Sends to super (resends)

Another key feature of object-oriented languages is the ability to begin method lookup in a superclass. Consider a subclass of RectPoint that implements 3-dimensional points:

RectPoint subclass: #RectPoint3D
    instanceVariableNames: 'z'
    ...

Suppose RectPoint and RectPoint3D both implement a scaleBy: method:

"In RectPoint...."
scaleBy: factor
    x := self x * factor.
    y := self y * factor

"In RectPoint3D..."
z
    ^ z

scaleBy: factor
    x := self x * factor.
    y := self y * factor.
    z := self z * factor

Notice the redundancy in scaleBy:. We'd like to reuse the code in RectPoint for its subclasses. But ordinary sends of scaleBy: to a RectPoint3D instance will always invoke the subclass's version, not the superclass's version, so there's no way to access it. The solution is to have a special kind of send that temporarily changes the rules of method lookup, called a resend:

"In RectPoint3D..."
scaleBy: factor
    super scaleBy: factor.
    z := z * factor

The super send is exactly like a send to self, except that lookup begins in the superclass of the class containing the lexically enclosing method.

super is a special expression that can only appear in the receiver position of a message send. The use of super only affects the current send, not any future sends --- so, for example, if the superclass method performs a self send, then lookup proceeds normally, from the receiver's actual class, not the superclass of the lexically enclosing method.

Therefore, consider the following code:

Object subclass: #Foo instanceVariableNames: '' ...
Foo subclass: #Bar instanceVariableNames: '' ...

"In Foo..."
baz
    ^ self bif

bif
    ^ 'hi'

"In Bar..."
baz
    ^ super baz

bif
    ^ 'bye'

Evaluating the send (Bar new) baz will return 'bye', not 'hi'. The super send in Bar's baz invokes Foo's baz, but the self bif in Foo's baz will begin lookup normally, which means beginning with the receiver's actual runtime class. Since the receiver is a Bar, this self send will invoke Bar's bif, not Foo's bif.

Access protection, or lack thereof

Smalltalk classes have no access protection mechanisms for methods --- anyone can send any message to any object.

However, only methods of an object have access to the object's instance variables, because these variables are only added to the environment when evaluating that object's methods.

Since classes inherit their superclasses' instance variables, methods defined in a subclass may access instance variables declared in a superclass.

In Java/C++ terms, all instance variables are protected, and all methods (member functions) are public.

Hence, in Squeak, where everything is implemented in Smalltalk, you can freely change everything about the world, up to and including the implementation of message sending, subclassing, and closure evaluation. Doing such things will break literally everything in your environment, of course.