Fig. 1 shows an abbreviated view of the history of object-oriented programming languages. OOP languages are shown inside the rounded box.
A few highlights of OOP history:
An interesting excerpt from The Early History of Smalltalk (SIGPLAN Notices, Kay 1993):
The second bet had even more surprising results. I had expected that the new Smalltalk would be an iconic language and would take at least two years to invent, but fate intervened. One day, in a typical PARC hallway bullsession, Ted Kaehler, Dan Ingalls, and I were standing around talking about programming languages. The subject of power came up and the two of them wondered how large a language one would have to make to get great power. With as much panache as I could muster, I asserted that you could define "the most powerful language in the world" in "a page of code". They said, "Put up or shut up".
Alan Kay won the bet. Shortly thereafter, Dan Ingalls implemented the first Smalltalk interpreter.
The following describes the syntax of "core Smalltalk", which includes all important expression forms but excludes the syntax for defining methods and classes:
expr        ::= atom | binding | unarySend | infixSend | keywordSend | ( expr )
atom        ::= ID | literal | block
literal     ::= INTEGER | STRING | ...
block       ::= [ [:ID* |] stmt* ]
binding     ::= name := expr | name _ expr
unarySend   ::= expr ID
infixSend   ::= expr OPERATOR expr
keywordSend ::= expr [ID: expr]+
Parenthesized expressions simply enforce order of evaluation, which leaves atoms, bindings, and the various flavors of sends as "interesting" operations.
Smalltalk syntaxes for literals are as follows:
3           "The number 3"
$3          "The character 3"
'3'         "The string containing one character, 3"
#3          "The symbol 3"
#(1 2 3)    "The array of numbers containing 1, 2, and 3"
Smalltalk has some "globally defined names" that are not strictly literals, but refer to objects that might be built-in literals in other languages. For example:
nil      "The UndefinedObject."
true     "The True object."
false    "The False object."
20 negated + 4 printOn: Transcript base: 16
is equivalent to
((20 negated) + 4) printOn: Transcript base: 16
3 + 4 * 5 "Parses as (3 + 4) * 5; evaluates to 35."
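The examples above follow from one parsing rule: unary sends bind tightest, then infix sends (strictly left to right, with no arithmetic precedence), then keyword sends. Parentheses override this order. A few more cases, as a sketch to try in a Workspace (raisedTo:, negated, and abs are ordinary Number messages):

```smalltalk
3 + 4 * 5.            "Infix sends go left to right: (3 + 4) * 5 => 35"
3 + (4 * 5).          "Parentheses force the product first => 23"
2 raisedTo: 3 + 1.    "Infix before keyword: 2 raisedTo: (3 + 1) => 16"
4 negated abs + 1.    "Unary before infix: ((4 negated) abs) + 1 => 5"
```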
x := 'a4'. x at: 1 put: $3; asNumber. "Evaluates to 34."
The above code creates a String object with the initial contents 'a4'. It then sends the at:put: message to the string, which imperatively replaces $a with $3 (hence x refers to a string containing '34'). Then, because of the semicolon, the unary message asNumber is sent to the same object, which returns a fresh number object with the value 34.
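As a rough sketch of what the cascade abbreviates, the same effect can be written as two separate sends to x. The explicit copy is an addition that is not in the example above; it is there only so that the string literal itself is not modified.

```smalltalk
x := 'a4' copy.       "Work on a copy rather than on the literal"
x at: 1 put: $3.      "First send; x now contains '34'"
x asNumber.           "Second send, to the same receiver => 34"
```

Without the semicolon, asNumber would instead be sent to the result of at:put:, which is the stored character rather than the string.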
Smalltalk semantics is usually stated as follows:
However, this is a relatively informal description (it does not fit the dynamic semantics formula we've been using all quarter), so we must be more explicit. In this section, we will focus on the first two of these bullets.
All Smalltalk values are objects. Expressions evaluate to objects through the application of message sends. Message sends come in three syntactic varieties (unary, infix, and keyword sends), but have the same semantics. Here are some examples of message sends:
5 negated.     "A unary send.   => -5"
5 + 6.         "An infix send.  => 11"
'hi' at: 1.    "A keyword send. => $h"
All operations (except assignment; see below) are message sends. This differs from languages like Java or C++, where some values are not objects (consider int in Java or C++) and some expressions or function calls do not have standard message-send semantics.
To evaluate a message send, one does the following:
self bound to the receiver value, the receiver's field names bound to their field values, and any other argument values bound to their respective names.
Notice the similarity between function call in Scheme or ML and Smalltalk message send. The major differences are that
Terminological aside: Sometimes message send is called virtual function call (as in C++), method call (as the Java community often calls it), or virtual method call (usually used in Java/C++ only to distinguish virtual calls from static or other non-virtual calls).
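One way to see that the three syntactic varieties share a single semantics is to send the same messages reflectively. The sketch below assumes Squeak's perform: family of messages, which take the message name as a Symbol followed by the arguments:

```smalltalk
5 perform: #negated.             "Same as: 5 negated   => -5"
5 perform: #+ with: 6.           "Same as: 5 + 6       => 11"
'hi' perform: #at: with: 1.      "Same as: 'hi' at: 1  => $h"
```

In every case the work is the same: find the method named by the selector in the receiver's class, then evaluate it with the given arguments.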
new message

Classes themselves are objects (everything is an object), and the simplest way of instantiating an object of a class is by sending the new message to the class:
Array new. "Evaluates to #()"
Notice that, unlike some other languages, new is not a "special" operation --- it's an ordinary message send to a class object.
The default new implementation doesn't initialize any of the fields of an object (the Array class happens to override the new message to initialize its fields). We'll discuss better ways to instantiate things later.
The only "interesting" expression form in Smalltalk that is neither a value nor a message send is the assignment expression. Assignment is denoted with the := or ← symbols (in Squeak, the left-arrow character is typed with the underscore _ key, and in some fonts will be rendered as an underscore).
In Smalltalk, as in Scheme, all bindings are implicitly mutable (i.e., all bindings are to "refs", and ref dereference is implicit).
The assignment expression as a whole evaluates to the assigned value:
(x := 5) negated. "=> -5"
A note on declaring names: In the Workspace, assignment creates a fresh binding when the bound name is undefined, and mutates an existing binding if the name exists. In method contexts, variable names must be pre-declared (there are a variety of contexts available in methods; see the section below on defining methods).
Smalltalk has lexically scoped closures (a.k.a. lambdas, a.k.a. anonymous functions), which are called blocks. No-argument blocks are denoted by the syntax:
[ stmt ]
i.e., a statement enclosed in square brackets. One form of statement is an expression (see the section on method bodies, below). For blocks with arguments, formal parameters may be specified as colon-preceded names before a vertical bar at the start of the closure:
[ :param1 ... :paramN | stmt ]
Everything in Smalltalk is an object, so a block is also an object (belonging, in Squeak, to the BlockContext class). Blocks are evaluated by sending the various value messages:

"Smalltalk"                   "Rough ML equivalent"
[ 3 ].                        "fn () => 3;"
[ 3 ] value.                  "(fn () => 3)();"
[ :x :y | x + y ].            "fn (x, y) => x + y;"
a := [ :x :y | x + y ].       "val a = fn (x, y) => x + y;"
a value: 1 value: 2.          "a(1, 2)"
Closures with many arguments are evaluated using up to 4 value: keywords:
seal := [ :a :b :c :d | a + b * c + d ].
seal value: 1 value: 2 value: 3 value: 4.
For argument lists longer than that, or if you just don't feel like typing value: too many times, you can use the valueWithArguments: message, which takes an array:
walrus := [ :a :b :c :d :e | a + b * c + d * e ].
walrus valueWithArguments: #( 10 20 30 40 50 ).    "Note #() syntax"
Closures are lexically scoped, but they may have arbitrary side effects, including the effect of changing bindings in enclosing environments:

"Executing this code..."       "Yields this value for i"
i := 5.                        "5"
[ i := 7 ] value.              "7"
[ :i | i := 9 ] value: 2.      "2, then 9 in local scope; 7 in outer scope"
ML and Scheme use a combination of special expression forms (e.g., if/then/else and cond) and higher-order functions for control structures (consider map). Smalltalk eschews all "special forms" (aside from assignment, which isn't really a control structure). Instead, Smalltalk uses higher-order functions exclusively.
Recall that the key property of Scheme special forms was control over evaluation: it was crucial that, for example, only one branch of the if expression be evaluated. Closures are suitable for implementing control structures precisely because the body of a closure is not necessarily evaluated --- it is evaluated only if it is applied (in Smalltalk, this means sending one of the value messages).
Transcript open.    "Open a Transcript window"

"The timesRepeat: method of the Integer class evaluates its argument N times"
5 timesRepeat: [ Transcript show: 'hi'; cr. ].

"The ifTrue:ifFalse: methods of the two Boolean classes evaluate one of their arguments, depending on the Boolean value."
x = 0
    ifTrue: [ Transcript show: 'Cannot divide by zero' ]
    ifFalse: [ Transcript show: (1.0 / x) asString. ].

"The whileTrue: method of the Block class repeatedly evaluates the receiver, then the argument, as long as the receiver evaluates to true."
i := 0.
[ i < 10 ] whileTrue: [ i := i + 1. ].
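To make the "no special forms" claim concrete, here is a sketch of how the two Boolean classes can implement ifTrue:ifFalse: using nothing but message sends. It uses the method-definition layout and the ^ return described later in these notes. The parameter names are illustrative, and real implementations often have the compiler inline these sends for speed.

```smalltalk
"In True..."
ifTrue: trueBlock ifFalse: falseBlock
    "The receiver is true, so only the first block is ever evaluated."
    ^ trueBlock value

"In False..."
ifTrue: trueBlock ifFalse: falseBlock
    "The receiver is false, so only the second block is ever evaluated."
    ^ falseBlock value
```

The choice between the branches is made by ordinary method lookup on the receiver's class; neither method contains a conditional.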
Thought question: why is the receiver of whileTrue: a block, and not a boolean?
Aside: So if control structures can be implemented purely using higher-order functions, why do ML and Scheme have special forms? Instead of ML's if/then/else, why not define a function

fun ifThenElse true thenFn _ = thenFn()
  | ifThenElse false _ elseFn = elseFn();
The answer appears to be (mostly) syntactic convenience: Consider how a client would use our ML ifThenElse function:
val x = ifThenElse y (fn () => "hi") (fn () => "bye")
This rather clumsily performs the equivalent of:
val x = if y then "hi" else "bye"
Programmers have always been prone to engaging in huge debates about syntax. Usually the answers to these debates are highly subjective and inconsequential, compared to arguments about semantics or programming paradigms. However, Smalltalk demonstrates that a compact syntax for anonymous functions is unambiguously and objectively a big win.
So far we have described the expression forms in Smalltalk. We have seen that essentially all evaluation forms are message sends. Classes and methods are how the programmer defines the handling of message sends. (Another way of saying this is that the previous section describes expressions, and the following section describes declarations.)
Every Smalltalk object has a class, which defines the shared behavior of its instances. Classes are constructed by sending a message named subclass:instanceVariableNames:classVariableNames:poolDictionaries:category: to a class. (Classes are themselves objects, and they all inherit the ability to define a subclass from the Object class.) For example:
Object subclass: #Point
    instanceVariableNames: 'x y'
    classVariableNames: 'Origin'
    poolDictionaries: ''
    category: 'CSE341-Examples'.

Point subclass: #ColoredPoint
    instanceVariableNames: 'color'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'CSE341-Examples'
For the purposes of this class, we'll ignore the poolDictionaries: parameter. The other arguments are as follows:
subclass: takes a Symbol object that names the new subclass.
instanceVariableNames: takes a string containing a whitespace-separated list of instance variable names (i.e., field names) for the new class.
classVariableNames: takes a string containing a whitespace-separated list of class variable names (i.e., variables that are shared by all instances of this class, much like static fields in Java). Squeak requires that class variables begin with a capital letter.
category: takes a string naming a "class category". By convention, the category has two parts separated by dashes, although this is not necessary. The class category has no semantic significance --- it is purely a tool that helps organize the presentation of code to the user. In particular, class categories do not define separate namespaces.
Once a class has been defined, you can use any of the various environment browsing tools (e.g., the Class Browser or the Package Browser) to edit the class. In particular, you can add, change, or remove methods. (See the Guzdial text or the various Squeak tutorials for information on how to navigate in these browsers.)
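Assuming the two class definitions above have been evaluated, a few sends in the Workspace can confirm the resulting hierarchy. This is only a sketch; superclass, class, and isKindOf: are standard reflective messages in Squeak. (Squeak already ships with its own Point class, so a different name may be safer when trying this out.)

```smalltalk
ColoredPoint superclass.              "=> Point"
ColoredPoint new class.               "=> ColoredPoint"
ColoredPoint new isKindOf: Point.     "=> true"
Point new isKindOf: ColoredPoint.     "=> false"
```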
A method body consists of three parts:
Header line: The header line defines the method name and (optionally) arguments. The syntax differs only slightly for the various kinds of messages. For unary messages, this simply contains the message name. For infix messages, this line contains the operator followed by the argument name:
= aString
    "The definition of the = infix method for String"
    ...
For keyword messages, this line contains the keywords, where each keyword is followed by a colon and argument name:
at: index put: aCharacter
    "The definition of at:put: keyword method for String"
    ...
As the above examples show, by convention the header line is followed by a comment that documents the method. (This convention is, alas, observed only intermittently in the actual Squeak sources.)
Local declarations: The local declarations are a whitespace-separated sequence of names in between vertical bars:
fooMethod: aValue
    "A method that declares some locals, but does nothing."
    | localA localB x y z |
If any local is not used, Squeak's browsers will helpfully ask you whether you want to remove the declaration, but this is not required. Unlike in the Workspace, all names must be declared before use (see below).
Method body: The method body contains a statement. Statements have the following form:
stmt         ::= evalStmt | returnStmt | stmtSequence
evalStmt     ::= expr
returnStmt   ::= ^ expr
stmtSequence ::= stmt*\.
That is, a statement either evaluates an expression, returns the value of an expression to the sender of the message, or executes a sequence of period-separated statements in order. Evaluation and sequences are straightforward and have "obvious" semantics that should be familiar to you. Explicit return statements, however, are novel --- recall that ML functions implicitly return the value of the body expression, and Scheme functions return the value of the last body expression, so these languages have no need for explicit return statements.
Returns actually have a fairly interesting semantics in Smalltalk, and merit further discussion.
Syntax quirk: In Squeak, some fonts render the return symbol ^ as an up-arrow.
Consider the following method in some hypothetical class:
ifZero: aNumber get: aValue
    "Returns aValue if aNumber is zero."
    aNumber = 0 ifTrue: [ ^ aValue ].
    ^ nil
Consider the return statement, ^ aValue. This code is lexically nested inside a closure --- so one might naively suppose that the statement returns a value from that closure. But that semantics would merely return a value to the enclosing context --- which is not the semantics we want. Instead, we want to return this value from the ifZero:get: method, not from any block inside it.
Return statements therefore return from the lexically (textually) enclosing method, not the nearest lexically enclosing block. The ^ return is therefore sometimes called a "non-local return".
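A common use of non-local return is leaving an iteration early. The method below is a hypothetical sketch (the name firstNegativeIn: is made up for illustration, and it could live in any class); it relies only on the standard do: iteration message.

```smalltalk
firstNegativeIn: aCollection
    "Returns the first negative element of aCollection, or nil if there is none."
    aCollection do: [ :each |
        each < 0 ifTrue: [ ^ each ] ].
    ^ nil
```

The ^ each sits two blocks deep, yet it returns from firstNegativeIn: itself, so the do: iteration is abandoned as soon as a negative element is found.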
Thought question: How could a programmer implement a non-local return using Scheme's call/cc?
The variable names accessible in a method belong to the following categories (names in categories earlier in this list shadow names in categories later in this list):
self argument, which is bound to the receiver)
true)
Using any name that cannot resolve to any of the above will be flagged as an error at compile time (in Squeak, compilation occurs when you "accept" a method).
So far, we've discussed objects, classes, and messages without even mentioning inheritance. It is commonly held that the essence of OOP is inheritance. Inheritance in Smalltalk is fairly straightforward:
Object) without finding an appropriate method. Then, a MessageNotUnderstood exception is thrown.

self
As previously noted, every message send has a receiver object, and during evaluation of a method the receiver is bound to self (self is similar to this in Java or C++).
Sends to self are evaluated just like sends to any other expression: by looking up the method in the value's class that handles that message, and evaluating it. Notice that self can, at runtime, be an instance of any of the subclasses of the current method's class.
Consider the following method of an abstract Point class:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((x * x) + (y * y)) sqrt.
This accesses the x and y fields directly. But suppose we wished to allow subclasses of Point to compute the values of x and y, rather than only looking up field values. In this case, we could replace the field references with message sends:

distanceFromOrigin
    "Computes the distance of this point from the origin."
    ^ ((self x * self x) + (self y * self y)) sqrt.
Now, subclasses of Point can freely redefine the x and y methods. For example, one could define either a PolarPoint subclass or a RectPoint subclass:

Point subclass: #PolarPoint
    instanceVariableNames: 'theta rho'
    ...
Point subclass: #RectPoint
    instanceVariableNames: 'x y'
    ...
The x and y methods of PolarPoint could compute the answers to the x and y messages, whereas RectPoint could simply look them up in the fields:

"In RectPoint..."
x
    ^ x

y
    ^ y

"In PolarPoint..."
x
    ^ rho * theta cos

y
    ^ rho * theta sin
Self sends are one of the key features of object-oriented languages.
super (resends)

Another key feature of object-oriented languages is the ability to begin method lookup in a superclass. Consider a subclass of RectPoint that implements 3-dimensional points:
RectPoint subclass: #RectPoint3D instanceVariableNames: 'z' ...
Suppose RectPoint and RectPoint3D both implement a scaleBy: method:

"In RectPoint...."
scaleBy: factor
    x := self x * factor.
    y := self y * factor

"In RectPoint3D..."
z
    ^ z

scaleBy: factor
    x := self x * factor.
    y := self y * factor.
    z := self z * factor
Notice the redundancy in the two scaleBy: methods. We'd like to reuse the code in RectPoint for its subclasses. But ordinary sends of scaleBy: to a RectPoint3D instance will always invoke the subclass's version, not the superclass's version, so there's no way to access it. The solution is to have a special kind of send that temporarily changes the rules of method lookup --- a resend:

"In RectPoint3D..."
scaleBy: factor
    super scaleBy: factor.
    z := z * factor
The super send is exactly like a send to self, except that lookup begins in the superclass of the class containing the lexically enclosing method. super is a special expression that can only appear in the receiver position of a message send.
The use of super only affects the current send, not any future sends --- so, for example, if the superclass method performs a self send, then lookup proceeds normally, from the receiver's actual class, not the superclass of the lexically enclosing method.
Therefore, consider the following code:
Object subclass: #Foo
    instanceVariableNames: ''
    ...
Foo subclass: #Bar
    instanceVariableNames: ''
    ...

"In Foo..."
baz
    ^ self bif

bif
    ^ 'hi'

"In Bar..."
baz
    ^ super baz

bif
    ^ 'bye'
Evaluating the send (Bar new) baz will return 'bye', not 'hi'. The super send in Bar's baz invokes Foo's baz, but the self bif in Foo's baz will begin lookup normally, which means beginning with the receiver's actual runtime class. Since the receiver is a Bar, this self send will invoke Bar's bif, not Foo's bif.
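Assuming the Foo and Bar definitions above have been entered, the two cases can be compared side by side in a Workspace:

```smalltalk
Bar new baz.    "=> 'bye'  (super baz runs Foo's baz, whose self bif finds Bar's bif)"
Foo new baz.    "=> 'hi'   (the receiver is a Foo, so self bif finds Foo's bif)"
```

The method body that runs for baz is the same in both sends; only the class of the receiver, and hence the outcome of the self send, differs.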
Smalltalk classes have no access protection mechanisms for methods --- anyone can send any message to any object.
However, only methods of an object have access to the object's instance variables, because these variables are only added to the environment when evaluating that object's methods.
Since classes inherit their superclasses' instance variables, subclass instances may access variables defined in a superclass.
In Java/C++ terms, all instance variables are protected, and all methods (member functions) are public.
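Because instance variables are visible only inside an object's own methods, outside code reaches them through messages. A minimal sketch, assuming the Point class from earlier with instance variables x and y; the accessor names x and x: follow common Smalltalk convention but are otherwise illustrative.

```smalltalk
"In Point..."
x
    "Answer the x coordinate."
    ^ x

x: aNumber
    "Set the x coordinate."
    x := aNumber
```

A client would then write aPoint x: 3 to store a value and aPoint x to read it back; there is no syntax for touching another object's fields directly.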
Hence, in Squeak, where everything is implemented in Smalltalk, you can freely change everything about the world, up to and including the implementation of message sending, subclassing, and closure evaluation. Doing such things will break literally everything in your environment, of course.