v For approximately the next 2 weeks, we will
study software specifications
Ø As we will see later in a videotape, this
is closely related to, but not identical to, software requirements
Ø However, I will combine the two
v To the first order (and, again, the
videotape will show why this is not fully satisfactory), a specification
describes what a program is to do,
while a design/implementation describes how
a program should achieve that specification
Ø This distinction will be familiar to those
of you who took CSE505, programming languages, as a similar distinction that is
drawn between functional and logic programming languages vs. imperative
programming languages (and the same is probably true for distinctions drawn in
some AI work)
v Remember that it is the case that
Ø every program satisfies some specification
(in fact, an unbounded number of specifications)
Ø and (almost) every specification is
satisfied by an unbounded number of programs
§
The set of
programs that solve the halting problem is empty, for instance
v We all have our favorite examples of
difficulties in specifications
Ø It's a bad idea to instruct the janitor by
leaving the following note on your whiteboard: "Erase board and clean
floor."
Ø It's a bad idea to give two programming
assignments in the same homework and say in each, "Use YFPL."
v Here are some other examples from software
(some due to Ghezzi, Jazayeri, and Mandrioli)
Ø "Selecting is the process for
designating areas of your document that you want to work on." [Microsoft
Word 4.0 manual]
§
What is an
area? Contiguous? Across lines? What if you have other objects inserted into your document? (This happens to me all the time!)
Ø "The message must be triplicated. The three copies must forwarded through
three different physical channels. The
receiver accepts the message on the basis of a two-out-of-three voting
policy." [From a real mission-critical project.]
§
Is the
message considered received as soon as two identical copies are received, or
must the receiver wait for the third?
·
It turns out
the implementation wasn't either of these.
Instead, the receiver polled the channels and a time-out was used; if
the time-out occurred on the third channel after two identical messages were
received, the message was accepted.
Ø Consider the following postcondition for
sorting an array A of length N in non-decreasing order:
§
For-all (i,j) in [1..N] i<j implies A[i]<=A[j]
·
For i = 1 to N do
A[i] = i;
·
Is a program
that satisfies this specification just fine
¨ Formalism alone won't solve all
specification problems
Ø "Building the system right vs.
building the right system."
v Consider the following well-known and
simple specification due to Naur:
Ø Given a text consisting of words separated
by BLANKS or by NL (new line) characters, convert it to a line-by-line form in
accordance with the following rules:
§
line breaks
must be made only where the given text has BLANK or NL;
§
each line is
filled as far as possible, as long as
§
no line will
contain more than MAXPOS characters.
Ø Do you see any problems with this
specification?
§
If not, you
haven't TAed 142/143 yet!
v Here is a version of Naur's specification
written by Goodenough and Gerhart:
Ø The program's input is a stream of
characters whose end is signaled with a special end-of-text character, ET.
There is exactly one ET character in each input stream. Characters are
classified as
·
break
characters - BL (blank) and NL (new line);
·
nonbreak
characters - all others except ET;
·
the
end-of-text indicator - ET.
Ø A word is a nonempty sequence of nonbreak
characters. A break is a sequence of one or more break characters. Thus, the
input can be viewed as a sequence of words separated by breaks, with possibly
leading and trailing blanks, and ending with ET.
The program's output should be the
same sequence of words as in the input, with the exception that an oversize
word (i.e. a word containing more than MAXPOS characters, where MAXPOS is a
positive integer) should cause an error exit from the program (i.e. a variable,
Alarm, should have the value TRUE). Up to the point of error, the program's
output should have the following properties:
1. A new line should start only between words
and at the beginning of the output text, if any.
2. A break in the input is reduced to a single
break character in the output.
3. As many words as possible should be placed
on each line (i.e. between successive NL characters).
4. No line may contain more than MAXPOS
characters (words and BLs).
v
Goodenough
and Gerhart wrote this, in part, because programs written to Naur's
specification had errors, some of which they argued were due to problems with
the specification
Ø
Their
specification had itself gone through several revisions
Ø
In
groups of 3 or so, find as many problems as you can with the Goodenough and
Gerhart specification
v
Bertrand
Meyer identifies "Seven Sins of the Specifier"
Ø
Noise
(information that repeats other information already available)
§
This might be
valuable (like comments), but it can also be distracting and confusing
§
Ex:
"Thus, the input can be viewed as…ending with ET." It's a repetition of earlier material; it
may or may not be valuable to have it repeated in another form.
Ø
Remorse (a
variant of noise in which a restriction is made later to something defined
earlier)
§
Ex: "of
the output text, if any" implies that output text may not exist, although
that was surely not true earlier in the specification
Ø
Silence
(omitted information)
§
Ex: The
remorseful parenthetical about what a line is. "(i.e., between successive
NL characters)" hints at the silence about defining a line precisely.
·
Even this
definition is a problem, since it is unclear about whether NL's are part of a
line
·
In addition,
although much of the world is oriented towards file representations using
explicit NL characters, the whole world does not work that way: IBM operating
systems of yore (which are still prevalent, by the way) use record-oriented
representations instead, so there is usually no explicit NL character, making
the specification somewhat platform-specific
§
Ex: The
specification describes when Alarm should be set to TRUE, but doesn't address
what its value is in other situations
Ø
Contradiction
(conflicting definitions within the same specification)
§
Ex: In this
specification, the problems with contradictions tend to be based on confusions
about the types of information
·
The input is
a "stream of characters" that can be "viewed as a sequence of
words"
·
The output
should be the same "sequence of words as in the input", but the
output should have a "new line [that] should start only between
words…"
Ø
Overspecification
(too much about the solution rather than about the problem)
§
Ex: Why
should a failure be required to report via a specific variable (what about
languages with exceptions, for instance)?
·
What if this
computation is packaged as a process, not a procedure?
§
Ex: The
notion of "ET" is largely due to the implementation platform used by
Naur, who implemented in Algol 60, which did not provide for end-of-file
detection when reading characters (and thus needed a terminating sentinel)
·
Perhaps even
more strange, note that the output from this program does not have an ET
character! So in a Unix-like
pipe-and-filter world, you couldn't take the output of this program and feed it
to itself!
Ø
Ambiguity
§
Ex: What,
precisely, is the "point of an error"?
·
Is it the end
of the last acceptable word, or is it the character at which the input fails to
meet the constraints?
§
Sometimes
ambiguity is OK, intended by the specifier
·
Ex: If
MAXPOS=10, how is "Who What When" formatted? There are two legitimate choices that may or
may not have been intentionally permitted by the specifier
Ø
Forward
references
§
These may or
may not be bad, but they don't always make it easier to read
§
Ex: ET is
used three times before it is defined
§
Ex: Line is
used before it was defined
§
Ex: MAXPOS is
used before it was defined
v
Of course,
Meyer provides a better specification than this, arguing strongly for the use
of mathematical formalisms
Ø
But I won't
present this today; we'll see several like this, although perhaps not this one
v
Formal specifications
have several interesting properties
Ø
They can play
at least two important and distinct roles
§
They can
provide a basis on which to do verification, ensuring that an implementation
indeed satisfies a specification (you "build the system right")
·
This is
largely the basis for much of the work in program verification, which has not
really succeeded to any significant degree
§
They can be
useful in increasing the confidence one has in the consistency and completeness
of a specification
·
For example,
consider the specification of an abstract data type: one can formally check
that the specification of each operation on the ADT guarantees that invariants
on the data itself are satisfied
§
There are
also those who argue for "executable specifications"---that is, specs
that can directly executed or easily translated to executable programs
·
This may work
in some situations, but they are limited because there is a big gap between
either
¨
the
specification and the program (things that are easy to say in a specification
are not necessarily easy to say in programs, especially efficiently ---
negation and conjunction are simple examples)
¨
or else the
user and the specification (reducing the utility of the specification for
helping to ensure that you "build the right system")
Ø
We'll see
some material on specifications based on logic/set theory, on algebra, on
finite state machines