For approximately the next 2 weeks, we will study software specifications
As we will see later in a videotape, this is closely related to, but not identical to, software requirements
However, I will combine the two
To the first order (and, again, the videotape will show why this is not fully satisfactory), a specification describes what a program is to do, while a design/implementation describes how a program should achieve that specification
This distinction will be familiar to those of you who took CSE505, programming languages, as a similar distinction that is drawn between functional and logic programming languages vs. imperative programming languages (and the same is probably true for distinctions drawn in some AI work)
Remember that it is the case that
every program satisfies some specification (in fact, an unbounded number of specifications)
and (almost) every specification is satisfied by an unbounded number of programs
The set of programs that solve the halting problem is empty, for instance
We all have our favorite examples of difficulties in specifications
It's a bad idea to instruct the janitor by leaving the following note on your whiteboard: "Erase board and clean floor."
It's a bad idea to give two programming assignments in the same homework and say in each, "Use YFPL."
Here are some other examples from software (some due to Ghezzi, Jazayeri, and Mandrioli)
"Selecting is the process for designating areas of your document that you want to work on." [Microsoft Word 4.0 manual]
What is an area? Contiguous? Across lines? What if you have other objects inserted into your document? (This happens to me all the time!)
"The message must be triplicated. The three copies must forwarded through three different physical channels. The receiver accepts the message on the basis of a two-out-of-three voting policy." [From a real mission-critical project.]
Is the message considered received as soon as two identical copies are received, or must the receiver wait for the third?
It turns out the implementation wasn't either of these. Instead, the receiver polled the channels and a time-out was used; if the time-out occurred on the third channel after two identical messages were received, the message was accepted.
Consider the following postcondition for sorting an array A of length N in non-decreasing order:
For-all (i,j) in [1..N] i<j implies A[i]<=A[j]
For i = 1 to N do
A[i] = i;
Is a program that satisfies this specification just fine
Formalism alone won't solve all specification problems
"Building the system right vs. building the right system."
Consider the following well-known and simple specification due to Naur:
Given a text consisting of words separated by BLANKS or by NL (new line) characters, convert it to a line-by-line form in accordance with the following rules:
line breaks must be made only where the given text has BLANK or NL;
each line is filled as far as possible, as long as
no line will contain more than MAXPOS characters.
Do you see any problems with this specification?
If not, you haven't TAed 142/143 yet!
Here is a version of Naur's specification written by Goodenough and Gerhart:
The program's input is a stream of characters whose end is signaled with a special end-of-text character, ET. There is exactly one ET character in each input stream. Characters are classified as
break characters - BL (blank) and NL (new line);
nonbreak characters - all others except ET;
the end-of-text indicator - ET.
A word is a nonempty sequence of nonbreak characters. A break is a sequence of one or more break characters. Thus, the input can be viewed as a sequence of words separated by breaks, with possibly leading and trailing blanks, and ending with ET.
The program's output should be the same sequence of words as in the input, with the exception that an oversize word (i.e. a word containing more than MAXPOS characters, where MAXPOS is a positive integer) should cause an error exit from the program (i.e. a variable, Alarm, should have the value TRUE). Up to the point of error, the program's output should have the following properties:
A new line should start only between words and at the beginning of the output text, if any.
A break in the input is reduced to a single break character in the output.
As many words as possible should be placed on each line (i.e. between successive NL characters).
No line may contain more than MAXPOS characters (words and BLs).
Goodenough and Gerhart wrote this, in part, because programs written to Naur's specification had errors, some of which they argued were due to problems with the specification
Their specification had itself gone through several revisions
In groups of 3 or so, find as many problems as you can with the Goodenough and Gerhart specification
Bertrand Meyer identifies "Seven Sins of the Specifier"
Noise (information that repeats other information already available)
This might be valuable (like comments), but it can also be distracting and confusing
Ex: "Thus, the input can be viewed as…ending with ET." It's a repetition of earlier material; it may or may not be valuable to have it repeated in another form.
Remorse (a variant of noise in which a restriction is made later to something defined earlier)
Ex: "of the output text, if any" implies that output text may not exist, although that was surely not true earlier in the specification
Silence (omitted information)
Ex: The remorseful parenthetical about what a line is. "(i.e., between successive NL characters)" hints at the silence about defining a line precisely.
Even this definition is a problem, since it is unclear about whether NL's are part of a line
In addition, although much of the world is oriented towards file representations using explicit NL characters, the whole world does not work that way: IBM operating systems of yore (which are still prevalent, by the way) use record-oriented representations instead, so there is usually no explicit NL character, making the specification somewhat platform-specific
Ex: The specification describes when Alarm should be set to TRUE, but doesn't address what its value is in other situations
Contradiction (conflicting definitions within the same specification)
Ex: In this specification, the problems with contradictions tend to be based on confusions about the types of information
The input is a "stream of characters" that can be "viewed as a sequence of words"
The output should be the same "sequence of words as in the input", but the output should have a "new line [that] should start only between words…"
Overspecification (too much about the solution rather than about the problem)
Ex: Why should a failure be required to report via a specific variable (what about languages with exceptions, for instance)?
What if this computation is packaged as a process, not a procedure?
Ex: The notion of "ET" is largely due to the implementation platform used by Naur, who implemented in Algol 60, which did not provide for end-of-file detection when reading characters (and thus needed a terminating sentinel)
Perhaps even more strange, note that the output from this program does not have an ET character! So in a Unix-like pipe-and-filter world, you couldn't take the output of this program and feed it to itself!
Ambiguity
Ex: What, precisely, is the "point of an error"?
Is it the end of the last acceptable word, or is it the character at which the input fails to meet the constraints?
Sometimes ambiguity is OK, intended by the specifier
Ex: If MAXPOS=10, how is "Who What When" formatted? There are two legitimate choices that may or may not have been intentionally permitted by the specifier
Forward references
These may or may not be bad, but they don't always make it easier to read
Ex: ET is used three times before it is defined
Ex: Line is used before it was defined
Ex: MAXPOS is used before it was defined
Of course, Meyer provides a better specification than this, arguing strongly for the use of mathematical formalisms
But I won't present this today; we'll see several like this, although perhaps not this one
Formal specifications have several interesting properties
They can play at least two important and distinct roles
They can provide a basis on which to do verification, ensuring that an implementation indeed satisfies a specification (you "build the system right")
This is largely the basis for much of the work in program verification, which has not really succeeded to any significant degree
They can be useful in increasing the confidence one has in the consistency and completeness of a specification
For example, consider the specification of an abstract data type: one can formally check that the specification of each operation on the ADT guarantees that invariants on the data itself are satisfied
There are also those who argue for "executable specifications"---that is, specs that can directly executed or easily translated to executable programs
This may work in some situations, but they are limited because there is a big gap between either
the specification and the program (things that are easy to say in a specification are not necessarily easy to say in programs, especially efficiently --- negation and conjunction are simple examples)
or else the user and the specification (reducing the utility of the specification for helping to ensure that you "build the right system")
We'll see some material on specifications based on logic/set theory, on algebra, on finite state machines