CSE 401 Autumn 2001 (Henry): Compilers

Details / Project Description

($Revision: 1.15 $)

You should make all of the following modifications to the PL/0 language, by modifying the original PL/0 compiler. This will happen in stages: lexing, parsing, semantic analysis, and one or more stages at the back-end. Start with the original lexical and BNF description.

The number of points for each modification roughly reflects the amount of effort that I expect it will take to implement the modification.

Comments (10 points)

Extend the PL/0 language to allow comments. A comment begins with two adjacent slash signs ("/") and extends to the end of the line. These comments are like those accepted by C++. Comments are treated as whitespace (not tokens). The following code illustrates some comments:

    i := x + y; // this is a comment
    j := a - b; // this computes "a + (-b)"
    // a comment; still in the comment // and more comment

If-Then-Else Statements (10 points)

Replace the if statement with an if-then-else statement with the following syntax:

    <if stmt> ::= if <test> then <stmt list-1> [else <stmt list-2>] end

The semantics are as expected: if the <test> evaluates to true, execute the statements in <stmt list-1>; otherwise, execute the statements in <stmt list-2>, if any. (Note: the -1 and -2 do not represent new syntactic entitites, but rather distinguish two entities that have the same syntax.)

Richer First Class Typing (0 points)

Add the new first class type of "interval" to the language. Add the new first class type of "boolean" (note that it spelled without the leading upper case ``B'') to the language. These types will be described in detail in the next sections.

These extensions mean that the PL/0 programmer can declare variables or formal parameters to be either integers, intervals or booleans, and use them in essentially all the ways in which integers (the only previously available first-class type) can be used, except:

only integers are allowed in input and output statements;
strong static typing is to be enforced at all times;
as otherwise specified in the following sections.

There are no points for this extension, since you actually will do the work for this extension in other extensions.

Booleans (15 points)

A value of type Boolean is computed by the 6 existing relational operators (such as "="), or by the three new infix binary operators and, or and in, or by the literal constants true and false. The Boolean literals and operators are described in the next paragraphs. Booleans are neither implicitly nor explicitly convertable to or from values of type integer or interval; Booleans can only be used as operands to and, or as the Test part of an if or while statement, assigned to variables of type Boolean, passed as actual parameters to formal parameters of type Boolean, or returned from functions of type Boolean.

Add the literal constants true and false of type Boolean to the language.

Test Operators (10 points)

Add two new left associative binary operators and and or. The and operator has higher precedence than or, but and has lower precedence than all other operators. Both operators take Boolean values as operands and return a new Boolean value, using the obvious semantics. Both operators are strict, in that they will evaluate both operands.

Here are some examples of Booleans in legal and illegal uses:

  var P,Q: interval;
  P := false or 1>0;			// P is assigned to
  P := predicate(17, true, 0=1);	// calls function predicate
  output := false;			// illegal
  if (P) then output := 1;		// legal

Intervals (35 points)

An interval consists of a pair of integers. The first member of the pair is called the lower bound and is always less than or equal to the second member of the pair which is called the upper bound. An interval represents a value that is somewhere between the lower bound and upper bound, inclusively. Intervals can be used to implement an algebra of uncertainties.

A value of type interval is constructed with a call to an intrinsic function named interval in an expression using the syntax:

  interval ( < expr0 > [ , < expr1 > ] ; )

This constructor looks like a call to the function named interval, with two or one arguments. (The expr1 is optional.) If two arguments are given, then the inverval's lower bound is the value of expr0 and the upper bound is the value of expr1. If only one argument is given, then the interval's lower bound and upper bound have the same value given by expr0. An error shall be flagged, either at compile time or at run time, if the interval is constructed such that the given lower bound is greater than the given upper bound.

Values of type integer may be freely promoted to values of type interval, as if the interval constructor with one argument were used.

Values of type interval may not be freely demoted to values of type integer. Instead, use the intrinsic function of one argument lb(ub) to return the lower(upper) bound from the interval argument.

Here are some examples summarizing declarations and conversions:

  var X,Y: interval;
  var A: integer;
  ...
  A := 17;
  X := A;		// equivalent to X := interval(A)
  X := interval(2, 3);
  A := X;		// illegal, since this is an implicit demotion
  A := lb(X);		// legal, A now has value 2
  A := ub(A);		// equivalent to A := ub(interval(A))

The unary and binary operators that currently work on integer values are also overloaded to combine interval values. If the arithmetic operator (such as +) yields an integer when given integer(s), then the same operator yields intervals when given intervals. If the relational operator (such as =) yields a Boolean when given integer(s), then the same operator also yields a Boolean when given intervals.

Operators on interval values always work to pessimize the interval, namely to make the resulting interval as wide as possible, or, put another way, to maximize the uncertainty of a value.

Continuing the example:

  interval(2,3) + interval(5,10)	// evaluates to (7, 13)
  interval(10,20) / interval(1,2)	// evaluates to (5, 20)
  interval(10,20) * -1			// evaluates to (-20, -10)

  interval(10,20) <  interval(20,30)	// evaluates to false
  interval(10,20) <= interval(20,30)	// evaluates to true
  interval(10,22) <  interval(20,30)	// evaluates to false
  interval(21,22) <  interval(20,30)	// evaluates to false

The new binary infix operator in takes two interval values, and checks to see if the left operand is embedded within the right operand, and returns a Boolean:

  interval(10,20) in interval(0,30)	// evaluates to true
  interval(10,30) in interval(0,30)	// evaluates to true
  interval(10,31) in interval(0,30)	// evaluates to false
  interval(-1,30) in interval(0,30)	// evaluates to false

An error shall be flagged, either at compile time or at run time, if an interval division operator is given a denominator that is an interval that contains 0.

Functions (20 points)

A procedure may optionally return a result; such a value-returning procedure is called a function. To indicate that a procedure returns a result, a result type declaration is added after the list of formal parameters in the procedure declaration. (The result type declaration is optional, since procedures are still supported as well as functions.)

For example, the following are legal function declarations:

    procedure foo(x:interval, y:integer): integer;
    begin
       <stmt list>
    end foo;

    procedure bar(): interval;
    var x:int;
    begin
       <stmt list>
    end bar;

Function calls can be used anywhere that an expression of the appropriate type can be used. Syntactically, function calls (and calls to intrinsic functions) are sub-expressions that have higher precedence than other operators. Functions may also be called like procedures, as a separate statement; in this case the value returned from the function is discarded. The type of the result of a function can be any first class type. The result is returned by value.

Return Statements (25 points)

The return statement can be used to exit a procedure, immediately transferring control to the end of the procedure body. It is legal for a procedure (but not a function) to return by "falling off" the procedure body without encountering an explicit return statement.

To return a value from a function, the return statement is used with an expression. Both forms are captured by

    <return stmt> ::= return [ <expr> ]

The type of the expression returned must be freely promotable to the type declared for the enclosing function. It is illegal for a value-returning return statement to appear in a procedure not declared to return a result. Similarly, it is illegal for a non-value-returning return statement to appear in a function. It is an error if it is possible for execution to reach the end of a function's body without executing a return statement. In particular, a function whose only return statement is in a then clause of an if statement is invalid, since if the then clause is not taken then the function will not return anything.

The following illustrates some legal uses of return statements:

procedure foo(x:int,y:interval):interval;
begin
  if x>10 then return x; end;
  if x< 5 then return y; end;
  return y * 5;
end foo;

Empty Statements (5 points)

Add empty statements to the language. An empty statement consists of nothing, and does nothing when executed.

The practical impact of empty statements is that `;' can be used as both separators and terminators.

Throw/Catch Statements (35 points)

The extended language is to support a throw statement. The pseudo syntax is:

    <throw stmt> ::= throw <Expr>

The Expr must be of type integer. Execution of the throw statement evaluates the Expr and then "throws" that value, where the thrown value is examined by every dynamically enclosing catch statement (see below) until a catch statement recognizes that value as one it has been set up to handle. If no dynamically enclosing catch statement has been set up to handle the throw, then the entire program is to terminate with a non-zero exit code.

A throw may force procedures (and functions) to return prematurely. A throw can be thought of as a different kind of return; unlike return statements which merely return from a single procedure call, a throw statement need not necessarily return from the procedure (if the throw statement is statically embedded in a catch that handes the throw), nor it need not return from exactly one procedure.

The extended language is to support a catch statement. The pseudo syntax is:

    <catch stmt> ::= catch [ ( <Expr> { , <Expr> } ) ] within
      <statement list>
    [ handler <statement list > ]
    end

The Expr in the optional expression list must be of type integer. The <statement list> between the in and the handler keywords is the body of the catch statement. The <statement list> between the handler and the end keywords is the handler of the catch statement. This part is optional.

Execution of the catch statement proceeds as:

The optional list of expressions after the catch statement are evaluated prior to executing the body of the catch. These expressions must be of type integer.
The body of the catch is always executed.
If a value of type integer is thrown by executing a throw statement at any time during the course of executing the body of the catch, and either (1) the expression list was empty, or (2) the thrown value equals the value of any expressions evaluated in part 1, then the throw that initiated the handler is considered to have finished, and the handler part of the catch statement is executed, if any.

Note that these rules imply that throw statements executed while evaluating the expression list, or while evaluating the handler, are not processed by the catch statement itself, but will be processed by the NEXT dynamically enclosing catch statement.

Here are some examples of catch and throw:

  catch (15+2) within
    throw 17
  handler
    // handler for value 17, which is directly thrown
  end

  catch (15+2) within
    catch (15+1) within
	throw 17
    handler
      // this code is never executed ("throw 17" not caught by "catch 16")
    end
  handler
    // handler for value 17, which is thrown by ignored by innermost catch
  end

  catch (15+2) within
      throw 17
  handler
    throw 17	// throws to some OTHER dynamically enclosing catch statement
  end

401admin at cs.washington.edu