Details / Project Description

----------------------------------------------------------------------

You will make the following modifications to the PL/0 language, by modifying the original PL/0 compiler. This will happen in stages: lexing, parsing, semantic analysis, and one or more stages at the back-end. Start with the original lexical and BNF description.

Comments
Extend the PL/0 language to allow comments. A comment begins with a pound sign ("#") and extends to the end of the line. Comments are treated as whitespace (not tokens). The following code illustrates some comments:
    i := x + y; # this is a comment
    j := a - b; # this computes "a + (-b)"
    # a comment; still in the comment # and more comment

Underscores in Identifiers
Extend the PL/0 language to allows underscores ("_") to appear anywhere that a letter can appear in an identifier. Underscores are significant. For example, the following are seven distinct, legal identifiers:
    foo1  _foo1  f_o__o1  foo_1__  _  __  ___1

If-Then-Else Statements
Replace the if statement with an if-then-else statement with the following syntax:
    <if stmt> ::= if <test> then <stmt list-1> [else <stmt list-2>] end
The semantics are as expected: if the <test> evaluates to true, execute the statements in <stmt list-1>; otherwise, execute the statements in <stmt list-2>, if any. (Note: the -1 and -2 do not represent new syntactic entitites, but rather distinguish two entities that have the same syntax.)

For Loops
Add a for loop to the language by augmenting the <stmt> production and by adding the production:
    <for stmt> ::= for <id> := <expr-1> to <expr-2> do <stmt list> end

The semantics of the for statement are identical to those of the following while statement:

    <id> := <expr-1>
    while <id> <= <expr-2> do
        <stmt list>
        <id> := <id> + 1;
    end
Note that this means that expr-1 is evaluated only once before entering the loop, and expr-2 is evaluated at each iteration. Also, the loop variable must be declared as an integer variable. It is legal for the initial expression to be greater than the final expression; in this case, the body of the loop is never executed. Assignments to the loop index variable within the loop body are allowed.

Break Statements
The extended language supports the break statement. The break statement has the effect of terminating the immediately enclosing while or for loop. It is an error for a break statement to appear outside a loop. (It need not be directly inside a loop; e.g. it is legal inside an if that is itself inside a loop.)

Booleans
Add first-class Boolean types to the language. This means that the PL/0 programmer can declare variables to be Booleans, using them in essentially all the ways in which integers (the only previously available first-class type) can be used, except A type, bool, is supported as the type of these variables. Additionally, test expressions in if and while statements must be Boolean, not integer, and the result of the odd operator is of Boolean type, not integer. (The expression odd x is true if and only if the value of x is an odd integer.)

Two new Boolean constants are supported, true and false. In addition, two Boolean operators are introduced, and and or. Both operators take Boolean expressions as arguments and return a Boolean value. Both are short-circuiting, meaning that the second (rightmost) argument is not evaluated if the result of the first argument fully determines the result. They are both left-associative, and and has higher precedence than or; but and has lower precedence that all other operators. Integer operators do not apply to Booleans and vice versa.

Examples:

    var a:int, x:int;
    var b:bool;
    if a = 0 or x / a > 3 and b then <stmt list> end;
    # test parses as "(a = 0) or (((x / a) > 3) and b)"
    # if a = 0, then don't evaluate the rest of the test
    a := b;             # illegal
    a := 3 * b;         # illegal
    while 1 do ... end; # illegal

Constant Expressions
Named scalar (non-array) constants can be initialized with arbitrary integer and Boolean expressions involving integer literals, Boolean literals, and previously declared named constants. Expressions may not involve procedure parameters, function calls, nor the input primitive. The following examples illustrate some legal constant declarations:
    const a:int = 3 + -4;
    const b:int = 5 * a;
    const c:bool = a > b or odd b;

Arrays
The extended language supports arrays. The type of an array is described as:
    array [  <const expr>  ] of <type>
where the constant expression is constrained to evaluate to a positive (greater than 0) integer, indicating the size of the array for the given type. (Square brackets here and in the definition of array references below are required input tokens, not the meta-notation meaning "optional, " as in the rest of this document.) The type of the elements can be any type, including another array. For example, the following are some legal array variable declarations:
    var a:array [10] of int;
    var b:array [25] of array [100] of bool;
    const size: int = 5 * 100;
    var c:array [size] of array [size + 1] of int;

In the extended language, the following notation references an array element:

    <lvalue> [ <expr> ]
where the <lvalue> production has been modified to:
    <lvalue> ::= IDENT | <lvalue> [ <expr> ]
The <lvalue> should evaluate to some array a, of size n, and the index expression should evaluate to some integer i. This expression returns the i-th element of the array a, where the first element of the array is at index 0. It is an unchecked run-time error for i to be outside the range [0..n-1]. The <lvalue> may be another array indexing operation, in the case of accessing a multi-dimensional array. Assignment of whole arrays or array slices is disallowed. Use of whole arrays or array slices in input and output statements is also forbidden. Examples:
    var a:array [5] of array [3] of int;
    var b:array [3] of int;
    a[1][i+j] := b[k-1] * 2;
    a[1] := b;      # This is illegal
    b := input;     # So is this
Call-By-Reference Parameters
In the extended language, a formal parameter can be designated as being passed by reference by prefixing it with the var keyword (much as in Pascal). The following is an example of such a procedure:
    procedure foo(var x: int, y: int, var z: array [10] of int);
    begin
        x := x+y;
        y := y+x;
        z[5] := x*y;
    end foo;
The caller of a procedure containing a call-by-reference formal parameter must use an <lvalue> (of compatible type) as the corresponding actual parameter to the procedure; other expressions as actuals to a call-by-reference formal are illegal. Examples:
    var a: int, b : array [10] of int;
    const c: int = 5;
    var d: array [5] of int;

    foo(a,17,b);            #legal
    foo(b[5],a+b[7],b);     #legal
    foo(c,17,b);            #illegal
    foo(17,a,b);            #illegal
    foo(a+b[7],20,b);       #illegal 
    foo(a,17,d);            #illegal
Any formal can be assigned within the body of a procedure. However, only if the formal is call-by-reference will the assignment affect the caller's variables.

Arguments of scalar types can be passed by either value or reference. Arrays must be passed by reference, and as a result var must precede arrays which are declared as formal parameters. The type and size of the array being passed must match the type and size of the array formal.

Functions
A procedure may optionally return a result; such a value-returning procedure is called a function. To indicate that a procedure returns a result, a result type declaration is added after the list of formal parameters in the procedure declaration. (This is optional, since procedures are still supported as well as functions.)

The following are legal function declarations:

    procedure foo(x:int, y:bool): int;
    begin
       <stmt list>
    end foo;

    procedure bar(): bool;
    var x:int;
    begin
       <stmt list>
    end bar;
Function calls can be used anywhere that an expression of the appropriate type can be used; syntactically, function call expressions have higher precedence than other operators. Functions may also be called like procedures, as a separate statement. For example, given the above function declarations, the following is a legal statement:
    foo(foo(3,bar()), i < foo(x+y,bar()));
The type of the result of a function must be a scalar type; whole arrays cannot be returned. The result is returned by value.

Return Statements
The return statement can be used to exit a procedure, immediately transferring control to the end of the procedure body. It is legal for a procedure (but not a function) to return by "falling off" the procedure body without encountering an explicit return statement.

To return a value from a function, the return statement is used with an expression. Both forms are captured by

    <return stmt> ::= return [ <expr> ]
The type of the expression returned must match the type declared for the enclosing function. It is illegal for a value-returning return statement to appear in a procedure not declared to return a result. It is similarly illegal for a non-value-returning return statement to appear in a function. It is an error if it is possible for execution to reach the end of a function's body without executing a return statement. In particular, a function whose only return statement is in a then clause of an if statement is invalid, since if the then clause is not taken the function will not return anything.

The following illustrates some legal uses of return statements:

    procedure foo(x:int,y:bool):int;
    begin
       if y then return x; end;
       return x * 5;
    end foo;
Known Problems
The base PL/0 compiler contains a few "features" that you might want to be aware of. You do not need to fix any of these.