handout #15

CSE341—Programming Languages

Programming Assignment #7 (Scheme)

due 11 pm Wednesday, 5/20/09

In this assignment you will write a set of procedures for parsing various token sequences that appear in programs written in BASIC.  We will be using an approach known as recursive descent parsing in which we define a series of procedures based on the underlying grammar with one procedure for each production from the grammar.  The grammar we will be using can be described using the following EBNF productions:

<test> ::= <expression> ("<" | ">" | "<=" | ">=" | "=" | "<>") <expression>

<expression> ::= <term> {("+" | "-") <term>}

<term> ::= <element> {("*" | "/") <element>}

<element> ::= <factor> {"^" <factor>}

<factor> ::= <number> | - <number> | ( <expression> ) | <f> ( <expression> )

<f> ::= SIN | COS | TAN | ATN | EXP | ABS | LOG | SQR | RND | INT

In EBNF the pipe character (“|”) indicates “or” and curly braces indicate “0 or more of.”  For example, the production for <expression> indicates that it is composed of a <term> followed by 0 or more “+ <term>” or “- <term>”.  So it would match sequences like this:

<term> + <term> + <term> - <term> + <term> - <term> - <term>

Of course, like <sentence>, <term> is also a nonterminal, so it would be filled in with appropriate terminal symbols.

You should define a procedure to parse each of the five major elements described above.  The sixth rule is slightly different (described below).  Your procedures will take a list of tokens as an argument.  It should consume tokens from the list and replace those tokens with the value obtained by evaluating the tokens.  For example, parse-term evaluates the multiplicative operators * and /.  If it is passed the list (2.5 * 3.4 / 2 THEN 210), it will evaluate 2.5 * 3.4 / 2, obtain 4.25 and replace the tokens it parsed with 4.25, returning the list (4.25 THEN 210).  These procedures will be “greedy” in the sense that they will consume as many tokens from the front of the list as they can that can be part of the given grammar element.  The token “THEN” can’t be part of a term, so the procedure stops consuming tokens when it encounters it.  Even though it is greedy, it should pay attention only to tokens that appear at the front of the list.

Parentheses are difficult to handle as symbols, so the lists you will be processing use the tokens lparen and rparen for left parenthesis and right parenthesis.

The sixth rule of the grammar indicates a list of legal function names.  Because it doesn’t mention any other grammar elements, it doesn’t need its own parsing procedure.  You are welcome to define a helper procedure that processes it, but it’s not a requirement.  In evaluating expressions involving these functions, you will need to translate the named function into a Scheme procedure.  The skeleton file for the assignment includes the following association list that will help you to accomplish this task:

 (define functions

  '((SIN . sin) (COS . cos) (TAN . tan) (ATN . atan) (EXP . exp) (ABS . abs)

    (LOG . log) (SQR . sqrt) (RND . random) (INT . truncate)))

Each element of the association list is a dotted pair where the car is a key and the cdr is a value (like a map in Java).  Scheme has a standard procedure called assoc that will search for a particular key.  For example, given the list above, (assoc ‘ATN functions) returns (ATN . atan).  To get the “atan” symbol from this, you’d take the cdr of it.  Because it’s a dotted pair and not a list, you use cdr instead of cadr.  If the key is not found, as in the call (assoc ‘FOO functions), you get #f as a result.

You are to write the following parsing procedures:

1.      (30 points) Write a procedure parse-factor that parses a factor at the front of a list, replacing the tokens that were part of the factor with the numeric value of the factor.  As indicated in the grammar above, a factor is either a simple number or a minus followed by a number (the negation of a number) or a parenthesized expression or a parenthesized call on a function.  For example:

> (parse-factor '(3.5 2.9 OTHER STUFF))

(3.5 2.9 OTHER STUFF)

> (parse-factor '(- 7.9 3.4 * 7.2))

(-7.9 3.4 * 7.2)

> (parse-factor '(lparen 7.3 - 3.4 rparen + 3.4))

(3.9 + 3.4)

> (parse-factor '(SQR lparen 12 + 3 * 6 - 5 rparen))

(5)

2.      (15 points) Write a procedure parse-element that parses an element at the front of a list, replacing the tokens that were part of the element with the numeric value of the element.  As indicated in the grammar above, an element is a series of factors separated by the token ^ which indicates exponentiation.  Your procedure should consume as many tokens as possible that could be part of an element.  The operators should be evaluated left-to-right.  You can call the standard expt procedure to compute the exponent.  For example:

> (parse-element '(2 ^ 2 ^ 3 THEN 450))

(64 THEN 450)

> (parse-element '(2.3 ^ 4.5 * 7.3))

(42.43998894277659 * 7.3)

> (parse-element '(7.4 + 2.3))

(7.4 + 2.3)

3.      (15 points) Write a procedure parse-term that parses a term at the front of a list, replacing the tokens that were part of the term with the numeric value of the term.  As indicated in the grammar above, a term is a series of elements separated by the tokens * and / (multiplication and division).  Your procedure should consume as many tokens as possible that could be part of a term.  The operators should be evaluated left-to-right.  Scheme will potentially returning a rational when we use division, which we don’t want.  So for each division, return the value obtained by passing the result to the standard Scheme procedure exact->inexact, as in (exact->inexact 7/8), which returns 0.875.  For example:

> (parse-term '(2.5 * 4 + 9.8))

(10.0 + 9.8)

> (parse-term '(38.7 / 2 / 3 THEN 210))

(6.45 THEN 210)

> (parse-term '(7.4 * lparen 2.4 - 3.8 rparen / 4 - 8.7))

(-2.59 - 8.7)

> (parse-term '(3 / 4 + 9.7))

(0.75 + 9.7)

4.      (15 points) Write a procedure parse-expression that parses an expression at the front of a list, replacing the tokens that were part of the expression with the numeric value of the expression.  As indicated in the grammar above, an expression is a series of terms separated by the tokens + and – (addition and subtraction).  Your procedure should consume as many tokens as possible that could be part of an expression.  The operators should be evaluated left-to-right.  For example:

> (parse-expression '(12.4 - 7.8 * 3.5 THEN 40))

(-14.9 THEN 40)

> (parse-expression '(2 + 3.4 - 7.9 <= 7.4))

(-2.5 <= 7.4)

> (parse-expression '(3 * 4 ^ 2 / 5 + SIN lparen 2 rparen))

(10.50929742682568)

5.      (25 points) Write a procedure parse-test that parses a test at the front of the list, replacing the tokens that were part of the test with the boolean value of the test (#t or #f).  As indicated in the grammar above, a test is an expression followed by a relational operator followed by an expression.  For example:

> (parse-test '(3.4 < 7.8 THEN 19))

(#t THEN 19)

> (parse-test '(2.3 - 4.7 ^ 2.4 <> SQR lparen 8 - 4.2 ^ 2 * 9 rparen FOO))

(#t FOO)

> (parse-test '(2 ^ 4 = 4 ^ 2))

(#t)

In writing your parse routines, you should test for potential errors and should call the error procedure if an error is encountered.  The call should indicate which parsing routine detected the error, so you should execute one of the following five commands:

(error "illegal test")

(error "illegal expression")

(error "illegal term")

(error "illegal element")

(error "illegal factor")

Most errors will be detected by parse-factor since it is the lowest level construct in the grammar.  In grading, we won’t be concerned about which of your parsing methods detects the error.  All that we care about is that your code generates the error (versus, say, calling car on an empty list).

Below are some examples of calls that should produce errors:

(parse-term '(2.5 * 4 *))  ; no number after second *

(parse-factor '(foo bar)) ; the symbol foo is not a legal factor

(parse-factor '(- foo))  ; a minus must be followed by a number

(parse-term '(3 * foo))  ; no number after *

(parse-factor '(lparen rparen))  ; needs an expression between parens

(parse-factor '(lparen 2 + 3 4))  ; no right paren when we encounter 4

You are not allowed to use list mutation in solving this problem.

Keep in mind that your functions should be greedy, but not overly greedy.  For example:

(parse-factor '(- 13 - 17 - 9))

returns (-13 - 17 - 9)

Only the first combination of minus and number is turned into a negative number.  The method should process just one factor at the front of the list, not multiple factors.