handout #15

CSE341—Programming Languages

Programming Assignment #7 (Scheme)

due 11 pm Thursday, 5/20/10

In this assignment you will write a set of functions for parsing various token sequences that appear in programs written in BASIC.  We will be using an approach known as recursive descent parsing in which we define a series of functions based on the underlying grammar with one function for each production from the grammar.  The grammar we will be using can be described using the following EBNF productions:

<test> ::= <expression> ("<" | ">" | "<=" | ">=" | "=" | "<>") <expression>

<expression> ::= <term> {("+" | "-") <term>}

<term> ::= <element> {("*" | "/") <element>}

<element> ::= <factor> {"^" <factor>}

<factor> ::= <number> | ("+" | "-") <factor> | "(" <expression> ")" |

             <f> "(" <expression> ")"

<f> ::= SIN | COS | TAN | ATN | EXP | ABS | LOG | SQR | RND | INT

In EBNF:

·         The pipe character (“|”) indicates “or.”  For example, the final rule says that an “<f>” is either SIN or COS or TAN, and so on.

·         Curly braces indicate “0 or more of.”  For example, the production for <element> indicates that it is composed of a <factor> followed by 0 or more “^ <factor>.”  So it would match sequences like this:

<factor> ^ <factor> ^ <factor> ^ <factor> ^ <factor>

·         Parentheses are used to group subparts of a rule.  For example, the <expression> rule uses parentheses to indicate that terms are separated by either a plus or a minus, as in:

<term> + <term> + <term> - <term> + <term> - <term> - <term>

You should define a function to parse each of the five major elements described above.  The sixth rule is slightly different (described below).  Your functions will take a list of tokens as an argument.  It should consume tokens from the list and replace those tokens with the value obtained by evaluating the tokens.  For example, parse-term evaluates the multiplicative operators * and /.  If it is passed the list (2.5 * 3.4 / 2 THEN 210), it will evaluate 2.5 * 3.4 / 2, obtain 4.25 and replace the tokens it parsed with 4.25, returning the list (4.25 THEN 210).  These functions will be “greedy” in the sense that they will consume as many tokens from the front of the list as they can that can be part of the given grammar element.  The token “THEN” can’t be part of a term, so the function stops consuming tokens when it encounters it.  Even though it is greedy, it should pay attention only to tokens that appear at the front of the list.

Parentheses are difficult to handle as symbols, so the lists you will be processing use the tokens lparen and rparen for left parenthesis and right parenthesis.

The sixth rule of the grammar indicates a list of legal function names.  Because it doesn’t mention any other grammar elements, it doesn’t need its own parsing function.  You are welcome to define a helper function that processes it, but it’s not a requirement.  In evaluating expressions involving these functions, you will need to translate the named function into a Scheme function.  The skeleton file for the assignment includes the following association list that will help you to accomplish this task:

 (define functions

  '((SIN . sin) (COS . cos) (TAN . tan) (ATN . atan) (EXP . exp) (ABS . abs)

    (LOG . log) (SQR . sqrt) (RND . rand) (INT . trunc)))

Each element of the association list is a dotted pair where the car is a key and the cdr is a value (like a map in Java).  Scheme has a standard function called assoc that will search for a particular key.  For example, given the list above, (assoc ‘ATN functions) returns (ATN . atan).  To get the “atan” symbol from this, you’d take the cdr of it.  Because it’s a dotted pair and not a list, you use cdr instead of cadr.  If the key is not found, as in the call (assoc ‘FOO functions), you get #f as a result.

There are several cases where Scheme returns rational numbers when we would prefer to obtain real numbers.  For example, division can produce such results as can exponentiation with integers where the exponent is negative:

> (/ 3 4)

3/4

> (expt 3 -2)

1/9

For these cases, you should call the function exact->inexact to convert the rational to a real:

> (exact->inexact (/ 3 4))

0.75

> (exact->inexact (expt 3 -2))

0.1111111111111111

You are to write the following parsing functions:

1.      (30 points) Write a function parse-factor that parses a factor at the front of a list, replacing the tokens that were part of the factor with the numeric value of the factor.  As indicated in the grammar above, a factor is either a simple number or a sign followed by a factor (e.g., the negation of a number) or a parenthesized expression or a parenthesized call on a function.  For example:

> (parse-factor '(3.5 2.9 OTHER STUFF))

(3.5 2.9 OTHER STUFF)

> (parse-factor '(- 7.9 3.4 * 7.2))

(-7.9 3.4 * 7.2)

> (parse-factor '(lparen 7.3 - 3.4 rparen + 3.4))

(3.9 + 3.4)

> (parse-factor '(SQR lparen 12 + 3 * 6 - 5 rparen))

(5)

> (parse-factor '(- lparen 2 + 2 rparen * 4.5))

(-4 * 4.5)

Keep in mind that your functions should be greedy, but not overly greedy.  For example:

> (parse-factor '(- 13 - 17 - 9))

(-13 - 17 - 9)

Only the first combination of minus and number is turned into a negative number.  The method should process just one factor at the front of the list, not multiple factors.

2.      (15 points) Write a function parse-element that parses an element at the front of a list, replacing the tokens that were part of the element with the numeric value of the element.  As indicated in the grammar above, an element is a series of factors separated by the token ^ which indicates exponentiation.  Your function should consume as many tokens as possible that could be part of an element.  The ^ operators should be evaluated left-to-right.  You can call the standard expt function to compute the exponent.  As noted above, the result can be a negative number when the exponent is negative, so you need to call exact->inexact in that case to convert from a rational to a real.  Below are several examples of parse-element’s intended behavior:

> (parse-element '(2 ^ 2 ^ 3 THEN 450))

(64 THEN 450)

> (parse-element '(2 ^ 2 ^ -3 THEN 450))

(0.015625 THEN 450

> (parse-element '(2.3 ^ 4.5 * 7.3))

(42.43998894277659 * 7.3)

> (parse-element '(7.4 + 2.3))

(7.4 + 2.3)

> (parse-term '(2 ^ 3 ^ 4 ^ 5 2 ^ 3 4 ^ 5))

(1152921504606846976 2 ^ 3 4 ^ 5)

3.      (15 points) Write a function parse-term that parses a term at the front of a list, replacing the tokens that were part of the term with the numeric value of the term.  As indicated in the grammar above, a term is a series of elements separated by the tokens * and / (multiplication and division).  Your function should consume as many tokens as possible that could be part of a term.  The * and / operators should be evaluated left-to-right.  Scheme will potentially be returning a rational when we use division, which we don’t want.  So for each division, return the value obtained by passing the result to the standard Scheme function exact->inexact, as in (exact->inexact 7/8), which returns 0.875.  For example:

> (parse-term '(2.5 * 4 + 9.8))

(10.0 + 9.8)

> (parse-term '(38.7 / 2 / 3 THEN 210))

(6.45 THEN 210)

> (parse-term '(7.4 * lparen 2.4 - 3.8 rparen / 4 - 8.7))

(-2.59 - 8.7)

> (parse-term '(3 / 4 + 9.7))

(0.75 + 9.7)

> (parse-term '(2 * 3 * 4 + 3 * 8))

(24 + 3 * 8)

> (parse-term '(24 / 2 - 13.4))

(12.0 - 13.4)

4.      (15 points) Write a function parse-expression that parses an expression at the front of a list, replacing the tokens that were part of the expression with the numeric value of the expression.  As indicated in the grammar above, an expression is a series of terms separated by the tokens + and – (addition and subtraction).  Your function should consume as many tokens as possible that could be part of an expression.  The + and - operators should be evaluated left-to-right.  For example:

> (parse-expression '(12.4 - 7.8 * 3.5 THEN 40))

(-14.9 THEN 40)

> (parse-expression '(2 + 3.4 - 7.9 <= 7.4))

(-2.5 <= 7.4)

> (parse-expression '(3 * 4 ^ 2 / 5 + SIN lparen 2 rparen))

(10.50929742682568)

> (parse-expression '(15 - 3 - 2 foo 2 + 2))

(10 foo 2 + 2)

5.      (25 points) Write a function parse-test that parses a test at the front of the list, replacing the tokens that were part of the test with the boolean value of the test (#t or #f).  As indicated in the grammar above, a test is an expression followed by a relational operator followed by an expression.  For example:

> (parse-test '(3.4 < 7.8 THEN 19))

(#t THEN 19)

> (parse-test '(2.3 - 4.7 ^ 2.4 <> SQR lparen 8 - 4.2 ^ 2 * 9 rparen FOO))

(#t FOO)

> (parse-test '(2 ^ 4 = 4 ^ 2))

(#t)

In writing your parse routines, you should test for potential errors and should call the error procedure if an error is encountered.  The call should indicate which parsing routine detected the error, so you should execute one of the following five commands:

(error "illegal test")

(error "illegal expression")

(error "illegal term")

(error "illegal element")

(error "illegal factor")

Most errors will be detected by parse-factor since it is the lowest level construct in the grammar.  In grading, we won’t be concerned about which of your parsing methods detects the error.  All that we care about is that your code generates the error (versus, say, calling car on an empty list).

Below are some examples of calls that should produce errors:

(parse-term '(2.5 * 4 *))  ; no number after second *

(parse-factor '(foo bar)) ; the symbol foo is not a legal factor

(parse-factor '(- foo))  ; a minus must be followed by a factor

(parse-term '(3 * foo))  ; no number after *

(parse-factor '(lparen rparen))  ; needs an expression between parens

(parse-factor '(lparen 2 + 3 4))  ; no right paren when we encounter 4

You are not allowed to use list mutation or vectors in solving this problem.

Below is a suggested development strategy:

1.      Write a simple version of parse-factor.  Have it handle a number.  This is a pretty trivial case, because it just puts the number back at the front of the list.  But then have it handle the rule where a factor can be a sign followed by a factor.  Whenever you see a nonterminal on the right side of a grammar rule, you know that it should be handled by a call on the corresponding parsing function.  So this is a case where parse-factor is going to call parse-factor.  If you code this correctly, you should be able to handle cases like this:

> (parse-factor '(2 + 2))

(2 + 2)

> (parse-factor '(- 2 + 2))

(-2 + 2)

> (parse-factor '(- - 2 + 2))

(2 + 2)

> (parse-factor '(- - - - - 2 2))

(-2 2)

> (parse-factor '(+ - - + + - + - + 2 2))

(2 2)

2.      Write parse-element.  It has to handle the case where there is just a factor and the case where that is followed by zero or more of an up-arrow followed by factor, as in

> (parse-element '(2 + 2))

(2 + 2)

> (parse-element '(- - - 2 + 2))

(-2 + 2)

> (parse-element '(2 ^ 3 - 4))

(8 - 4)

> (parse-element '(- - 2 ^ - - - 3 * 17))

(0.125 * 17)

> (parse-element '(2 ^ 3 ^ 2 / 14))

(64 / 14)

> (parse-element '(- - - 7 ^ - - 4 ^ - - - 2 < 74.5))

(1.7346652555743034e-007 < 74.5)

Remember that the grouping is left-to-right.

3.      Write parse-term and parse-expression.  You could initially have parse-term handle just multiplication and parse-expression handle just addition.  That makes them look a lot like parse-element.

4.      Go back to parse-factor and have it deal with a parenthesized expression.  Then have it deal with a function call.

5.      Write parse-test.

6.      Finish whatever you left unfinished (e.g., if you had parse-expression do just addition, then have it handle subtraction as well), including testing for errors.