handout #15
CSE341—Programming Languages
Programming
Assignment #7 (Scheme)
due 11 pm Thursday, 5/20/10
In this
assignment you will write a set of functions for parsing various token
sequences that appear in programs written in BASIC. We will be using an approach known as
recursive descent parsing in which we define a series of functions based on the
underlying grammar with one function for each production from the grammar. The grammar we will be using can be described
using the following EBNF productions:
<test> ::= <expression>
("<" | ">" | "<=" | ">=" |
"=" | "<>") <expression>
<expression> ::= <term>
{("+" | "-") <term>}
<term> ::= <element>
{("*" | "/") <element>}
<element> ::= <factor>
{"^" <factor>}
<factor> ::= <number> |
("+" | "-") <factor> | "("
<expression> ")" |
<f> "("
<expression> ")"
<f> ::= SIN | COS | TAN | ATN |
EXP | ABS | LOG | SQR | RND | INT
In EBNF:
·
The pipe character (“|”)
indicates “or.” For example, the final
rule says that an “<f>” is either SIN or COS or TAN, and so on.
·
Curly braces indicate “0 or more
of.” For example, the production for
<element> indicates that it is composed of a <factor> followed by 0
or more “^ <factor>.” So it would
match sequences like this:
<factor> ^ <factor> ^ <factor> ^ <factor> ^ <factor>
·
Parentheses are used to group
subparts of a rule. For example, the
<expression> rule uses parentheses to indicate that terms are separated
by either a plus or a minus, as in:
<term> + <term> + <term> - <term> + <term> - <term> - <term>
You should
define a function to parse each of the five major elements described
above. The sixth rule is slightly
different (described below). Your functions
will take a list of tokens as an argument.
It should consume tokens from the list and replace those tokens with the
value obtained by evaluating the tokens.
For example, parse-term evaluates the multiplicative operators * and
/. If it is passed the list (2.5 * 3.4 /
2 THEN 210), it will evaluate 2.5 * 3.4 / 2, obtain 4.25 and replace the tokens
it parsed with 4.25, returning the list (4.25 THEN 210). These functions will be “greedy” in the sense
that they will consume as many tokens from the front of the list as they can
that can be part of the given grammar element.
The token “THEN” can’t be part of a term, so the function stops
consuming tokens when it encounters it.
Even though it is greedy, it should pay attention only to tokens that
appear at the front of the list.
Parentheses
are difficult to handle as symbols, so the lists you will be processing use the
tokens lparen and rparen
for left parenthesis and right parenthesis.
The sixth rule
of the grammar indicates a list of legal function names. Because it doesn’t mention any other grammar
elements, it doesn’t need its own parsing function. You are welcome to define a helper function
that processes it, but it’s not a requirement.
In evaluating expressions involving these functions, you will need to
translate the named function into a Scheme function. The skeleton file for the assignment includes
the following association list that will help you to accomplish this task:
(define functions
'((SIN . sin) (COS . cos) (TAN . tan) (ATN . atan) (EXP . exp) (ABS . abs)
(LOG . log) (SQR . sqrt) (RND . rand) (INT . trunc)))
Each element
of the association list is a dotted pair where the car is a key and the cdr is a value (like a map in Java). Scheme has a standard function called assoc
that will search for a particular key.
For example, given the list above, (assoc ‘ATN functions) returns (ATN . atan). To get the “atan”
symbol from this, you’d take the cdr
of it. Because it’s a dotted pair and
not a list, you use cdr instead of cadr. If the key is
not found, as in the call (assoc ‘FOO functions), you get #f as a result.
There are
several cases where Scheme returns rational numbers when we would prefer to
obtain real numbers. For example,
division can produce such results as can exponentiation with integers where the
exponent is negative:
> (/ 3 4)
3/4
> (expt 3 -2)
1/9
For these
cases, you should call the function exact->inexact to convert the rational
to a real:
> (exact->inexact (/ 3 4))
0.75
> (exact->inexact (expt 3 -2))
0.1111111111111111
You are to
write the following parsing functions:
1.
(30 points) Write a function
parse-factor that parses a factor at the front of a list, replacing the tokens
that were part of the factor with the numeric value of the factor. As indicated in the grammar above, a factor
is either a simple number or a sign followed by a factor (e.g., the negation of
a number) or a parenthesized expression or a parenthesized call on a
function. For example:
> (parse-factor '(3.5 2.9 OTHER STUFF))
(3.5 2.9 OTHER STUFF)
> (parse-factor '(- 7.9 3.4 * 7.2))
(-7.9 3.4 * 7.2)
> (parse-factor '(lparen 7.3 - 3.4 rparen + 3.4))
(3.9 + 3.4)
> (parse-factor '(SQR lparen 12 + 3 * 6 - 5 rparen))
(5)
> (parse-factor '(- lparen 2 + 2 rparen * 4.5))
(-4 * 4.5)
Keep in mind
that your functions should be greedy, but not overly greedy. For example:
> (parse-factor '(- 13 - 17 - 9))
(-13 - 17 - 9)
Only the first
combination of minus and number is turned into a negative number. The method should process just one factor at
the front of the list, not multiple factors.
2.
(15 points) Write a function
parse-element that parses an element at the front of a list, replacing the
tokens that were part of the element with the numeric value of the
element. As indicated in the grammar
above, an element is a series of factors separated by the token ^ which
indicates exponentiation. Your function
should consume as many tokens as possible that could be part of an
element. The ^ operators should be
evaluated left-to-right. You can call
the standard expt function to compute the
exponent. As noted above, the result can
be a negative number when the exponent is negative, so you need to call
exact->inexact in that case to convert from a rational to a real. Below are several examples of parse-element’s
intended behavior:
> (parse-element '(2 ^ 2 ^ 3 THEN 450))
(64 THEN 450)
> (parse-element '(2 ^ 2 ^ -3 THEN 450))
(0.015625 THEN 450
> (parse-element '(2.3 ^ 4.5 * 7.3))
(42.43998894277659 * 7.3)
> (parse-element '(7.4 + 2.3))
(7.4 + 2.3)
> (parse-term '(2 ^ 3 ^ 4 ^ 5 2 ^ 3 4 ^ 5))
(1152921504606846976 2 ^ 3 4 ^ 5)
3.
(15 points) Write a function
parse-term that parses a term at the front of a list, replacing the tokens that
were part of the term with the numeric value of the term. As indicated in the grammar above, a term is
a series of elements separated by the tokens * and / (multiplication and division). Your function should consume as many tokens
as possible that could be part of a term.
The * and / operators should be evaluated left-to-right. Scheme will potentially be returning a
rational when we use division, which we don’t want. So for each division, return the value
obtained by passing the result to the standard Scheme function
exact->inexact, as in (exact->inexact 7/8), which returns 0.875. For example:
> (parse-term '(2.5 * 4 + 9.8))
(10.0 + 9.8)
> (parse-term '(38.7 / 2 / 3 THEN 210))
(6.45 THEN 210)
> (parse-term '(7.4 * lparen 2.4 - 3.8 rparen / 4 - 8.7))
(-2.59 - 8.7)
> (parse-term '(3 / 4 + 9.7))
(0.75 + 9.7)
> (parse-term '(2 * 3 * 4 + 3 * 8))
(24 + 3 * 8)
> (parse-term '(24 / 2 - 13.4))
(12.0 - 13.4)
4.
(15 points) Write a function
parse-expression that parses an expression at the front of a list, replacing
the tokens that were part of the expression with the numeric value of the
expression. As indicated in the grammar
above, an expression is a series of terms separated by the tokens + and –
(addition and subtraction). Your function
should consume as many tokens as possible that could be part of an
expression. The + and - operators should
be evaluated left-to-right. For example:
> (parse-expression '(12.4 - 7.8 * 3.5 THEN 40))
(-14.9 THEN 40)
> (parse-expression '(2 + 3.4 - 7.9 <= 7.4))
(-2.5 <= 7.4)
> (parse-expression '(3 * 4 ^ 2 / 5 + SIN lparen 2 rparen))
(10.50929742682568)
> (parse-expression '(15 - 3 - 2 foo 2 + 2))
(10 foo 2 + 2)
5.
(25 points) Write a function
parse-test that parses a test at the front of the list, replacing the tokens
that were part of the test with the boolean
value of the test (#t or #f). As
indicated in the grammar above, a test is an expression followed by a
relational operator followed by an expression.
For example:
> (parse-test '(3.4 < 7.8 THEN 19))
(#t THEN 19)
> (parse-test '(2.3 - 4.7 ^ 2.4 <> SQR lparen 8 - 4.2 ^ 2 * 9 rparen FOO))
(#t FOO)
> (parse-test '(2 ^ 4 = 4 ^ 2))
(#t)
In writing
your parse routines, you should test for potential errors and should call the
error procedure if an error is encountered.
The call should indicate which parsing routine detected the error, so
you should execute one of the following five commands:
(error "illegal test")
(error "illegal expression")
(error "illegal term")
(error "illegal element")
(error "illegal factor")
Most errors
will be detected by parse-factor since it is the lowest level construct in the
grammar. In grading, we won’t be
concerned about which of your parsing methods detects the error. All that we care about is that your code
generates the error (versus, say, calling car on an empty list).
Below are some
examples of calls that should produce errors:
(parse-term '(2.5 * 4 *)) ; no number after second *
(parse-factor '(foo bar)) ; the symbol foo is not a legal
factor
(parse-factor '(- foo)) ; a minus must be followed by a factor
(parse-term '(3 * foo)) ; no number after *
(parse-factor '(lparen rparen))
; needs an expression between parens
(parse-factor '(lparen
2 + 3 4)) ; no right paren
when we encounter 4
You are not
allowed to use list mutation or vectors in solving this problem.
Below is a
suggested development strategy:
1.
Write a simple version of
parse-factor. Have it handle a
number. This is a pretty trivial case,
because it just puts the number back at the front of the list. But then have it handle the rule where a
factor can be a sign followed by a factor.
Whenever you see a nonterminal on the right
side of a grammar rule, you know that it should be handled by a call on the
corresponding parsing function. So this
is a case where parse-factor is going to call parse-factor. If you code this correctly, you should be
able to handle cases like this:
> (parse-factor '(2 + 2))
(2 + 2)
> (parse-factor '(- 2 + 2))
(-2 + 2)
> (parse-factor '(- - 2 + 2))
(2 + 2)
> (parse-factor '(- - - - - 2 2))
(-2 2)
> (parse-factor '(+ - - + + - + - + 2 2))
(2 2)
2.
Write parse-element. It has to handle the case where there is just
a factor and the case where that is followed by zero or more of an up-arrow
followed by factor, as in
> (parse-element '(2 + 2))
(2 + 2)
> (parse-element '(- - - 2 + 2))
(-2 + 2)
> (parse-element '(2 ^ 3 - 4))
(8 - 4)
> (parse-element '(- - 2 ^ - - - 3 * 17))
(0.125 * 17)
> (parse-element '(2 ^ 3 ^ 2 / 14))
(64 / 14)
> (parse-element '(- - - 7 ^ - - 4 ^ - - - 2 < 74.5))
(1.7346652555743034e-007 < 74.5)
Remember that
the grouping is left-to-right.
3.
Write parse-term and
parse-expression. You could initially
have parse-term handle just multiplication and parse-expression handle just
addition. That makes them look a lot
like parse-element.
4.
Go back to parse-factor and have
it deal with a parenthesized expression.
Then have it deal with a function call.
5.
Write parse-test.
6.
Finish whatever you left
unfinished (e.g., if you had parse-expression do just addition, then have it
handle subtraction as well), including testing for errors.