Due: Thursday, May 6 by 11 pm.
Please use Gradescope (gradescope.com
,
also linked from the CSE 401/M501 resources web page) to submit
your homework online. Gradescope's web site has several
instructional videos and help pages to outline the process,
and this guide has
specific information about scanning and uploading pdf files
containing assignments.
This assignment has two parts - some written problems involving LL grammars, and a second programming part. These will be submitted separately on Gradescope. The written (part 1) problems should be scanned and uploaded as usual. For the programming problems, drag and drop your files onto the Gradescope interface for part 2 to upload them. The two parts are separate "assignments" in Gradescope because the mechanics for submitting and processing them are different.
s
C n
g
| εr
| t
i
| t
;
Sid
:=
Eprint(
L )
id
num
+
E(
S ,
E )
,
EWhen you are done with the written problems for this part of the assignment, scan your work and submit it using Gradescope, as with previous written homework assignments.
Write a Java program to implement a small calculator. The goal of this exercise is to gain experience with top-down LL(1) parsing by writing a program whose input is specified by a small formal grammar, and writing a recursive-descent parser to process that input. The calculator language and its capabilities have been deliberately kept simple and limited to focus on top-down parsing, with only the minimal infrastructure needed to support that.
Calc
that contains the main
method for the calculator in this Calc.java
file
(right-click to download this file to use in your code).
It should be possible to compile and run the calculator using the following
terminal commands on any system that has standard Java 11 or later installed.
The user enters lines containing arithmetic expressions or commands
and the calculator immediately displays the value of the expression or
executes the command.
In the example below, user input is underlined;
comments beginning with a #
are for explanation in this example only and are not
part of the calculator input or output:
javac Calc.java # compile everything needed java Calc # start the program 1+1 2.0 x+1 x not defined let x = 17 x+1 18.0 2 *x+ 8 # i.e., 2*x+8; whitespace is ignored 42.0 let almostpi = 22/7 almostpi 3.142857142857143 y+x*4 y not defined let y = 2 let x = 3 y+x*4 14.0 (y+x)*4 20.0 list x = 3.0 almostpi = 3.142857142857143 y = 2.0 quit
The input to the calculator is specified by the following grammar and associated semantic rules. The input is a program, which consists of one or more statements, each of which occupies one line of input. Input lines are processed immediately as soon as they are read by the calculator.
program ::= statement | program statement
statement ::= exp |let
id=
exp |list
|quit
exp ::= term | exp+
term | exp-
term
term ::= factor | term*
factor | term/
factor
factor ::= id | integer |(
exp)
let
id =
exp is executed by evaluating
the expression and binding the resulting value to the given identifier. Any previous
value bound to that identifier is replaced. No output is printed.id not defined
is printed (with the actual identifier substituted for id)
and the value of the expression is not defined.
If an expression contains multiple occurrences of undefined identifiers,
the error message is printed for each occurrence of each undefined identifier.list
causes all known identifiers to be printed
along with the values they are currently bound to using the format
id =
exp, with each identifier binding printed on a separate
line. The order in which the
identifiers and their values
are printed is not specified.quit
terminates execution of the calculator.aa
, AA
, Aa
,
and aA
are
four different identifiers. An integer is a
decimal integer constant consisting of one or more decimal digits (0-9).
The implementation may restrict the magnitude of an integer constant so that it can be
stored as an ordinary Java int
value.let
, list
, and quit
)
are reserved and may not be used as identifiers.double
,
even though the grammar is restricted to only allow integers as numeric constants.
Note that the grammar implies
that all of the binary arithmetic operations are left-associative.The goal of this assignment is to gain experience with recursive-descent parsing.
To focus on that, the language has been deliberately kept very simple. For example, it
includes integer constants only, even though internal arithmetic should be done using
Java double
s. Also, identifiers are restricted to being sequences only
of letters with no digits, underscores, or other characters, and no error handling
is specified other than detecting use of undefined identifiers in expressions.
Your calculator must use recursive-descent parsing as outlined in lecture to
parse and evaluate the language defined above. You will find the examples
in the lecture slides to be a good starting point, and the textbook has additional
discussion and examples. The parser should contain a
method for each major non-terminal in the grammar. Each parser method should parse
the associated non-terminal, and either evaluate or
execute the language construct that it parses. After each expression statement
is parsed, the resulting value should be printed using the ordinary Java
System.out.println
method. You will need to maintain a few additional
data structures; in particular you'll want to have a hash table or dictionary
that maps known identifiers to their values.
If the grammar is not suitable for a recursive-descent LL(1) parser, modify it as needed so you can write a predictive, top-down parser. However, you are not required to modify the grammar formally to create an equivalent one that satisfies the LL(1) condition if parsing can be done correctly in a straight-forward manner (i.e., this is not a "fix the grammar" question - just make any adjustments needed so you can write a correct recursive-descent parser -- the examples from lecture should be helpful).
Your code only needs to work on syntactically correct input. If possible, you will find it more convenient to test and use the calculator if it can recover from an error in one statement and continue processing with the next one on the next input line, but this is not required.
We have provided you with the main
method in Calc.java
.
Feel free to change it, but it must be able to be compiled and executed by entering the
commands javac Calc.java
followed by java Calc
as shown in
the example transcript above. You should also create two files to contain the key classes
used by the main program: Scanner.java
and Parser.java
.
Do not add package
declarations to any of these files. All of the code should be
in the default, anonymous package used by Java when package
declarations
are not present.
As implied by the provided Calc.java
file, you should create two classes:
a simple scanner class, and the parser class. We suggest that your scanner have an
appropriate method that the parser can call to get the next input token. The Scanner
class should handle the details of reading input lines and delivering tokens to the parser
when requested, for example, with a getNextToken()
method. The parser itself should
have a run()
method that the main program can call to initiate parsing and processing
of input lines until a quit
statement is executed. This separation of input
character handling (scanning) from the actual parsing should make the structure of the
overall program simpler and easier to build. The Parser
class should contain
all of the parser methods and any data shared by them.
Keep the scanner simple and take advantage of any useful string handling classes
found in the Java standard library. One possibility is to read each input line as a
single string, then use a Java library split
method
to break the input line up into an array of strings, each of which corresponds to
a single token. You will still want some sort of getToken
or
nextToken
method in the scanner to deliver individual tokens when
requested by the parser, but you don't need to (i.e., should not) create an
elaborate DFA or
write lengthy code to do this. It's also up to you how you represent tokens.
Just remember that the parser will need some sort of ID(string) token
to represent an identifier, with the actual identifier contained in a field in
the token, and something similar needs to be done for integer constants.
Because individual input lines represent single input statements and there
is no punctuation like a semicolon to terminate statements, you may find it useful
to create some sort of EOL
(end-of-line) token that the scanner can return
to the parser at the end of each input line.
Although not required as part of the assignment, we suggest you write a tiny stand-alone program to test your scanner by repeatedly getting tokens from the scanner and printing them. Feel free to include more elaborate automated testing if you'd like, but it might be sufficient for this assignment just to be able to type a few input lines and look at the output produced to verify that the scanner seems to be working properly.
The parser will be simpler if there is some sort of shared "next token" variable accessible to all parser methods, and a method that can be called to advance (update) the "next token" variable by calling the appropriate scanner method as the parser does its work.
A simple way to evaluate expressions is to have each parser method that processes
an expression or sub-expression return a double
value giving the result
of evaluating the expression or sub-expression parsed by that method.
An easy way to indicate errors in an expression (if you need to do this)
would be to return a Double.NaN
value if an error occurs while
trying to parse and evaluate part of an expression.
Remember that the goal of this part of the assignment is to gain experience with recursive-descent parsing, not to create a Complex Industrial-Strength Production-Quality Software Artifact™. In particular, keep the scanner and token data structure as simple as possible. This problem is not the same as the compiler project, so those parts of the calculator program should be as simple as possible as long as they get the job done.
The calculator input specification
is designed so that a scanner can read one input line from the terminal at a time
as a string, then use library functions like split()
to chop it up
into an array of strings with individual operators, identifiers, and numbers
in the array elements, and then deliver those one at a time to the parser as
"tokens", skipping any that only contain whitespace.
You do not have to do things this way, but some similar approach should do the job.
No fancy DFA needed.
You will need some sort of simple "token" data structure to be able to,
for instance, return an integer as an integer token with an associated value, but
for the most part strings from the input can be treated as tokens themselves if you
want - operator tokens like "+" or keywords like "list".
There's no need to create a long enumerated
type with lexical classes for a fancy Token data structure if you don't need it
(and you probably don't).
Similarly, just put the code in the unnamed, default Java package - no need for
a complex package or folder structure.
Bottom line is keep it simple. If you find yourself writing big, complex scanner or token classes, step back and take another look. Keep those as simple as possible as long as they modularize the program and allow the recursive-descent parser to request the input one token at a time and do its job.
When you are done with this part of the assignment, turn in your source program
file(s) using Gradescope. Be sure to include your Calc.java
file with
any changes or additions you make there. File submission
for this part of the assignment is separate from the written problems in Part 1.