Due: Monday, April 21, at 11:00 pm. Turn in your project using the assignment dropbox.
The purpose of this assignment is to construct a scanner for MiniJava. We suggest you use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE P 501 project page, and there is a starter project there that you can use to see how these tools work together. Scanners generated by JFlex work with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar rules, it does use the token definitions in the CUP specification. In fact, you will need to update those definitions so that your scanner can use the token constants CUP generates from them. Both JFlex and CUP are included in the starter project.
You will need to examine the MiniJava grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).
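One way to think about the terminal/non-terminal split is to ask, for each piece of input text, whether the parser ever needs to see it. The sketch below is a hypothetical helper, not part of the starter code: the class name `TerminalKinds`, the `Kind` enum, and the keyword and punctuation sets are all assumptions, and the sets are only a partial guess at MiniJava's terminals. Check the project grammar for the real list.

```java
import java.util.Set;

/** Hypothetical helper: classifies a lexeme as a kind of terminal, or as skipped text. */
final class TerminalKinds {
    enum Kind { KEYWORD, PUNCTUATION, INTEGER_LITERAL, IDENTIFIER, SKIPPED, ERROR }

    // Assumed, partial list of reserved words; the project grammar is authoritative.
    static final Set<String> KEYWORDS = Set.of(
        "class", "public", "static", "void", "int", "boolean", "if", "else",
        "while", "return", "new", "this", "true", "false");

    // Assumed, partial list of operators, brackets, and other punctuation.
    static final Set<String> PUNCTUATION = Set.of(
        "+", "-", "*", "<", "&&", "!", "=", ".", ",", ";",
        "(", ")", "[", "]", "{", "}");

    static Kind classify(String lexeme) {
        // Whitespace and comments are consumed by the scanner but are not terminals.
        if (lexeme.trim().isEmpty() || lexeme.startsWith("//") || lexeme.startsWith("/*")) {
            return Kind.SKIPPED;
        }
        // Keywords must be checked before identifiers, since they match the same pattern.
        if (KEYWORDS.contains(lexeme)) return Kind.KEYWORD;
        if (PUNCTUATION.contains(lexeme)) return Kind.PUNCTUATION;
        if (lexeme.matches("[0-9]+")) return Kind.INTEGER_LITERAL;
        if (lexeme.matches("[A-Za-z][A-Za-z0-9_]*")) return Kind.IDENTIFIER;
        return Kind.ERROR; // anything else is junk the scanner should report
    }

    public static void main(String[] args) {
        for (String s : new String[] {"while", "count", "42", "&&", "// note", "#"}) {
            System.out.println("'" + s + "' -> " + classify(s));
        }
    }
}
```

In a JFlex specification the same decisions show up as rules: a rule for a terminal returns a token, while a rule for whitespace or a comment has an action that returns nothing.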
To test your scanner, use the TestScanner program provided in the starter code (or something quite similar) to read tokens from input files and print them to standard output until an end-of-file input token is read. You should test your scanner on a variety of files, including some that contain legal programs and others that contain random input. Be sure your scanner generates intelligible error messages (including at least the line and the column) if it encounters junk in the input file. Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
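The contract described above (return the next token on each call, keep going after junk, stop at end of file) can be illustrated with a small hand-written stand-in. This is only a sketch under stated assumptions: `TinyScanner`, `nextToken`, and `scanAll` are hypothetical names, tokens are plain strings rather than CUP symbols, and keywords, comments, and multi-character operators are left out. It is not what JFlex generates and not the provided TestScanner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for a generated scanner; illustrative only. */
final class TinyScanner {
    private final String input;
    private int pos = 0;
    private int line = 1;   // 1-based line of the next unread character
    private int column = 1; // 1-based column of the next unread character

    TinyScanner(String input) { this.input = input; }

    /** Returns the next token as text, or "EOF" once the input is exhausted. */
    String nextToken() {
        // Skip whitespace while keeping the line and column counters current.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) advance();
        if (pos >= input.length()) return "EOF";

        int startLine = line, startColumn = column, start = pos;
        char c = input.charAt(pos);
        if (Character.isLetter(c)) {
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) advance();
            return "ID(" + input.substring(start, pos) + ")";
        }
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) advance();
            return "INT(" + input.substring(start, pos) + ")";
        }
        advance(); // consume one character so scanning can continue after an error
        if ("+-*<=!.,;()[]{}".indexOf(c) >= 0) return "PUNCT(" + c + ")";
        return "ERROR(line " + startLine + ", column " + startColumn
                + ": unexpected character '" + c + "')";
    }

    private void advance() {
        if (input.charAt(pos) == '\n') { line++; column = 1; } else { column++; }
        pos++;
    }

    /** Driver loop in the spirit of a test harness: pull tokens until EOF. */
    static List<String> scanAll(String source) {
        TinyScanner scanner = new TinyScanner(source);
        List<String> tokens = new ArrayList<>();
        String token;
        do {
            token = scanner.nextToken();
            tokens.add(token);
        } while (!token.equals("EOF"));
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : scanAll("x = 42;\n  y # 7")) System.out.println(token);
    }
}
```

For the sample input in `main`, the `#` is reported at line 2, column 5, and scanning then continues with the `7`. The scanner does not judge whether `y # 7` is a sensible program; that is left to the parser.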
This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part.
build.sh/build.cmd: When called, this should do whatever compilation steps are needed. Most likely, this will just call ant build.
scan.sh/scan.cmd: When called after build.sh/build.cmd has been called, this will take a MiniJava program on stdin and output to stdout the stream of tokens it produces, separated by whitespace. Note that you are free to create your own names for all the tokens. Most likely, this will just run the provided TestScanner class.
Please run ant clean before turning in your code to avoid turning in all your build artifacts. Similarly, please avoid turning in your entire .git folder, or the equivalent for whatever VCS you're using.