Due: Wednesday, Oct. 21, at 11:00 pm. You should turn your project in using the assignment drop box (links on the projects page).
The purpose of this assignment is to construct a scanner for MiniJava. We suggest you use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE P 501 project page, and there is a starter project there that you can use to see how these tools work together. These programs work with the CUP parser generator, which we will use for the next phase of the project. You will want to get CUP and install it now, since it is usually easiest to define lexical classes in the CUP specification and use the generated constant definitions in JFlex. CUP and JFlex are both included in the starter project.
You will need to examine the source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).
You should test your scanner on a variety of files, including some that contain legal programs and others that contain random input. Be sure your scanner does something reasonable if it encounters junk in the input file. (Crashing or infinite looping is not reasonable; complaining, skipping the junk, and moving on is.) Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
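To make "complaining, skipping the junk, and moving on" concrete, here is a minimal hand-written Java sketch of that behavior. It is only an illustration, not a JFlex specification and not the expected solution. The class name `JunkSkippingDemo`, the token labels, and the punctuation set are all hypothetical; the real token set has to come from the MiniJava grammar. For brevity, keywords are lumped in with identifiers and comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class JunkSkippingDemo {
    // Hypothetical punctuation set, for illustration only;
    // the real set must be taken from the MiniJava grammar.
    private static final String PUNCTUATION = "+-*<=!(){}[];,.";

    /** Returns one printable entry per token or error; bad input never stops the scan. */
    public static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not itself a token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                out.add("IDENTIFIER(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                out.add("INTEGER_LITERAL(" + input.substring(start, i) + ")");
            } else if (PUNCTUATION.indexOf(c) >= 0) {
                out.add("PUNCT(" + c + ")");
                i++;
            } else {
                // Junk: complain, skip exactly one character, and keep going.
                // Advancing here is what rules out an infinite loop.
                out.add("ERROR(illegal character '" + c + "' at offset " + i + ")");
                i++;
            }
        }
        out.add("EOF");
        return out;
    }

    public static void main(String[] args) {
        // The '#' is not a legal character here; scanning continues past it.
        for (String token : scan("x = 4 # 2;")) {
            System.out.println(token);
        }
    }
}
```

Running `main` prints one entry per line, with a single `ERROR` entry for the `#` in the middle and ordinary tokens before and after it. A token listing of this kind, produced by your own test program, is the sort of output to capture for the files you hand in.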
You should hand in your source files (JFlex specification), a readme file describing what tools you are using (language, scanner generator, etc.), some sample source files, and the files containing the corresponding token streams that your scanner test program produces. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not hand in the intermediate file(s) produced by the scanner generator -- machine-generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all.
You and your partner (if you have one) should turn in only a single copy of the project using one of your UW netids. Your readme file should include your names and UW netids so we can correctly identify everyone in the group and get feedback to you.