CSE 401 Wi10 - Project I - Scanner

Due: Friday, January 22, at 5:00 pm. You should turn your project in using the assignment drop box (link on the course home page).

Overview

The purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE 401 project page, and there is a starter project there that we suggest you use to see how these tools work together and to get going. JFlex works together with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar, it does use the token definitions contained there; in fact, you will need to update those definitions so that your scanner can use the constants CUP generates from them. Both JFlex and CUP are included in the starter project.

You will need to examine the source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).
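As a rough illustration of the hint above, the following Java sketch classifies individual lexemes as terminals or not. The class name, the category names, and the keyword and punctuation lists are all hypothetical and deliberately partial; the authoritative set of terminals is whatever the MiniJava source grammar actually uses, so check every entry against it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the starter project.
class TerminalCheck {
    // Assumed, partial keyword list; verify against the source grammar.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "class", "public", "static", "void", "int", "boolean",
        "if", "else", "while", "return", "true", "false", "this", "new"));

    // Assumed, partial list of operators, brackets, and other punctuation.
    static final Set<String> PUNCTUATION = new HashSet<>(Arrays.asList(
        "+", "-", "*", "<", "&&", "!", "=", ";", ",", ".",
        "(", ")", "{", "}", "[", "]"));

    // Returns a category name for a lexeme that is a terminal,
    // or "NOT_A_TERMINAL" for anything else (whitespace, comments, junk).
    static String classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "KEYWORD";
        if (PUNCTUATION.contains(lexeme)) return "PUNCTUATION";
        if (lexeme.matches("[0-9]+")) return "INTEGER_LITERAL";
        if (lexeme.matches("[A-Za-z][A-Za-z0-9_]*")) return "IDENTIFIER";
        return "NOT_A_TERMINAL";
    }
}
```

Note that the keyword check comes before the identifier check: a word such as while also matches the identifier pattern, so the order of the tests decides which terminal it becomes.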

You should test your scanner on a variety of files, including some that contain legal programs and others that contain random input. Be sure your scanner does something reasonable if it encounters junk in the input file. (Crashing, halting immediately on the first error, or infinite looping is not reasonable; complaining, skipping the junk, and moving on is.) Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
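The "complain, skip the junk, and move on" behavior can be sketched in plain Java as follows. This is a hand-written stand-in, not JFlex output and not the required design: the class name, the string form of the tokens, and the small punctuation set are assumptions made only to keep the example self-contained. In a JFlex specification the same policy would normally be expressed as a final catch-all rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature scanner illustrating error recovery only.
class JunkSkippingScanner {
    // Assumed single-character punctuation; a real scanner needs more.
    private static final String PUNCTUATION = "+-*<=!;,.(){}[]";

    private final String input;
    private int pos = 0;
    private int line = 1;
    private final List<String> errors = new ArrayList<>();

    JunkSkippingScanner(String input) { this.input = input; }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Delivers the next token as text, or "EOF" at end of input.
    String nextToken() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\n') {
                line++;
                pos++;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                pos++;
            } else if (isLetter(c)) {
                int start = pos;
                while (pos < input.length()
                        && (isLetter(input.charAt(pos))
                            || isDigit(input.charAt(pos))
                            || input.charAt(pos) == '_')) {
                    pos++;
                }
                return "ID(" + input.substring(start, pos) + ")";
            } else if (isDigit(c)) {
                int start = pos;
                while (pos < input.length() && isDigit(input.charAt(pos))) {
                    pos++;
                }
                return "INT(" + input.substring(start, pos) + ")";
            } else if (PUNCTUATION.indexOf(c) >= 0) {
                pos++;
                return "SYM(" + c + ")";
            } else {
                // Junk: record a complaint, skip one character, keep scanning.
                errors.add("line " + line + ": illegal character '" + c + "'");
                pos++;
            }
        }
        return "EOF";
    }

    List<String> errors() { return errors; }
}
```

Because the junk branch always advances past the offending character, the loop cannot spin forever on bad input, and the caller still receives every valid token that follows the junk.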

What to Hand In

The main information we need for this phase of the project is your JFlex specification, test programs, and files containing the output produced by your scanner for those tests. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not hand in the intermediate file(s) produced by the scanner generator -- machine-generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all.

Your code should run on attu when built with ant. You should do an "ant clean", then bundle up your compiler directory in a tar file and turn that in. That will ensure that we have all the pieces of your compiler if we need to check something, and we will use the same procedure for later phases of the project.

You and your partner should turn in only a single copy of the project using one of your UW netids. Your readme file should include both of your names and UW netids so we can correctly identify everyone in the group and get feedback to you.