CSE 401 Au11 - Project I - Scanner

Due: Thursday, Oct. 13, at 11:00 pm. You should turn your project in using the assignment drop box (see link on the course home page).


The purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE 401 project page, along with a starter project that you should use to see how these tools work together. JFlex is designed to work with the CUP parser generator, which we will use for the next phase of the project. Although this phase does not use the CUP grammar rules, it does use the token (terminal) definitions in the CUP specification. You will need to update those definitions so that CUP generates the token constants your scanner returns. Both JFlex and CUP are included in the starter project.
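To make the link between the CUP token definitions and the scanner concrete, here is a hand-written sketch of the kind of Java those definitions turn into and how scanner actions use it. Everything in it is a hypothetical stand-in: CUP's real output is a constants class named `sym` by default, and a JFlex scanner built for CUP returns `java_cup.runtime.Symbol` objects, neither of which is reproduced exactly here. The class `TokenCodesDemo`, the `Token` class, and the numbers assigned to `PLUS`, `IDENTIFIER`, and `INTEGER_LITERAL` are illustrative only.

```java
// Hypothetical, hand-written stand-ins so the sketch runs on its own.
// In the real project CUP generates the constants class from the .cup file,
// and the scanner returns java_cup.runtime.Symbol objects instead of Token.
class TokenCodesDemo {
    // Shape of a CUP-generated constants class. CUP uses 0 for EOF and 1 for
    // error; the numbers for the remaining terminals are illustrative only.
    static final class sym {
        static final int EOF = 0;
        static final int error = 1;
        static final int PLUS = 2;
        static final int IDENTIFIER = 3;
        static final int INTEGER_LITERAL = 4;
    }

    // Minimal token: the integer code the parser dispatches on, plus an
    // optional semantic value (an identifier's name, a literal's digits).
    static final class Token {
        final int kind;
        final Object value;

        Token(int kind, Object value) {
            this.kind = kind;
            this.value = value;
        }

        Token(int kind) {
            this(kind, null);
        }
    }

    // Scanner actions name terminals symbolically, never by raw number, so
    // regenerating the constants after editing the .cup file keeps them in sync.
    static Token plus() {
        return new Token(sym.PLUS);
    }

    static Token identifier(String name) {
        return new Token(sym.IDENTIFIER, name);
    }

    // Keeps the digits as text; converting (and range-checking) them is left
    // for later so an oversized literal cannot crash the scanner here.
    static Token integerLiteral(String digits) {
        return new Token(sym.INTEGER_LITERAL, digits);
    }

    public static void main(String[] args) {
        Token t = identifier("fact");
        System.out.println(t.kind + " " + t.value); // prints "3 fact"
    }
}
```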

You will need to examine the MiniJava source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).
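One way to record the result of that examination is sketched below as a Java enum. It assumes the MiniJava grammar from Appel's textbook, so check it against the grammar on the MiniJava web site before relying on it. The names (`MiniJavaTerminals`, `Kind`, `classifyWord`) are hypothetical; in the project the same set would be declared as terminals in the CUP file. Whether `main`, `String`, and `length` are reserved words or ordinary identifiers is a design choice, and this sketch reserves them. A common alternative to writing one JFlex rule per keyword is to match identifier-shaped words with a single rule and then look them up in a keyword table, which `classifyWord` illustrates. `System.out.println` contains dots, so it is better matched by its own rule than by that lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the MiniJava terminal set, assuming Appel's MiniJava grammar.
// Comments and whitespace are deliberately absent: the scanner discards them.
class MiniJavaTerminals {
    enum Kind {
        // keywords (reserving main, String, and length is a design choice)
        CLASS("class"), PUBLIC("public"), STATIC("static"), VOID("void"),
        MAIN("main"), STRING("String"), EXTENDS("extends"), RETURN("return"),
        INT("int"), BOOLEAN("boolean"), IF("if"), ELSE("else"), WHILE("while"),
        PRINTLN("System.out.println"), LENGTH("length"), TRUE("true"),
        FALSE("false"), THIS("this"), NEW("new"),
        // operators
        AND("&&"), LESS("<"), PLUS("+"), MINUS("-"), TIMES("*"), NOT("!"),
        // brackets and other punctuation
        LBRACE("{"), RBRACE("}"), LPAREN("("), RPAREN(")"),
        LBRACKET("["), RBRACKET("]"), SEMICOLON(";"), COMMA(","),
        DOT("."), BECOMES("="),
        // terminals whose spelling varies from occurrence to occurrence
        IDENTIFIER(null), INTEGER_LITERAL(null);

        final String lexeme; // fixed spelling, or null if it varies

        Kind(String lexeme) {
            this.lexeme = lexeme;
        }
    }

    // Reserved words that look like identifiers, keyed by spelling.
    // System.out.println is excluded because an identifier rule cannot match dots.
    private static final Map<String, Kind> KEYWORDS = new HashMap<>();

    static {
        for (Kind k : Kind.values()) {
            if (k.lexeme != null
                    && Character.isLetter(k.lexeme.charAt(0))
                    && !k.lexeme.contains(".")) {
                KEYWORDS.put(k.lexeme, k);
            }
        }
    }

    // Classify a word already matched by an identifier pattern.
    static Kind classifyWord(String word) {
        return KEYWORDS.getOrDefault(word, Kind.IDENTIFIER);
    }

    public static void main(String[] args) {
        System.out.println(classifyWord("while") + " " + classifyWord("num")); // prints "WHILE IDENTIFIER"
    }
}
```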

To test your scanner, use the TestScanner program provided in the starter code (or something quite similar) to read tokens from input files and print them to standard output until the end-of-file token is returned. You should test your scanner on a variety of files, including some that contain legal programs and others that contain random input. Be sure your scanner does something reasonable if it encounters junk in the input file. (Crashing, halting immediately on the first error, or looping forever is not reasonable; reporting the error, skipping the junk, and moving on is.) Remember, it is up to the parser to decide whether the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
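The "complain, skip the junk, and move on" policy can be seen in a small hand-written scanner that covers only a handful of tokens. It is a hypothetical stand-in for the JFlex-generated class and is not the starter code: tokens are plain strings, keywords are not distinguished from identifiers, and `scanAll` only approximates what a TestScanner-style driver does. In a JFlex specification a similar effect is usually obtained with a last rule that matches any single character (for example `.`, since newlines are already consumed by the whitespace rule) and whose action reports the error without returning a token, so scanning continues with the next character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner covering a few MiniJava tokens, to show
// the recovery policy: report junk, skip it, keep going until end of input.
class RecoveringScanner {
    static final String EOF = "EOF";

    private final String input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoveringScanner(String input) {
        this.input = input;
    }

    // Return the next token as a printable string, or "EOF" when input runs out.
    String nextToken() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { // whitespace is not a token
                pos++;
                continue;
            }
            if (input.startsWith("//", pos)) { // line comment: skip to end of line
                int nl = input.indexOf('\n', pos);
                pos = (nl < 0) ? input.length() : nl + 1;
                continue;
            }
            if (Character.isDigit(c)) { // integer literal: a run of digits
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                return "INTEGER_LITERAL(" + input.substring(start, pos) + ")";
            }
            if (Character.isLetter(c)) { // identifier: letter, then letters/digits/_
                int start = pos;
                while (pos < input.length()) {
                    char ch = input.charAt(pos);
                    if (!Character.isLetterOrDigit(ch) && ch != '_') break;
                    pos++;
                }
                return "IDENTIFIER(" + input.substring(start, pos) + ")";
            }
            if (input.startsWith("&&", pos)) { // two-character operator
                pos += 2;
                return "AND";
            }
            pos++; // consume one character; every path below uses it up
            switch (c) {
                case '+': return "PLUS";
                case ';': return "SEMICOLON";
                case '=': return "BECOMES";
                default:
                    // Junk: record it with its offset, then resume the loop.
                    errors.add("illegal character '" + c + "' at offset " + (pos - 1));
            }
        }
        return EOF;
    }

    // Driver: collect tokens until the end-of-file token is returned.
    static List<String> scanAll(RecoveringScanner s) {
        List<String> out = new ArrayList<>();
        for (String t = s.nextToken(); !t.equals(EOF); t = s.nextToken()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        RecoveringScanner s = new RecoveringScanner("x = 1 # + y; // done\n");
        System.out.println(scanAll(s));
        // prints "[IDENTIFIER(x), BECOMES, INTEGER_LITERAL(1), PLUS, IDENTIFIER(y), SEMICOLON]"
        System.out.println(s.errors);
        // prints "[illegal character '#' at offset 6]"
    }
}
```

Because junk is consumed one character at a time and every branch advances `pos`, the loop always terminates, and a stray character produces exactly one error message rather than stopping the scan.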

This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part.

What to Hand In

The main items we will examine for this phase of the project are your JFlex and CUP specification files, your test programs, and files containing the output your scanner produces for those tests. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not hand in the intermediate file(s) produced by the scanner generator: machine-generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all.

The test programs you use to demonstrate your scanner must include the Factorial.java sample program from the MiniJava web site, which is also included in the SampleMiniJavaPrograms directory in the starter code.

Your code should run on the lab Linux machines (or attu) when built with ant. Before submitting, do an "ant clean", then bundle up your compiler directory in a tar file and turn that in. This ensures that we have all the pieces of your compiler if we need to check something, and we will use the same procedure for later phases of the project.

You and your partner should turn in only a single copy of the project using one of your UW netids. Include a file named INFO at the top level of your directory with your names and UW netids so we can correctly identify everyone in the group and get feedback to you.