CSE 401 15wi - Project I - Scanner

Due: Thursday, Jan. 22 at 11:00 pm. You should turn your project in using the assignment drop box (see link on the course home page) following the instructions at the bottom of this writeup.

Added 1/19: New information about class paths and other details needed to run the main compiler program from a command line. Also added more specific details about preparing the tar file to be handed in.

Added 1/20: Use Java 7 only, not Java 8. Your project will be tested using Java 7.

Overview

The purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE 401 project page, and there is a starter project there that you should use to see how these tools work together. These programs work with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar, it does require specifying the tokens in the CUP input file. You will need to update those definitions to ones appropriate for the full MiniJava language so they can be used by your scanner. Both JFlex and CUP are included in the starter code.

To get started, one person in your group should download the starter project, unpack the files, then add and push them to your group's gitlab repository. The other person should then pull from the repository to get their copy of the files. See the CSE 401 git tutorial for basic information about working with gitlab for the course project.

You will need to examine the MiniJava source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).

The starter code contains a TestScanner program that reads a file from standard input and prints a readable representation of the tokens in that file to standard output. You can run it with the command ant test-scanner, or the equivalent command from inside a programming environment like Eclipse. This test program is intended to show how to use a JFlex scanner, and you should study it to see how that works. For the compiler itself, however, you should create a more appropriate main program.

You should create a Java class named MiniJava with a main method that controls execution of your compiler. This method should examine its arguments (the String array parameter that every Java main method receives) and act on them as follows. When the program is run using the command

  java MiniJava -S filename.java

the compiler should open the named input file and read tokens from it by calling the scanner repeatedly until the end of the input file is reached. The tokens should be printed on standard output (Java's System.out) using a format similar to the one produced by the TestScanner program in the starter code.
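
The loop described above can be sketched as follows. This is only an illustration: Token, TokenSource, and TokenPrinter are hypothetical stand-ins invented for the example, not the classes in the starter code. The real scanner is generated by JFlex and returns CUP symbol objects, so the method and field names in your project will differ. The sketch sticks to Java 7 features.

```java
// Hypothetical token type; the real project uses the symbol class supplied by CUP.
class Token {
    static final int EOF = 0;   // assumed code for the end-of-input token
    final int kind;
    final String text;

    Token(int kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

// Hypothetical scanner interface; the generated scanner plays this role.
interface TokenSource {
    Token nextToken();   // returns a token of kind EOF once the input is exhausted
}

class TokenPrinter {
    // Calls the scanner repeatedly, collecting a readable form of each token,
    // and stops as soon as the end-of-input token arrives.
    static String dump(TokenSource scanner) {
        StringBuilder out = new StringBuilder();
        for (Token t = scanner.nextToken(); t.kind != Token.EOF; t = scanner.nextToken()) {
            out.append(t.text).append(' ');
        }
        return out.toString().trim();
    }
}
```

In the real main program the result would be written to System.out, and the readable form of each token would follow the format used by TestScanner.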

If the MiniJava main program is executed with the -S option but with no input filename, it should read from standard input (System.in) and print tokens to standard output as before. In case it's useful, we've provided a small demonstration program OptionalFile.java as an example of how a program can read from standard input or a named file depending on whether a filename is provided.
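
As a rough sketch of the argument handling described above (and separate from the provided OptionalFile.java, which may do this differently), the choice between a named file and standard input could look like the following. The class name InputSelector and the usage message are assumptions made for this example; the assignment does not say what to do with unexpected arguments, so rejecting them here is only one possible choice.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper; in the project this logic would live in MiniJava's main method.
class InputSelector {
    // Returns a reader on the named file when "-S" is followed by a filename,
    // or a reader on standard input when "-S" appears alone.
    static Reader open(String[] args) throws IOException {
        if (args.length == 0 || args.length > 2 || !args[0].equals("-S")) {
            // Assumed behavior: treat anything else as a usage error.
            throw new IllegalArgumentException("usage: java MiniJava -S [filename.java]");
        }
        if (args.length == 2) {
            return new FileReader(args[1]);   // throws FileNotFoundException if the file is missing
        }
        return new InputStreamReader(System.in);
    }
}
```

The reader returned here is what would be handed to the scanner's constructor, so the rest of the program does not need to know where the input came from.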

The source code for MiniJava.java should be in the top-level project src folder, and ant will compile it automatically along with all the other project files when needed.

The actual details of running MiniJava's main method from a command prompt are a bit more complicated, because the Java virtual machine needs to know where the compiled classes and libraries are located. The following commands should recompile any necessary files and run the scanner:

  ant
  java -cp build/classes:lib/java-cup-11b.jar MiniJava -S filename.java

If you set the CLASSPATH environment variable to point to the library jar file and compiled classes directory, you should not need to provide the -cp argument on the java command.

The build.xml file processed by ant already contains options to specify the class path, which is why you don't have to specify those things to run targets like test-scanner using ant. You can add similar targets to build.xml to run your MiniJava program or other test programs using ant, and you can use additional ant options in build.xml to specify program arguments like -S.

To test your scanner, you should use a variety of input files, including some that contain legal MiniJava programs and others that contain random input. Be sure your scanner does something reasonable if it encounters junk in the input file. (Crashing, halting immediately on the first error, or infinite looping is not reasonable; complaining, skipping the junk, and moving on is.) Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
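
To make the "complain, skip the junk, and move on" policy concrete, here is a small hand-written sketch. It is not JFlex and does not cover the MiniJava token set; the class name RecoveringScanner, the token spellings, and the error message format are all invented for the illustration. In a JFlex specification the same policy is commonly expressed as a final catch-all rule that reports the offending character and returns nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner used only to illustrate error recovery.
class RecoveringScanner {
    // Recognizes identifiers, integer literals, and a few punctuation marks.
    // Any other character is reported in errors and skipped, and scanning
    // continues with the next character.
    static List<String> scan(String input, List<String> errors) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if ("+-*=;(){}[]".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                errors.add("illegal character '" + c + "' at offset " + i);
                i++;   // always advance, so junk can never cause an infinite loop
            }
        }
        return tokens;
    }
}
```

For the input x = 1 # 2; this sketch produces the five tokens ID(x), =, INT(1), INT(2), and ; along with one error message for the # character, rather than stopping at the first problem.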

This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part.

Your code should only use language features available in Java 7, which is the environment that will be used to test your compiler project.

You should use your CSE 401 gitlab repository to store the code for this and remaining parts of the compiler project.

What to Hand In

The main information we will examine for this phase of the project is your JFlex and CUP specification files, your MiniJava class and main program, and your test input files. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not hand in the intermediate file(s) produced by the JFlex scanner generator -- machine-generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all.

Your code should run on the lab Linux machines (or attu) when built with ant. You should run ant clean, then bundle up your compiler directory in a tar file and turn that in. That will ensure that we have all the pieces of your compiler if we need to check something, and we will use the same procedure for later phases of the project. To create the tar file, run the following commands starting in your main project directory (the one that contains build.xml):

  ant clean
  cd ..
  tar cvfz scanner.tar.gz your_project_directory_name

Then turn in the scanner.tar.gz file.

You and your partner should turn in only a single copy of the project using one of your UW netids. You should include a file named INFO at the top level of your directory with your names and UW netids so we can correctly identify everyone in the group and get feedback to you.

To be sure that everything is in working order, we strongly suggest that, before you create the tar file, you first run ant clean followed by ant to rebuild your project from scratch, then run any tests you want, and finally run the commands given above to create the actual tar file to be turned in.