CSE 401/M501 22sp Project I - Scanner

Due: Thursday, Apr. 14 at 11:00 pm. You will "turn in" your project by pushing it to your GitLab repository and providing a suitable tag. See the end of this writeup for details.

Please note: The following writeup has some specific requirements for how your scanner (and compiler) should work: options and parameters that must be supported by your compiler program, required output, required program exit values (return codes) and other details. Please be sure that your compiler works as specified, does not produce extraneous output messages, runs on the lab machines using Java 11, and otherwise behaves predictably so that we can test it. You need to get these details right on this and on later phases of the project.

Overview

The purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE 401/M501 project page. Starter files will be pushed to each project group's GitLab repository and those files contain a sample project that shows how these tools work together. These programs work with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar, it does require specifying the tokens in the CUP input file. You will need to update those definitions to ones appropriate for the full MiniJava language so they can be used by your scanner, which may require removing some things provided as part of the starter code as well as adding things needed for MiniJava tokens. The JFlex and CUP programs and libraries are included in the starter code.

To begin, you (and your partner) should clone your group's GitLab repository containing the starter files. You can find a link to the GitLab web interface on the main CSE 401/M501 web page or the compiler project page. You can also find links to git documentation there if you need to refresh your knowledge of git.

You will need to examine the MiniJava source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols). Also, review the MiniJava project description and make sure you understand the scope of the language and project. Note that the MiniJava grammar treats several things as reserved words even though these are not reserved in full Java. Examples include the constants "true" and "false", the method name "main", and other literal strings that appear in double quotes in the MiniJava grammar.
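The distinction between reserved words and ordinary identifiers can be illustrated with a small Java sketch. This is not how the JFlex specification expresses it (there, each reserved word normally gets its own rule); it only shows the classification the scanner has to make. The class name, the token names, and the word list are assumptions for illustration, and the list is deliberately incomplete: check every entry against the quoted strings in the MiniJava grammar.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ReservedWordSketch {
    // Illustrative and possibly incomplete: verify each entry against the
    // strings that appear in double quotes in the MiniJava grammar.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "class", "public", "static", "void", "main", "extends", "return",
        "int", "boolean", "if", "else", "while", "true", "false", "this", "new"));

    /** Returns a hypothetical token name for a word-shaped lexeme. */
    static String classify(String lexeme) {
        if (RESERVED.contains(lexeme)) {
            // A reserved word becomes its own token kind, e.g. "main" -> "MAIN".
            return lexeme.toUpperCase(Locale.ROOT);
        }
        // Anything else that looks like a word is an ordinary identifier.
        return "IDENTIFIER(" + lexeme + ")";
    }

    public static void main(String[] args) {
        for (String word : new String[] {"main", "true", "mainly", "x"}) {
            System.out.println(word + " -> " + classify(word));
        }
    }
}
```

Note that "mainly" is classified as an identifier even though it begins with a reserved word; a generated scanner gets the same effect by always taking the longest match.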

The starter code contains a DemoScanner program that reads a file from standard input and prints a readable representation of the tokens in that file to standard output. You can run it with the command ant demo-scanner, or the equivalent command from inside a programming environment like IntelliJ. This demo program is intended to show how to use a JFlex scanner and you will want to study it to see how that works. But for the compiler itself you should create a more appropriate main program and you will need to create an appropriate set of tokens for MiniJava.

You should create a Java class named MiniJava with a main method that controls execution of your compiler. This method should examine its arguments (the String array parameter that is found in every Java main method) to discover compiler options and the name of the file to be compiled. When this method is executed using the command

java MiniJava -S filename.java
the compiler should open the named input file and read tokens from it by calling the scanner repeatedly until the end of the input file is reached. The tokens should be printed on standard output (Java's System.out) using a format similar to the one produced by the DemoScanner program in the starter code.
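The "call the scanner repeatedly until end of file" loop might look roughly like the sketch below. Everything in it is a stand-in so that the example runs on its own: the Kind enum, the Token class, the TokenSource interface, and the fake scanner are all hypothetical. In the real project the scanner is generated by JFlex and the token kinds come from the CUP-generated symbol class; study DemoScanner in the starter code for the actual calls and the expected output format.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScanLoopSketch {
    // Hypothetical token kinds; the real ones are defined via the CUP input file.
    enum Kind { CLASS, IDENTIFIER, LBRACE, RBRACE, EOF }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Stand-in for the generated scanner: one call yields one token.
    interface TokenSource { Token nextToken(); }

    /** Readable form of one token; identifiers also show their text. */
    static String describe(Token t) {
        return t.kind == Kind.IDENTIFIER ? "ID(" + t.text + ")" : t.kind.name();
    }

    /** Pulls tokens until EOF and returns them as one space-separated line. */
    static String scanAll(TokenSource scanner) {
        StringBuilder out = new StringBuilder();
        for (Token t = scanner.nextToken(); t.kind != Kind.EOF; t = scanner.nextToken()) {
            if (out.length() > 0) out.append(' ');
            out.append(describe(t));
        }
        return out.toString();
    }

    /** Fake scanner that replays a fixed list and then reports EOF forever. */
    static TokenSource fake(List<Token> tokens) {
        Iterator<Token> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : new Token(Kind.EOF, "");
    }

    public static void main(String[] args) {
        TokenSource scanner = fake(Arrays.asList(
            new Token(Kind.CLASS, "class"), new Token(Kind.IDENTIFIER, "Foo"),
            new Token(Kind.LBRACE, "{"), new Token(Kind.RBRACE, "}")));
        System.out.println(scanAll(scanner));
    }
}
```

The loop tests for EOF before printing, so the end-of-file token itself is never printed here; whether your output should include it is a choice to make by comparing with DemoScanner.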

When the compiler (i.e., just the scanner at this point) terminates, it must return an "exit" or status code indicating whether any errors were discovered when compiling the input program. In Java the method call System.exit(status) terminates the program with the given status. The status value should be 0 (normal termination) if no errors are discovered. If one or more errors are detected, the exit status value should be 1.

Note: The scanner and parser demo programs in the starter code read their input from stdin. Your compiler must read input from the file named on the java command, so you will need to include appropriate code in your MiniJava main program to open that file and prepare it for reading.
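A minimal sketch of the surrounding main program, covering argument handling, opening the named file, and the exit status, is shown below. It is an illustration under stated assumptions, not the required implementation: the class is called MainSketch rather than MiniJava, the scan method is a placeholder that only consumes characters, and the handling of bad arguments or an unreadable file (a message on standard error and status 1) is an assumption, since the writeup only specifies 0 for no errors and 1 otherwise.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class MainSketch {
    /** Returns the file name if the arguments are exactly "-S <file>", else null. */
    static String parseScanArgs(String[] args) {
        if (args.length == 2 && args[0].equals("-S")) {
            return args[1];
        }
        return null;
    }

    /** Maps an error count to the required exit status: 0 if none, otherwise 1. */
    static int exitStatus(int errorCount) {
        return errorCount == 0 ? 0 : 1;
    }

    /** Placeholder for the real scanning loop; returns the number of errors found. */
    static int scan(Reader in) throws IOException {
        int errors = 0;
        while (in.read() != -1) {
            // A real scanner would produce and print tokens here.
        }
        return errors;
    }

    /** Runs the sketch and returns the status that main passes to System.exit. */
    static int run(String[] args) {
        String fileName = parseScanArgs(args);
        if (fileName == null) {
            // Assumption: a usage error is reported on stderr and counts as failure.
            System.err.println("usage: java MiniJava -S filename.java");
            return 1;
        }
        // Read from the named file, not from System.in as the demo programs do.
        try (Reader in = new BufferedReader(new FileReader(fileName))) {
            return exitStatus(scan(in));
        } catch (IOException e) {
            System.err.println("cannot read " + fileName + ": " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Keeping the logic in run and calling System.exit only in main makes the status easy to check from a test without terminating the test process.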

The source code for MiniJava.java should be in the top-level project src folder, and ant will compile it automatically along with all the other project files when needed.

The actual details of running MiniJava's main method from a command prompt are a bit more complicated, because the Java virtual machine needs to know where the compiled classes and libraries are located. The following commands should recompile any necessary files and run the scanner when they are executed in the top-level directory containing the build.xml ant file:

ant
java -cp build/classes:lib/java-cup-11b.jar MiniJava -S filename.java
If you set the CLASSPATH environment variable to point to the library jar file and compiled classes directory, you should not need to provide the -cp argument on the java command. If you are using a Windows terminal window instead of a Mac or Linux terminal, you will need to use ; instead of : as a path separator in the -cp option, and perhaps quote it:
java -cp "build/classes;lib/java-cup-11b.jar" MiniJava -S filename.java
More information on CLASSPATH can be found in the Java documentation.

The build.xml file processed by ant already contains options to specify the class path, which is why you don't have to specify those things to run targets like demo-scanner using ant. You can add similar targets to build.xml to run your MiniJava program or other test programs using ant, and you can use additional ant options in build.xml to specify program arguments like -S.

To test your scanner, you should create a variety of input files, including some that contain legal MiniJava programs and others that contain programs with lexical errors and random input. We have provided one possible suggestion for how to organize your test files using JUnit in the test subdirectory, but you are not required to follow it (or even to use JUnit). There is a single example test to get you started, but it is designed for the small demo language in the starter code, so you will need to modify things to work with MiniJava and your choice of token names. To get an understanding of how the JUnit setup works and how you can expand on it, read the documentation at test/README.txt and look at the bottom of the build.xml file for an example test-related target (test). Feel free to arrange and run your tests however you'd like -- but keep in mind that it is nearly impossible to find all the edge cases in something as complex as a compiler without an organized and thorough approach to testing.

Be sure your scanner does something reasonable if it encounters junk in the input file. Crashing, halting immediately on the first error, or infinite looping are not reasonable; complaining, skipping the junk, and moving on to find the next token in the input file is. The starter JFlex code already contains code to handle unrecognized input characters and report them as an error, so you may not have to do much additional work to get this right.
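The "complain, skip, and move on" behavior can be sketched with a tiny hand-written scanner. This is only an illustration of the recovery strategy: the real scanner is generated by JFlex, and the class name, token formats, and the simplified token rules below are assumptions, not MiniJava's actual lexical rules.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySketch {
    final List<String> tokens = new ArrayList<>();
    int errorCount = 0;

    /** Scans the whole input, reporting and skipping any unrecognized character. */
    void scan(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace is not a token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if ("+-*=;(){}[]".indexOf(c) >= 0) {
                tokens.add("'" + c + "'");
                i++;
            } else {
                // Junk: complain, count the error, skip one character, keep going.
                System.err.println("illegal character '" + c + "' at offset " + i);
                errorCount++;
                i++;
            }
        }
    }

    public static void main(String[] args) {
        RecoverySketch s = new RecoverySketch();
        s.scan("x = 1 # @ y;");
        System.out.println(s.tokens + " errors=" + s.errorCount);
    }
}
```

Every branch advances the position by at least one character, which is what rules out an infinite loop, and the error count is what a main program would later turn into exit status 1.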

Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called, regardless of whether the sequence of tokens extracted from the input actually forms a legal MiniJava program.

This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part, and we strongly advise that you not try to "get ahead" by implementing anything further at this time, other than minimal changes needed to the starter code so the scanner will build and execute properly.

You should use your CSE 401/M501 GitLab repository to store the code for this and remaining parts of the compiler project.

What to "Hand In"

The main items we will examine for this phase of the project are your JFlex and CUP specification files, your MiniJava class and main program, and your test input files. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not include the intermediate file(s) produced by JFlex or CUP -- machine generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all. For the same reason, these generated files produced by JFlex and CUP and compiler output like .class or .o files should not be pushed to the repository.

We will test your code on the lab Linux machines (attu or equivalent) and your project should build and work properly when run with ant and using Java 11. Since this early stage of the project only includes Java code, if it runs properly with ant and Java 11 in a Windows, Mac, or other environment, it should also be ok. If you are using a programming environment like IntelliJ, set the project language level option(s) to Java 11 so the IDE will properly diagnose unavailable language features, even if the installed Java version on your machine is something more recent.

Once you're done, "turning in" the assignment is simple: you designate the revision (commit) in your git repository that the course staff should examine for grading by tagging it scanner-final. But there are multiple ways to get this wrong, so you should carefully follow the steps below, particularly if you are new to git. If you have a lot of git experience, our apologies for perhaps belaboring the obvious, but we want to be sure that assignments get pushed and tagged properly and without leaving git repositories in strange states. If you are not using Linux or another Unix-like command-line environment, please do the moral equivalent of the following on your system.

The idea is:

  1. Tidy up and be sure that everything is properly committed and pushed to your GitLab repository.
  2. Add a tag to your repository to specify the commit that corresponds to the finished assignment.
  3. Check out a fresh copy of the repository and verify that everything has been done properly.

1. Tidy up and be sure everything is properly committed. Commit and push all of your changes to your repository (see the main project web page for links to git information if you need a refresher on how to do this). Then in your main project directory:

bash% git pull
bash% ant clean
bash% git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
If you see any messages about uncommitted changes or any other indications that the latest version of your code has not been pushed to the GitLab repository, fix those problems and push any unsaved changes before going on. Then repeat the above steps to verify that all is well.

2. Tag your repository and push the tag information to GitLab to indicate that the current commit is the version of the scanner that you are submitting for grading:

bash% git tag scanner-final
bash% git push --tags 
Do not do this until after you have pushed all parts of your scanner project to GitLab.

3. Check that everything is properly stored and tagged in your repository. To be sure that you really have updated everything properly, create a brand new, empty directory that is nowhere near your regular working directory, clone the repository into the new location, and verify that everything works as expected. It is really, really, REALLY important that this not be nested anywhere inside your regular, working repository directory. Do this:

bash% cd <somewhere-completely-different>
bash% git clone git@gitlab.cs.washington.edu:cse401-22sp-students/cse401-22sp-xy.git
bash% cd cse401-22sp-xy
bash% git checkout scanner-final
Use your group's project code instead of xy, of course. The commands after git clone change to the newly cloned directory, then cause git to switch to the tagged commit you created in step 2, above. We will do the same when we examine your files for grading.

At this point you should see your project directory. Run ant to build the project and run any tests you wish (something that will be even more essential on future assignments). If there are any problems, erase this newly cloned copy of your repository (rm -rf cse401-22sp-xy), go back to your regular working repository copy, and fix whatever is wrong. DO NOT do any additional work in the copy of the repository that you have cloned to verify your work. After the git checkout command the repository is in a "detached head" state, and any changes you make to that copy of the repository will either be lost or pushed to a hidden branch or otherwise cause problems. DON'T do any further work using that copy.

The necessary changes in the original repo may be as simple as running a missed git push --tags command if the tag was not found in the repository. If it requires more substantive changes, you may need to do a little voodoo to get rid of the original scanner-final tag from your repository and re-tag after making your repairs. To eliminate the scanner-final tag, do this (this should not normally be necessary):

bash% git tag -d scanner-final
bash% git push origin :refs/tags/scanner-final 
Then make, commit, and push your repairs, and repeat the tag and tag push commands from step 2. And then repeat step 3 to be sure that the updated version is actually correct.

Once you are satisfied that the scanner-final tag in the repository correctly identifies the finished scanner project you are done.