CSE P 501 16wi - Project I - Scanner
Due: Monday, Jan. 25 at 11:00 pm. You will "turn in" your project by pushing it to your GitLab repository and providing a suitable tag. See the end of this writeup for details.
Overview
The purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE P 501 project page, and there is a starter project there that you should use to see how these tools work together. These programs work with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar, it does require specifying the tokens in the CUP input file. You will need to update those definitions to ones appropriate for the full MiniJava language so they can be used by your scanner. Both JFlex and CUP are included in the starter code.
If your group is using a different implementation language you will need to use the starter project as a guide for how to set up the infrastructure for the project in your chosen language. The rest of this description is written in terms of Java/JFlex/CUP. Please make the appropriate adjustments to adapt to your language.
To get started, one person in your group should download the starter project, unpack the files, then add and push them to your group's GitLab repository. The other person should then pull from the repository to get their copy of the files. See the CSE P 501 git tutorial for basic information about working with GitLab for the course project. You can find a link to the GitLab web interface on the main CSE P 501 web page.
You will need to examine the MiniJava source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols).
The starter code contains a TestScanner program that reads a file from standard input and prints a readable representation of the tokens in that file to standard output. You can run it with the command ant test-scanner, or the equivalent command from inside a programming environment like Eclipse. This test program is intended to show how to use a JFlex scanner, and you will want to study it to see how that works. But for the compiler itself you should create a more appropriate main program, and you will need to create an appropriate set of tokens for MiniJava.
You should create a Java class named MiniJava with a main method that controls execution of your compiler. This method should examine its arguments (the String array parameter that is found in every Java main method) to discover compiler options and the name of the file to be compiled. When this method is executed using the command

    java MiniJava -S filename.java

the compiler should open the named input file and read tokens from it by calling the scanner repeatedly until the end of the input file is reached. The tokens should be printed on standard output (Java's System.out) using a format similar to the one produced by the TestScanner program in the starter code.
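The argument handling described above could be sketched roughly as follows. This is only an illustration: the class name CompilerOptions, its fields, and the choice to reject unknown options are assumptions made for the example, not part of the starter code or a required design.

```java
// Hypothetical helper (not in the starter code): the result of examining
// the String array passed to main.
final class CompilerOptions {
    final boolean scanOnly;  // true when -S was given
    final String fileName;   // name of the file to be compiled

    CompilerOptions(boolean scanOnly, String fileName) {
        this.scanOnly = scanOnly;
        this.fileName = fileName;
    }

    // Arguments starting with '-' are treated as options; the one
    // remaining argument is taken to be the input file name.
    static CompilerOptions parse(String[] args) {
        boolean scanOnly = false;
        String fileName = null;
        for (String arg : args) {
            if (arg.equals("-S")) {
                scanOnly = true;
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("unknown option: " + arg);
            } else if (fileName == null) {
                fileName = arg;
            } else {
                throw new IllegalArgumentException("more than one input file: " + arg);
            }
        }
        if (fileName == null) {
            throw new IllegalArgumentException("no input file given");
        }
        return new CompilerOptions(scanOnly, fileName);
    }
}
```

A main method might call CompilerOptions.parse(args), open the file named by fileName, and run the token-printing loop when scanOnly is true. Keeping the parsing in one place should make it easier to add options in later phases of the project.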
The source code for MiniJava.java should be in the top-level project src folder, and ant will compile it automatically along with all the other project files when needed.
The actual details of running MiniJava's main method from a command prompt are a bit more complicated, because the Java virtual machine needs to know where the compiled classes and libraries are located. The following commands should recompile any necessary files and run the scanner:

    ant
    java -cp build/classes:lib/java-cup-11b.jar MiniJava -S filename.java

If you set the CLASSPATH environment variable to point to the library jar file and compiled classes directory, you should not need to provide the -cp argument on the java command.
The build.xml file processed by ant already contains options to specify the class path, which is why you don't have to specify those things to run targets like test-scanner using ant. You can add similar targets to build.xml to run your MiniJava program or other test programs using ant, and you can use additional ant options in build.xml to specify program arguments like -S.
When the compiler (i.e., just the scanner at this point) terminates it should return an "exit" or status code indicating whether any errors were discovered when compiling the input program. In Java the method call System.exit(status) terminates the program with the given status. The status value should be 0 (normal termination) if no errors are discovered. If an error is detected, the exit status value should be 1.
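One way to arrange the exit status is to count errors as they are reported and decide the status once, at the end of main. The sketch below assumes a hypothetical ErrorLog class; neither that name nor its methods come from the starter code.

```java
// Hypothetical helper, not part of the starter code: counts errors reported
// while scanning so that main can choose the exit status at the end.
final class ErrorLog {
    private int errorCount = 0;

    // Report one error on standard error and remember that it happened.
    void report(int line, int column, String message) {
        errorCount++;
        System.err.println("line " + line + ", column " + column + ": " + message);
    }

    // 0 if no errors were reported, 1 otherwise, as the assignment requires.
    int exitStatus() {
        return errorCount == 0 ? 0 : 1;
    }
}

final class ExitStatusDemo {
    public static void main(String[] args) {
        ErrorLog log = new ErrorLog();
        // ... run the scanner here, calling log.report(...) for each bad token ...
        System.exit(log.exitStatus());
    }
}
```

Calling System.exit only once, after all input has been processed, keeps the scanner from halting on the first error while still returning 1 when anything went wrong.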
To test your scanner, you should use a variety of input files, including some that contain legal MiniJava programs and others that contain random input. Be sure your scanner does something reasonable if it encounters junk in the input file. (Crashing, halting immediately on the first error, or infinite looping is not reasonable; complaining, skipping the junk, and moving on is.) Remember, it is up to the parser to decide if the tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called.
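The "complain, skip the junk, and move on" behavior can be illustrated with a small hand-written loop. This is not how the project scanner should be built (that is JFlex's job, where a final catch-all rule usually plays the same role); the class name, token formats, and the tiny set of recognized symbols are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated scanner, for illustration only.
final class RecoveringScanSketch {
    // Returns printable tokens; problems are appended to 'errors' and
    // scanning continues with the next character.
    static List<String> scan(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;  // whitespace separates tokens but is not itself a token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if ("+-*=;(){}[]".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));  // single-character operator or punctuation
                i++;
            } else {
                // Junk: complain, skip one character, and keep going.
                errors.add("illegal character '" + c + "' at offset " + i);
                i++;
            }
        }
        return tokens;
    }
}
```

For the input "x = 1 # + y;" this sketch reports one error for '#' and still delivers the tokens on either side of it, which is the kind of behavior a test file with an error in the middle should exercise.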
This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part.
You should use your CSE P 501 GitLab repository to store the code for this and remaining parts of the compiler project.
If you are using a different implementation language or additional libraries...
If you are implementing the project in a language other than Java 8, or if your code requires libraries that are not part of the standard Java 8 JDK, please be sure that your compiler works as similarly as possible to the specification above - options and input file name as command line arguments, proper return code from the compiler main program, etc. In addition, please be sure to include a README file at the top of your repository that explains:
- How to compile and build your project from source code, including descriptions of any necessary tools and libraries,
- How to run your compiler and supply command-line arguments or the equivalent, and
- If possible, how your compiler can be run from a script (bash, ant, etc.) to make it easier to run your compiler against multiple source programs for testing. An example would be most helpful.
What to Hand In
The main information we will examine for this phase of the project is your JFlex and CUP specification files, your MiniJava class and main program, and your test input files. Include example source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not include the intermediate file(s) produced by JFlex or CUP -- machine generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all. For the same reason, compiler output like .class or .o files should generally not be pushed to the repository.
Once you're done, "turning in" the assignment is simple -- create an appropriate tag in your git repository to designate the revision (commit) that the course staff should examine for grading. But there are multiple ways to get this wrong, so you should follow the steps below carefully, particularly if you are new to git. (If you have a lot of git experience, our apologies for perhaps belaboring the obvious, but we want to be sure that assignments get pushed and tagged properly and without leaving git repositories in strange states. If you are not using a Linux system, please do the moral equivalent of the following on your system.)
The idea is:
- Tidy up and be sure that everything is properly committed and pushed to your GitLab repository.
- Add a tag to your repository to specify the commit that corresponds to the finished assignment.
- Check out a fresh copy of the repository and verify that everything has been done properly.
1. Tidy up and be sure everything is properly committed. Commit and push all of your changes to your repository (see the main project web page for links to git information if you need a refresher on how to do this). Then in your main project directory:

    bash% git pull
    bash% ant clean
    bash% git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    nothing to commit, working directory clean

If you see any messages about uncommitted changes or any other indications that the latest version of your code has not been pushed to the GitLab repository, fix those problems and push any unsaved changes before going on. Then repeat the above steps to verify that all is well.
2. Tag your repository and push the tag information to GitLab to indicate that the current commit is the version of the scanner that you are submitting for grading:

    bash% git tag scanner-final
    bash% git push --tags

Do not do this until after you have pushed all parts of your scanner project to GitLab.
3. Check that everything is properly stored and tagged in your repository. To be sure that you really have updated everything properly, create a brand new, empty directory that is nowhere near your regular working directory, clone the repository into the new location, and verify that everything works as expected. It is really, really, REALLY important that this not be nested anywhere inside your regular, working repository directory. Do this:

    bash% cd <somewhere-completely-different>
    bash% git clone git@gitlab.cs.washington.edu:csep501-16wi-students/csep501-16wi-xy.git
    bash% cd csep501-16wi-xy
    bash% git checkout scanner-final

Use your group's project code instead of xy, of course. The commands after git clone switch to the newly cloned directory, then cause git to switch to the tagged commit you created in step 2, above. We will do the same when we examine your files for grading.
At this point you should see your project directory. cd into it, run ant, and run any tests you wish (something that will be crucial on future assignments). If there are any problems, erase this newly cloned copy of your repository (rm -rf), go back to the regular repository, and fix whatever is wrong. It may be as simple as running a missed git push --tags command if the tag was not found in the repository. If it requires more substantive changes, you may need to do a little voodoo to get rid of the original scanner-final tag from your repository and re-tag after making your repairs. To eliminate the scanner-final tag, do this (this should not normally be necessary):

    bash% git tag -d scanner-final
    bash% git push origin :refs/tags/scanner-final

Then make and commit your repairs, and repeat the tag and tag push commands from step 2. And then repeat this step to be sure that the updated version is actually correct.
Once you are satisfied that the scanner-final tag in the repository correctly identifies the finished scanner project you are done.
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX