Due: Thursday, October 10 at 11:59 pm. You will "turn in" your project by pushing it to your GitLab repository and providing a suitable tag. See the end of this writeup for details.
Please note: The following writeup includes specific requirements for how your scanner (and later parts of the compiler) should work: options and parameters that must be supported by your compiler program, location of input data, required output, required program exit values (return codes), and other details. Please be sure that your compiler works as specified, does not produce extraneous output messages on either the stdout or stderr streams, runs on the lab machines using Java 21, and otherwise behaves predictably so that we can test it. You need to get these details right on this and on later phases of the project.
The main purpose of this assignment is to construct a scanner for MiniJava. You should use the JFlex scanner generator, which is a Java version of the venerable Lex family of programs. The ideas behind JFlex and the input language it supports are taken directly from Lex and Flex, which are described in most compiler books and have extensive documentation online. Links to JFlex and other tools are available on the main CSE 401/M501 project page. Starter files will be pushed to each project group's GitLab repository, and those files contain a sample project that shows how these tools work together. These programs work with the CUP parser generator, which we will use for the next phase of the project. Although this phase of the project does not use the CUP grammar, it does require specifying input language tokens in the CUP input file. You will need to update those definitions to ones appropriate for the full MiniJava language so they can be used by your scanner. This will very likely require removing a few things provided as part of the starter code, along with adding the token specifications needed for MiniJava. The JFlex and CUP programs and libraries are included in the starter code.
The other purpose of this assignment is to be sure that all of the infrastructure and the associated tools needed to complete the compiler project are working properly. Please start early so if there are configuration or infrastructure problems they can be fixed well before the assignment is due.
To begin, you (and your partner) should clone your group's GitLab repository containing the starter files. You can find a link to the GitLab web interface on the main CSE 401/M501 resources web page or the compiler project page. You can also find links to git documentation on the project page if you need to refresh your knowledge of git.
Once you have cloned the repository you should explore the starter code and be sure to carefully look at all of the README files located throughout the code to get oriented, especially the overview file at the top level of the project files.
If you plan to use an IDE like IntelliJ to work on your
project, now is the time to set that up. Be sure to carefully
follow the setup instructions for your IDE (in the IDE-setup-notes
folder). If you do not follow the instructions, your IDE setup can easily
cause later problems -- particularly with files not being updated
or recompiled properly, creating spurious bugs that are very hard
to diagnose and fix.
In other words, don't just click the "clone repo" button in
your IDE and hope for the best. Read the instructions!
You will need to examine the MiniJava source grammar to decide which symbols are terminals and which are non-terminals (hint: be sure to include operators, brackets, and other punctuation -- but not comments and whitespace -- in the set of terminal symbols). Also, review the MiniJava project description and be sure you understand the scope of the language and project. Note that the MiniJava grammar treats several things as reserved words even though these are not reserved in full Java. Examples include "main", "String", and "length"; these, like "true", "false", and the other literal strings that appear in double quotes in the MiniJava grammar, should be treated as reserved words by your scanner.
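One way to handle reserved words is to match every identifier-shaped word with a single pattern and then look the word up in a table. The sketch below shows that lookup in plain Java. The class name, the token names, and the exact word list are assumptions for illustration. Check the list against the MiniJava grammar and use the terminal names you declare in your CUP file.

```java
import java.util.Map;

// Hypothetical sketch: classify a word matched by an identifier-like pattern as either a
// MiniJava reserved word or an ordinary identifier. Token names are illustrative only,
// and the list should be checked against the MiniJava grammar.
// "System.out.println" contains dots, so it cannot come from an identifier pattern and
// needs its own scanner rule.
final class KeywordTable {
    // "main", "String", and "length" are ordinary identifiers in full Java but are
    // reserved in MiniJava because they appear in double quotes in its grammar.
    private static final Map<String, String> RESERVED = Map.ofEntries(
            Map.entry("class", "CLASS"),
            Map.entry("public", "PUBLIC"),
            Map.entry("static", "STATIC"),
            Map.entry("void", "VOID"),
            Map.entry("main", "MAIN"),
            Map.entry("String", "STRING"),
            Map.entry("extends", "EXTENDS"),
            Map.entry("return", "RETURN"),
            Map.entry("int", "INT"),
            Map.entry("boolean", "BOOLEAN"),
            Map.entry("if", "IF"),
            Map.entry("else", "ELSE"),
            Map.entry("while", "WHILE"),
            Map.entry("length", "LENGTH"),
            Map.entry("true", "TRUE"),
            Map.entry("false", "FALSE"),
            Map.entry("this", "THIS"),
            Map.entry("new", "NEW"));

    // Returns the reserved-word token name, or "IDENTIFIER" for anything else.
    // The lookup is case-sensitive, so "Main" and "mainly" are identifiers.
    static String classify(String word) {
        return RESERVED.getOrDefault(word, "IDENTIFIER");
    }
}
```

Writing one JFlex rule per reserved word ahead of the identifier rule is an equally common alternative. JFlex takes the longest match and breaks ties by rule order, so "mainly" would still scan as an identifier either way.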
The starter code contains a DemoScanner program that reads a file from standard input and prints a readable representation of the tokens in that file to standard output. You can run it with the command ant demo-scanner, or the equivalent command from inside a programming environment like IntelliJ.
This demo program is intended to show
how to use a JFlex scanner and how it works with the rest of the toolchain.
But for the compiler itself you should create a more
appropriate main program
and you will need to create an appropriate set of tokens for MiniJava.
Once you have updated the set of tokens for MiniJava, it is entirely possible
that the starter code demo program may no longer work if parts of it are
not compatible with the final MiniJava token grammar.
That is expected and does not represent an error.
You should create a Java class named MiniJava with a main method that controls execution of your compiler. This method should examine its arguments (the String array parameter that is found in every Java main method) to discover compiler options and the name of the file to be compiled. When this method is executed using the command

    java MiniJava -S filename.java

the compiler should open the named input file and read tokens from it by calling the scanner repeatedly until the end of the input file is reached. The tokens should be printed on standard output (Java's System.out) using a format similar to the one produced by the DemoScanner program in the starter code.
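Here is a minimal sketch of the argument examination described above. It assumes that main hands its String array to a small helper. The CompilerOptions class, its fields, and the convention of returning null for an unusable command line are all hypothetical, and only the -S option from this assignment is recognized.

```java
// Hypothetical helper for MiniJava.main: parse "-S filename.java".
// Returns null when the command line is unusable (no file name, more than one
// file name, or an unknown option) so the caller can report it and exit with status 1.
final class CompilerOptions {
    boolean scanOnly;   // true when -S was given
    String fileName;    // name of the source file to compile

    static CompilerOptions parse(String[] args) {
        CompilerOptions opts = new CompilerOptions();
        for (String arg : args) {
            if (arg.equals("-S")) {
                opts.scanOnly = true;
            } else if (arg.startsWith("-")) {
                return null;                  // unknown option
            } else if (opts.fileName == null) {
                opts.fileName = arg;
            } else {
                return null;                  // more than one file name
            }
        }
        return opts.fileName == null ? null : opts;
    }
}
```

In main, a null result would lead to a usage message on System.err followed by System.exit(1). Otherwise, main would open opts.fileName and run the scanner loop when opts.scanOnly is set.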
When your compiler (just the scanner at this point) terminates, it must return an "exit" or status code indicating whether any errors were discovered and reported (error messages written to stderr) when compiling the input program. In Java, the method call System.exit(status) terminates the program with the given status. The status value should be 0 (normal termination) if no errors were discovered. If the scanner detects any errors (invalid characters that do not form proper MiniJava tokens in the input program, input file not found or cannot be opened, or something else), the exit status value should be 1. Do not use additional or different exit status values.
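A simple way to pick the exit status is to count errors as they are reported and map the count to 0 or 1 at the end. The ErrorReporter class below is a hypothetical sketch and is not part of the starter code. Your scanner's error rule would also need some way to record that an error happened, and how you connect the two is up to you.

```java
// Hypothetical sketch: record reported errors so main can choose the exit status.
final class ErrorReporter {
    private int errorCount = 0;

    // Writes the message to stderr (never stdout) and remembers that an error occurred.
    void report(String message) {
        System.err.println(message);
        errorCount++;
    }

    // 0 if nothing was reported, otherwise 1; no other status values are used.
    int exitStatus() {
        return errorCount == 0 ? 0 : 1;
    }
}
```

At the end of main, a call like System.exit(reporter.exitStatus()) then follows the rule above no matter how many errors were found.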
Note: The scanner and parser demo programs in the starter code read their input from stdin. Your compiler must read input from the file named on the java command, so you will need to include appropriate code in your MiniJava main program to open that file and prepare it for reading.
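JFlex-generated scanners read from a java.io.Reader, so one approach is to open the named file as a Reader and hand that to your scanner instead of System.in. The sketch below assumes a hypothetical SourceFiles helper. The exact constructor of your scanner class depends on your .jflex file and is not shown. It uses FileReader rather than Files.newBufferedReader because FileReader replaces malformed bytes instead of throwing, which matters if you test with random input.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: open the source file named on the command line.
// Returns null after reporting the problem on stderr, so the caller can exit with status 1.
final class SourceFiles {
    static Reader open(String fileName) {
        try {
            // FileReader substitutes a replacement character for malformed input
            // rather than failing partway through the file.
            return new BufferedReader(new FileReader(fileName, StandardCharsets.UTF_8));
        } catch (IOException e) {   // includes FileNotFoundException
            System.err.println("MiniJava: cannot open " + fileName + ": " + e.getMessage());
            return null;
        }
    }
}
```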
The source code for MiniJava.java should be in the top-level project src folder, and ant will compile it automatically along with all of the other project files when needed.
The actual details of running MiniJava's main method from a command prompt are a bit more complicated, because the Java virtual machine needs to know where the compiled classes and libraries are located. The following commands should recompile any necessary files and run the scanner when they are executed in the top-level directory containing the build.xml ant file:

    ant
    java -cp build/classes:lib/java-cup-11b.jar MiniJava -S filename.java

If you set the CLASSPATH environment variable to point to the library jar file and compiled classes directory, you should not need to provide the -cp argument on the java command.
If you are using a Windows terminal instead of a Mac or Linux terminal, you will need to use ; instead of : as a path separator in the -cp option, and then you need to quote the whole class path to keep Windows from treating the ; as a separator between two commands. So on Windows you need to do this:

    ant
    java -cp "build/classes;lib/java-cup-11b.jar" MiniJava -S filename.java
The build.xml file processed by ant already contains options to specify the class path, which is why you didn't have to specify those things to run targets like demo-scanner when you use ant. You should feel free to add similar targets to build.xml to run your MiniJava compiler or other test programs using ant, and you can use additional ant options in build.xml to specify program arguments like -S. However, it must be possible to build and execute your compiler using the commands shown above, i.e., use ant with no options to build the compiler, then use a java command with the correct options to execute the compiler, without relying on additional targets you may have added to the build.xml file for your use while developing and testing your code.
To test your scanner, you should create a variety of input files, including some that contain legal MiniJava programs, some that contain programs with lexical errors, and some containing random input. We have provided one possible suggestion for how to organize your test files using JUnit in the starter code test subdirectory, but you are not required to follow it (or even to use JUnit). There is a single example test to get you started, but it is designed for the small demo language in the starter code, so you will need to modify it to work with MiniJava and your choice of token names. To understand how the JUnit setup works and how you can expand on it, read the documentation at test/README.txt and look at the bottom of the build.xml file for the sample ant testing target (test).
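Tests for a program that prints tokens usually need to capture what it writes to System.out and compare it with an expected-output file. The sketch below shows one way to capture that output using only the standard library. OutputCapture is a hypothetical name, and the Runnable you pass in would be whatever code in your project drives the scanner and prints tokens.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper: run an action, collect everything it prints to System.out,
// and restore the real stdout afterwards even if the action throws.
final class OutputCapture {
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true, StandardCharsets.UTF_8));
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }
}
```

One design consequence is that a test should call a helper that returns the exit status rather than calling MiniJava.main directly. Otherwise, main's System.exit call would terminate the test run.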
The sample test setup in the starter code is somewhat limited. We have supplied an optional set of testing utility programs that is more sophisticated and that students in previous quarters have found useful for testing later phases of the compiler project as well as the scanner. We suggest that you take a careful look at this package and strongly consider using it, starting with this initial part of the project.
Feel free to arrange and run your tests however you'd like -- but keep in mind that it is nearly impossible to find all the edge cases in something as complex as a compiler without an organized and thorough approach to testing.
Be sure your scanner does something reasonable if it encounters junk in the input file. Crashing, halting immediately on the first error, or infinite looping is not reasonable. Complaining, skipping the junk, and moving on to find the next token in the input file is the right approach. A correct scanner for any compiler should be able to resume scanning and continue to deliver tokens on request until reaching the end of the input file, no matter what it encounters in the file along the way.

Characters that do not form valid tokens should be reported as errors by the scanner and skipped to continue looking for valid tokens. Error messages should be written to stderr (the Linux standard error stream, which is the System.err stream in Java). The starter jflex code already contains code to handle unrecognized input characters and report them as an error by writing messages to stderr, so you may not have to do much additional work to get this right. The jflex code also returns a special "error" token to the client (the parser), which communicates information about scanner errors to the parser. We are not using this error handling token in our project, but you should leave that starter code in place and not alter it.

If your scanner needs to report additional errors, appropriate messages should be written to stderr. The scanner should not return any additional "error" or "comment" tokens to the parser.
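The recovery strategy described above can be made concrete with a toy scanner written by hand in Java. This is an illustration only. Your real scanner is generated by JFlex, and the token spellings, the tiny set of recognized tokens, and the class name are assumptions. The important property is that every branch consumes at least one character, so an illegal character is reported and skipped and the loop always makes progress.

```java
import java.util.ArrayList;
import java.util.List;

// Toy scanner illustrating error recovery: report an illegal character on stderr,
// skip it, and keep scanning. Recognizes only ASCII identifiers, integer literals,
// and a few punctuation characters.
final class RecoveringScanner {
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f') {
                i++;                                        // skip whitespace
            } else if (isLetter(c)) {
                int start = i;
                while (i < input.length() && (isLetter(input.charAt(i))
                        || isDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if ("{}();".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Complain, skip exactly one character, and continue with the next token.
                System.err.println("illegal character '" + c + "' at offset " + i);
                i++;
            }
        }
        return tokens;
    }
}
```

For example, scanning "x1 @@ 42;" reports two illegal characters on stderr and still returns ID(x1), INT(42), and ;.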
Hint: in the past, comment handling has proved to be a bit tricky. Remember that the scanner needs to recognize comments and skip them, not pass them on to the parser. Comments that start with // and extend to the end of a line are not particularly hard. But comments that start with /* can be more difficult, since they can include embedded newlines as well as * and / characters, as long as they don't contain the sequence */, which terminates a comment. Be sure your scanner can correctly detect and skip tricky comments like /***/, /*/*/, /*/*****/ and other devious things, possibly with embedded newlines, returns, or other characters in the middle.
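The rule for /* comments is that the comment ends at the first */ that begins after the opening /*. Starting the search two characters past the opening is what keeps /*/ from counting as a complete comment. The plain-Java sketch below exists only to make the edge cases from the hint concrete. The class and method names are hypothetical, and in your scanner the same rule would be written as a JFlex regular expression; the JFlex manual documents operators that can express it directly. An unterminated comment (a null result here) is something your real scanner should report as an error on stderr.

```java
// Hand-written illustration (not JFlex) of skipping comments.
final class CommentSkipper {
    // text has "/*" at position start. Returns the index just past the closing
    // "*/", or -1 if the comment is never terminated. The search for "*/" starts
    // at start + 2 so that the '*' of the opening "/*" cannot be reused.
    static int skipBlockComment(String text, int start) {
        int close = text.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }

    // Removes both kinds of comments. Returns null for an unterminated block comment.
    // A // comment runs up to, but not including, the next '\n' or '\r'.
    static String stripComments(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("//", i)) {
                while (i < text.length() && text.charAt(i) != '\n' && text.charAt(i) != '\r') {
                    i++;
                }
            } else if (text.startsWith("/*", i)) {
                int end = skipBlockComment(text, i);
                if (end < 0) {
                    return null;
                }
                i = end;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

With this rule, /***/ and /*/*/ each end at index 5 and /*/*****/ ends at index 9, while /*/ on its own is unterminated.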
Remember, it is up to the parser to decide if the sequence of tokens in the input make up a well-formed MiniJava program; the scanner's job is simply to deliver the next token whenever it is called, regardless of whether the sequence of tokens extracted from the input actually forms a legal MiniJava program.
This assignment only asks you to implement the scanner part of the project. The parser, abstract syntax trees, and CUP grammar rules will come in the next part, and we strongly advise that you not try to "get ahead" by implementing anything further at this time, other than minimal changes needed to the starter code so the scanner will build and execute properly.
Your group should use your CSE 401/M501 GitLab repository to store the code for this and remaining parts of the compiler project.
The main items we will examine for this phase of the project are your JFlex and CUP specification files, your MiniJava class and main program, and your test input files. Include example input source files that demonstrate the abilities of your scanner, including at least one with an error in the middle of the file. You should not include the intermediate file(s) produced by JFlex or CUP -- machine-generated code is generally unenlightening, consisting of a bunch of tables and uncommented code, if it is readable at all. For the same reason, the generated files produced by JFlex and CUP and compiler output like .class or .o files should not be pushed to the repository. We suggest you run ant clean to delete generated files before you add, commit, and push new files to your GitLab repository.
You should also create and include a brief scanner-notes.txt file describing how you and your partner managed the work for this part of the project. You should describe how the work was organized (i.e., did you split up the work, and, if so, how, or did you work together on the entire project), how you and your partner coordinated the work, and how much of the work was done by each partner. There are no correct answers to these questions, but it is useful to reflect on how the effort was organized and perhaps think about how successful that was and whether you want or need to make changes for later parts of the project. Your note should be brief and to the point -- no need to write a long essay. A few sentences or bullet points should be enough. Place this file in the Notes/ top-level directory of your project, and commit/push this file to your repo along with the rest of your code.
We will test your code on the lab Linux machines (attu or equivalent), and your project should build and work properly when run with ant using Java 21. Since this early stage of the project only includes Java code, if it runs properly with ant and Java 21 in a Windows, Mac, or other environment, it should be ok. If you are using a programming environment like IntelliJ, set the project language level option(s) to Java 21 so the IDE will properly diagnose unavailable language features, even if the installed Java version on your machine is something more recent.
Once you're done, "turning in" the assignment is simple: create an appropriate tag in your git repository to designate the revision (commit) that the course staff should examine for grading. But there are multiple ways to get this wrong, so you should carefully follow the steps below, particularly if you are new to git. If you have a lot of git experience, our apologies for perhaps belaboring the obvious, but we want to be sure that assignments get pushed and tagged properly and without leaving git repositories in strange states. If you are not using Linux or another Unix-like command-line environment, please do the moral equivalent of the following on your system.
The idea is:
1. Tidy up and be sure everything is properly committed. Commit and push all of your changes to your repository (see the main project web page for links to git information if you need a refresher on how to do this). Then in your main project directory:

    bash% git pull
    bash% ant clean
    bash% git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    nothing to commit, working directory clean

If you see any messages about uncommitted changes or any other indications that the latest version of your code has not been pushed to the GitLab repository, fix those problems and push any unsaved changes before going on. Then repeat the above steps to verify that all is well.
2. After committing and pushing your code to your GitLab repository, add a tag to your repository and push the tag information to GitLab to indicate that the current commit is the version of the scanner that you are submitting for grading:

    bash% git tag scanner-final
    bash% git push --tags

Do not do this until after you have pushed all parts of your scanner project to GitLab.
3. Check that everything is properly stored and tagged in your repository. To be sure that you really have updated everything properly, create a brand new, empty directory that is nowhere near your regular working directory, clone the repository into the new location, and verify that everything works as expected. It is really, really, REALLY important that this not be nested anywhere inside your regular, working repository directory. Do this:

    bash% cd <somewhere-completely-different>
    bash% git clone git@gitlab.cs.washington.edu:cse401-24au-students/cse401-24au-xy.git
    bash% cd cse401-24au-xy
    bash% git checkout scanner-final

Use your group's project code instead of xy, of course. The commands after git clone change to the newly cloned directory, then cause git to switch to the tagged commit you created in step 2, above. We will do the same when we examine your files for grading. At this point you should see your project directory. Run ant to build the project and run any tests you wish (something that will be even more essential on future assignments). If there are any problems, erase this newly cloned copy of your repository (rm -rf cse401-24au-xy), go back to your regular working repository copy, and fix whatever is wrong.
DO NOT do any additional work in the copy of the repository that you have cloned to verify your work. After the git checkout scanner-final command, that repository is in a "detached head" state, and any changes you make to that copy of the repository will either be lost or pushed to a hidden branch or otherwise cause problems. DON'T do any repairs or further work using that copy.

The necessary changes in the original repo may be as simple as running a missed git push --tags command if the tag was not found in the repository. If the fix requires more substantive changes, you may need to do a little voodoo to get rid of the original scanner-final tag from your repository and re-tag after making your repairs. To eliminate the scanner-final tag, do this (this should not normally be necessary):

    bash% git tag -d scanner-final
    bash% git push origin :refs/tags/scanner-final

Then make, commit, and push your repairs, and repeat the tag and tag push commands from step 2. And then repeat this step to be sure that the updated version is actually correct.

Once you are satisfied that the scanner-final tag in the repository correctly identifies the finished scanner project, you are done.