CSE 413 Winter 2002

Assignment 5 --
Regular Expressions & Compiler I/O

Electronic turnin of programming part due Tuesday, Feb. 19, by 10:00 pm.
Turnin receipt, test output, and written problems due in class Wednesday, Feb. 20.

Overview

This is the first of a sequence of assignments that will produce a small, but complete compiler.  You are free to work with a partner on this project - in fact, we encourage you to do so.  However, if you do work with a partner, you should plan on working with the same person for the entire compiler project.

Written problems

Here are a few problems involving regular expressions and finite automata.  These are a warmup for the next part of the compiler, the scanner.  Do these problems individually, even if you have a partner for the programming part. 

  1. For each of the following regular expressions, (i) give an example of two strings that can be generated by the expression and two that cannot be generated but use the same alphabet (if possible), and (ii) give an English description of the set of strings generated (for example, "all strings consisting of 1 or more w's followed by xyz").
    1. (0*1*)*
    2. (1|10)*
    3. (0|1)*0(0|1)(0|1)
  2. Give regular expressions that will generate the following strings.
    1. All strings of lower case letters in the range a-f where the letters are in ascending lexicographic order.
    2. All strings of 0's and 1's that do not contain 011 as a substring.
  3. Here is the description of floating constants from the C reference manual (by Kernighan and Ritche). Write a set of regular expressions that generate floating constants as described here.
    A floating constant consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed integer exponent and an optional type suffix, one of f, F, l, or L.  The integer and fraction parts both consist of a sequence of digits.  Either the integer part or the fractional part (not both) may be missing; either the decimal point or the e and the exponent (not both) may be missing.  The type is determined by the suffix; F or f makes it float, L or l makes it long double; otherwise it is double.
  4. (Optional) Give the diagram of a finite automata (state diagram) that recognizes the regular expressions for C floating constants given in your answer to question 3.

Programming project - Compiler I/O

For this part of the assignment, implement and test a Java class CompilerIO that will handle input and output for the compiler.  Here is a skeleton for this class.  Some details are given below.

public class CompilerIO {

   // Constructors:
   
   /** Open input and output files whose names are given as arguments */
   public CompilerIO(String inFile, String outFile) { ... }
   
   /** Open named input file.  Open an output file with the same name,
    *  but with an extension of ".asm" */
   public CompilerIO(String inFile) { ... }
   
   // This one is optional:
   /** Use a FileDialog to select an input file and open it.  Open an
    *  output file with the same but with an extension of ".asm" */
   public CompilerIO() { ... }
   
   // Methods:
   
   /** Return the next line from the input file or null if no more input
    *  is available */
   public String readLine() { ... }
   
   /** print outputLine to output file */
   public void println(String outputLine) { ... }
   
   // properties and set/get methods to access their values:
   
   private boolean echoing;	// true when automatically copying
                                // input to output
   public  void    setEchoing(boolean val) { ...}
   public  boolean getEchoing() { ... }
   
   private String echoPrefix;	// string to concatenate to front of
                                // lines automatically copied to output
   public  void   setEchoPrefix(String val) { ... }
   public  String getEchoPrefix() { ... }
   
   // additional private instance variables and methods as needed
   ...
   
   // Test program
   // create an CompilerIO object and test it.  Main should accept
   // 1 or 2 filename arguments, and, if the optional 0-argument
   // constructor is implemented, 0 arguments.
   public static void main(string [] args) { ... }
}

The idea behind this class is that the compiler will use a single CompilerIO object to manage input and output, which will shield the rest of the compiler from the details of file handling.  The input and output files are ordinary text files.

When the compiler starts, it will create an instance of CompilerIO, and the CompilerIO object's constructor should open the input and output files.  The scanner will then use this object's readLine() method to read the source program text.  The code generation parts of the compiler will use println() to write the generated x86 assembly code to the output file.  CompilerIO.readLine() should handle any exceptions generated during input so the rest of the compiler doesn't see them; otherwise its interface should be the same as BufferedReader.readLine().

In addition to the generated x86 code, we will want to include other lines in the output file as assembly-language comments to make the output easier to read.  So, besides just reading the input file, CompilerIO.readLine() should be able to automatically print each input line to the output file as it is read, if requested by the user.

Variables echoing and echoPrefix should control whether the input lines are printed to the output file, and how.  If echoing is true, then each input line should be written to the output file at the time it is read, with the echoPrefix string concatenated to the front of it.  The idea behind echoPrefix is that we can set it to something appropriate so the echoed source program lines copied to the assembly language output file are treated as comments in the generated x86 program (more details in a future assignment).  The set and get methods for these properties can be used to turn echoing on and off and change the prefix string.

CompilerIO is required to have two constructors; a third is optional.  The two-argument constructor parameters are the names of the input and output files. The single-argument constructor should treat its argument as the name of the input file.  It should create the output file name by taking the input file name (say test.txt) and replacing the extension with ".asm" (test.asm).  Don't make assumptions about the length of the extension in the input file name.  It might be ".txt", but it might also be ".stuff", or ".d" or something else.  (Implementation hint: class String includes methods that locate the first or last occurrence of a character or substring in a String.)

The third, optional constructor should have no arguments.  It should let the user to pick the input file with a regular file dialog box (see class JFiledialog), open that file, then create a filename and open an output file just like the 1-argument constructor. 

Class CompilerIO should include its own test program (method main).  The test program should accept one or two file names as command line arguments and use the appropriate constructors to create an CompilerIO object.  If you implemented the optional 0-argument constructor, then your test program should also be runnable with no arguments, and it should call the 0-argument constructor to open the files in that case.  Once the files are open, the test program should verify that the input and output methods of CompilerIO work properly, both with and without automatic echoing of input lines, and with different values of the echoPrefix when lines are echoed..

Project Turnin

Use this turnin page to submit your homework.

You should hand in some examples of test input and output that demonstrate that your class works, and your answers to the written problems in lecture on February 20.