CSE 413 14au Assignment 7 - Regular Expressions & Scanner
Due: Online via the Catalyst Dropbox by 11 pm, Tuesday, November 25, 2014.
Part I. Written problems
You should answer these questions using regular expressions as described in class, not the variants found in programming languages and tools like Ruby, Perl, Python, Bash, grep, sed, or whatever.
- For each of the following regular expressions, (i) give an
example of two strings that can be generated by the regular
expression and two that use the same alphabet but cannot be
generated, and (ii) give an English description of the set of
strings generated (for example, "all strings consisting of
the word 'cow' followed by 1 or more occurrences of 'milk' ").
For (ii), you should not just paraphrase the regular expression operators
in English; describe the sets of strings generated.
- (a|xy)*
- b(oz)+o
- ((ε|0)1)*
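The answers themselves should use the notation from class, but a short Ruby sketch can help sanity-check hand-worked example strings. It uses the 'cow'/'milk' expression from the example description above; the helper name generated_by? is hypothetical, and only the operators shared with the class notation (|, *, +, and grouping) are assumed to carry over.

```ruby
# Hypothetical helper: true when the WHOLE string is generated by the pattern.
# \A and \z anchor the match to the entire string; without them Ruby would
# also accept strings that merely contain a matching substring.
def generated_by?(pattern, string)
  /\A(?:#{pattern})\z/.match?(string)
end

# The example from the problem statement: 'cow' followed by 1 or more 'milk'.
COW_MILK = "cow(milk)+"

puts generated_by?(COW_MILK, "cowmilk")      # one occurrence of 'milk'
puts generated_by?(COW_MILK, "cowmilkmilk")  # two occurrences of 'milk'
puts generated_by?(COW_MILK, "cow")          # no 'milk' at all
puts generated_by?(COW_MILK, "milkcow")      # right alphabet, wrong order
```

The first two calls print true and the last two print false, which is the "two strings that can be generated, two that cannot" shape that part (i) asks for.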
- Give regular expressions or sets of regular expressions that will generate
the following sets of strings.
- All strings of a's and b's with at least 3 a's.
- All strings of a's and b's where b's only appear in sequences of b's whose length is a multiple of 2 (i.e., abbaa, bbbbabbaaa, and a are in this set; aba, b, bab, and abbabab are not).
- All strings of lower-case letters that contain the 5 vowels (aeiou)
exactly once and in that order, with all other possible sequences
of letters before, after, or in between the individual vowels. Your answer only needs to use lower-case letters (a-z) and need not include upper-case ones (A-Z).
- In The C Programming Language (Kernighan and Ritchie), an
integer constant is defined as follows.
An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit zero), decimal otherwise. Octal constants do not contain the digits 8 or 9. A sequence of digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal digits include a or A through f or F with values 10 through 15.
An integer constant may be suffixed with the letter u or U, to specify that it is unsigned. It may also be suffixed by the letter l or L to specify that it is long.
Part II. Programming - Calculator Scanner
This is the first of two programming assignments to build an interpreter for the language given in the Calculator Language description. We will build the interpreter in two parts - a scanner that reads the calculator program from the input stream and breaks the input into tokens, and a parser/evaluator that parses the token stream according to the specifications in the grammar and executes the program. The calculator program should be implemented in Ruby. For the most part it will just be a collection of top-level functions, but you should create classes when these are helpful in organizing the code.
For this assignment you should implement a scanner that provides a function next_token. Each time next_token is called it should return a new Token object that describes the next terminal symbol read from the input. Objects of class Token should respond to the following messages:
- kind - return the lexical class of the token as a string. This should be a distinct string for each lexical class in the program, possibly just the operator or keyword itself. However, all identifiers should be treated as instances of a single lexical class and the kind method should return the same value for every identifier. Similarly, all numbers should be treated as a single lexical class. You will also want to have a lexical class to represent the end of an input line, since end-of-line is semantically meaningful - it indicates the end of a statement.
- value - if the token kind is either an identifier or number, then this message should return the actual identifier or floating-point value. Its value is not defined for other lexical classes.
- to_s - the standard Ruby "to string" method. This should produce a descriptive string representation of the token, including the associated value if the token is an identifier or a number.
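One possible shape for such a class is sketched below. The kind strings "id", "number", and "eol" and the output format of to_s are assumptions made for illustration; the assignment only requires that they be distinct and documented.

```ruby
# Sketch of a Token class. The kind strings below are placeholders,
# not values prescribed by the assignment.
class Token
  ID  = "id"      # assumed kind shared by every identifier
  NUM = "number"  # assumed kind shared by every number
  EOL = "eol"     # assumed kind for the end of an input line

  attr_reader :kind, :value

  # value is only meaningful for identifiers and numbers.
  def initialize(kind, value = nil)
    @kind = kind
    @value = value
  end

  # Include the value only for the two lexical classes that carry one.
  def to_s
    case kind
    when ID, NUM then "#{kind}(#{value})"
    else kind
    end
  end
end

puts Token.new(Token::ID, "x")   # prints id(x)
puts Token.new(Token::NUM, 3.5)  # prints number(3.5)
puts Token.new("+")              # prints +
puts Token.new(Token::EOL)       # prints eol
```

Operators and keywords here use the symbol or word itself as the kind, which the description above permits.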
To test the scanner, you should write a small program that calls next_token repeatedly to get the next token from the input and prints the result (the result of sending to_s to the token object). After reading and printing a quit or exit token, the test program should stop.
Feel free to take advantage of Ruby's string and regular expression classes and methods to chop the input into tokens.
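As a rough sketch of that approach, the code below uses StringScanner from Ruby's standard strscan library to chop a string into tokens, followed by a test loop of the kind described above. Several things are assumptions for illustration only: the operator set, the number and identifier patterns, the "eof" and "unknown" kinds, the Scanner wrapper class (the assignment asks for a next_token function), and reading from a string rather than the input stream. The actual token set comes from the Calculator Language description. A small Struct stands in for the Token class so the sketch runs on its own.

```ruby
require "strscan"

# Minimal Token stand-in so this sketch is self-contained.
Token = Struct.new(:kind, :value) do
  def to_s
    value.nil? ? kind : "#{kind}(#{value})"
  end
end

# Hypothetical scanner over a string; a real one would read the input stream.
class Scanner
  KEYWORDS = %w[quit exit].freeze  # assumed subset of the language's keywords

  def initialize(text)
    @ss = StringScanner.new(text)
  end

  def next_token
    @ss.skip(/[ \t]+/)                     # skip blanks but keep newlines
    if @ss.eos?
      Token.new("eof")                     # assumed kind for end of input
    elsif @ss.scan(/\r?\n/)
      Token.new("eol")
    elsif (text = @ss.scan(/\d+(\.\d+)?/))
      Token.new("number", text.to_f)
    elsif (text = @ss.scan(/[A-Za-z_]\w*/))
      KEYWORDS.include?(text) ? Token.new(text) : Token.new("id", text)
    elsif (text = @ss.scan(%r{[-+*/()=]}))
      Token.new(text)                      # operator is its own kind
    else
      Token.new("unknown", @ss.getch)      # report any other character
    end
  end
end

# Test loop: print tokens until quit, exit, or end of input.
scanner = Scanner.new("x = 3 + 4.5\nquit\n")
loop do
  token = scanner.next_token
  puts token
  break if %w[quit exit eof].include?(token.kind)
end
```

Trying the patterns in a fixed order at the current position means each call consumes exactly one token, and the final branch ensures that unexpected characters are reported rather than silently skipped.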
Your code should be contained in a file scan.rb. Be sure to include your name and other identifying information as comments at the beginning of your file. There should also be descriptive comments as needed; in particular, your Token class should include documentation of the possible values returned by the kind method.
What to Hand In
Turn in a PDF file named hw7.pdf containing your answers to the questions from Part I, your scan.rb source file for Part II, and some examples of test input and output in a file test.rb that demonstrate that your scanner works on a variety of test input containing both legal tokens and other input characters.
The PDF file for Part I can be a scanned document if that is convenient, as long as it is clear and legible and does not exceed a few MB in size.
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX