CSE P 501 Au11 Homework 1 -
Regular Expressions
Due: Monday Oct. 10 by 11 pm. Please use the dropbox linked from
the CSE P 501 homework web page to submit your homework online. Any common format like pdf or doc/docx is fine, or you can submit a scanned copy of your work as long as it is legible. We suggest you show
your work to help us award partial credit if appropriate, and
for
TA sanity.
For questions that ask for regular expressions, you should restrict yourself
to the basic operations and abbreviations outlined in the slides: concatenation, |
, *
, +
, ?
, and character classes [...]
or character classes containing
all characters except for specific characters, e.g., [^abc]
.
You can also use abbreviations and be somewhat informal when the meaning is
clear,
e.g., allexceptx
= a | b | c | ... | w | y | z
. You should not use additional regular
expression operators found in languages like perl, python, or ruby, or in unix tools like
grep and
sed. In particular, you cannot use not(
re)
as
a a regular expression that matches all other regular expressions except for re.
You should do this homework individually.
- For each of the following regular expressions, (i) give an example of two
strings that can be generated by the regular expression and two that use
the
same alphabet but cannot be generated, and (ii) give an English description
of the set of strings generated (for example, "all strings consisting
of the word 'cow' followed by 1 or more 'x's and 'o's in any order"),
not just a transliteration of the regular expression operations into English.
- (a|xy)*
- b(oz)+o
- ((ε|0)1)*
- Give regular expressions that will generate the following sets of strings.
- All strings of a's and b's with at least 3 a's.
- All strings of a's and b's where b's only appear in sequences whose
length is a multiple of 2 (a few examples: abba, bbbbabbaaa, and a are in this
set; aba, b, ababa, and abbab are not).
- All strings of lower-case letters that contain the 5 vowels (aeiou)
exactly once and in that order, with all other possible sequences
of lower-case letters before, after, or between the individual vowels.
- (Cooper & Torczon exercise 2 for section 2.2, parts (a) and (b) only. p.
726 1st edition, p. 80 2nd edition). Construct a DFA accepting each of the following languages:
- {w in {a,b}* | w starts with 'a' and contains 'baba' as a substring}
- {w in {0,1}* | w contains '111' as a substring and does not contain '00' as a substring }
- In The C Programming Language (Kernighan and Ritchie), an integer
constant is defined as follows:
An integer constant consisting of a sequence of digits is taken to be
octal if it begins with 0
(digit zero), decimal otherwise. Octal constants do not contain the digits
8 or 9.
A sequence of digits preceded by 0x
or 0X (digit zero) is taken
to be a hexadecimal integer. The hexadecimal digits include a
or A through f
or F with values 10 through
15.
An integer constant may be suffixed with the letter u
or U, to specify that it
is unsigned. It may also be suffixed by the letter l
or L to specify that it
is long.
(a) Write a regular expression that generates C integer constants
as described above. (Of course you can break the solution down into a regular
expression with several named parts if it makes things easier to write and
read -- which it probably will.)
(b) Draw a DFA that recognizes integer constants as defined by your solution
to part (a). You may draw this directly; you don't need to formally trace
through an algorithm for converting a regular expression to a NFA and then
constructing a DFA from that. However, you might find it useful
to do so at least partially.
Hint: You might find it helpful to alternate between designing the DFA and
writing the regular expressions as you work on your solution.
- A commment in C is a sequence of characters /*
... */. Write a set of regular expressions that generate C-style comments.
You can restrict the alphabet to lower-case letters, digits, spaces, newlines
(\n), carriage returns (\r) and the characters *
and /. Also, remember that
in C comments do not nest (i.e., a
*/
marks the end of a comment
no matter how many times /*
appears before it.)
Hint: be careful about
what is included in the ...
between /* and */.