CSE 401 Wi10 Homework 1 - Regular Expressions

Due: Friday, January 15. You can either hand in your assignment at the beginning of class that day, or you can turn it in online by 6 pm (a scanned copy is fine). We suggest you show your work to help us award partial credit if appropriate, and for TA sanity.

For questions that ask for regular expressions, you should restrict yourself to the basic operations and abbreviations outlined in the slides: concatenation, |, *, +, ?, and character classes [...] or character classes containing all characters except for specific characters, e.g., [^abc]. You can also use abbreviations and be somewhat informal when the meaning is clear, e.g., notx = a | b | c | ... | w | y | z. You should not use additional regular expression operators found in languages like perl or ruby, or unix tools like grep and sed. In particular, you cannot use not(re) as a a regular expression that matches all other regular expressions except for re.

You should do this homework individually.

(Note that there are four questions - the last one may be on the back of the page if you use double-sided printing.)

  1. Cooper & Torczon exercise 2 for section 2.2, parts (a) and (b) only (p. 726).

  2. Give regular expressions that generate the following sets of strings.
    1. All strings of a's and b's with at least 3 a's.
    2. All strings of a's and b's where b's only appear in sequences whose length is a multiple of 2 (i.e., abba, bbbbabbaaa, and a are in this set; aba, b, ababa, and abbab are not).
    3. All strings of lower-case letters that contain the 5 vowels (aeiou) exactly once and in that order, with all other possible sequences of lower-case letters before, after, or in between the individual vowels.
  3. In The C Programming Language (Kernighan and Ritchie), an integer constant is defined as follows:

    An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit zero), decimal otherwise. Octal constants do not contain the digits 8 or 9. A sequence of digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal digits include a or A through f or F with values 10 through 15.

    An integer constant may be suffixed with the letter u or U, to specify that it is unsigned. It may also be suffixed by the letter l or L to specify that it is long.

    (a) Write a regular expression that generate C integer constants as described above. (Of course you can break the solution down into a regular expression with several named parts if it makes things easier to write and read -- which it probably will.)

    (b) Draw a DFA that recognizes integer constants as defined by your solution to part (a). You may draw this directly; you don't need to formally trace through the algorithm for converting a regular expression to a NFA and then minimizing that to get a DFA. However, you might find it a useful review to do so.

    Hint: You might find it helpful to alternate between designing the DFA and writing the regular expressions as you work on your solution.
  4. A commment in C is a sequence of characters /* ... */. Write a set of regular expressions that generate C-style comments. You can restrict the alphabet to lower-case letters, digits, spaces, newlines (\n), carriage returns (\r) and the characters * and /. Also, remember that in C, comments do not nest (i.e., a */ marks the end of a comment no matter how many times /* appears before it.)

    Hint: be careful about what is included in the ... between /* and */.