Due: Thursday, October 5 by 11 pm. Please
use Gradescope (gradescope.com
,
also linked from the CSE 401/M501 resources web page) to submit
your homework online. Gradescope's web site has several
instructional videos and help pages to outline the process,
and this guide has
specific information about scanning and uploading pdf files
containing assignments (although this guide is now somewhat old, so
you may have better methods of creating pdf files on your phone using
more recent software).
For questions that ask for regular expressions, you must restrict yourself
to the basic operations and abbreviations outlined in the lecture slides: concatenation,
|
, *
, +
, ?
,
and character classes [...]
.
You may also specify character classes containing
all characters except for specific characters, e.g., [^abc]
means all characters in the entire alphabet except for a
,
b
, and c
.
You can also use abbreviations and be somewhat informal when the meaning of the full regular expression
it abbreviates is clear and unambiguous,
e.g., allexceptx = a|b|c|...|w|y|z
.
Note that you cannot use something like "allexceptx" in a solution without first defining
it as a name for a particular regular expression, as in the above example.
You may not use additional "regular expression" operations found in libraries
or languages like perl, python, java, javascript, ruby, etc.,
or in unix tools like grep and
sed. In particular, you cannot use not(
re)
as
a a regular expression that matches all other regular expressions except for re.
Sometimes a regular expression might need to include a literal character like
*
or +
, which is also a regular expression operator.
Use some simple convention to distinguish between the two. We suggest underlining
literal characters, like * to distinguish them from operators like *.
Please do not use backslash characters as escape characters, they are
\v\e\r\y\h\a\r\d\t\o\r\e\a\d.
When drawing DFAs, you should not include an explicit "error state" or show transitions to this state for situations where the DFA rejects its input. We will assume that if there is no transition defined from a state on a particular input character then it corresponds to a transition to an error state, and the input is rejected. Diagrams are much simpler if we do not include this extra state and its many, many incoming transitions.
Feel free to submit clear, hand-written diagrams for DFAs. However, if you want to use computer diagraming tools, here are a couple of suggestions if you don't already have a favorite:
You should do this homework individually.
a
|xy
)*b
(oz
)+o
0
)1
)*(a) Write a regular expression that generates C integer constants as described above. (Of course you can break the solution down into a regular expression with several named parts if it makes things easier to write and read.)An integer constant consisting of a sequence of digits is taken to be octal if it begins with
0
(digit zero), decimal otherwise. Octal constants do not contain the digits8
or9
. A sequence of digits preceded by0x
or0X
(digit zero) is taken to be a hexadecimal integer. The hexadecimal digits includea
orA
throughf
orF
with values 10 through 15.An integer constant may be suffixed with the letter
u
orU
, to specify that it is unsigned. It may also be suffixed by the letterl
orL
to specify that it is long.
<!--
and ends with
the separate 3-character sequence -->
is treated
as a comment, and a comment can span multiple lines. Examples:
<!-- this is a comment --> <!-- and so is this -->
-
, !
,
<
, and >
. Remember that
these comments do not nest (i.e., the sequence -->
marks the end of a comment
no matter how many times <!--
appears before it).
Any of the characters -
, !
, <
,
and >
can appear in a comment in any combination,
except that the three character
sequence -->
terminates the comment.<!--
and -->
.
As with the previous problem, you may find it helpful to alternate between the
regular expressions and DFA as you work on the problem.
<!--
to appear in the middle of a comment. We omitted that restriction here to simplify
the problem.