Due: Thursday, October 3 by 11:59 pm. Please
use Gradescope (gradescope.com
,
also linked from the CSE 401/M501 resources web page) to submit
your homework online. Gradescope's web site has several
instructional videos and help pages to outline the process,
and this guide has
specific information about scanning and uploading pdf files
containing assignments (although this guide is now somewhat old, so
you may have better methods of creating pdf files on your phone using
more recent software).
For questions that ask for regular expressions, you must restrict yourself
to the basic operations and abbreviations outlined in the lecture slides: concatenation,
|
, *
, +
, ?
,
and character classes like [abc]
, [ax-z]
,
or [a-zA-Z]
.
You may also specify character classes containing
all characters except for specific characters, e.g., [^abc]
means all characters in the entire alphabet except for a
,
b
, and c
;
and [^ax-z]
means all characters in the entire alphabet except
for a
, x
,
y
, and z
.
You can also use abbreviations and be somewhat informal when the meaning of the full
regular expression defined by an abbreviation is clear and unambiguous,
e.g., allexceptx = a|b|c|...|w|y|z
.
Note that you cannot use something like allexceptx
in a solution without first defining
it as a name for a particular regular expression, as in this example.
You may not use additional "regular expression" operations found in libraries
or languages like perl, python, java, javascript, ruby, etc.,
or in unix tools like grep and
sed. In particular, you cannot use not(
re)
as
a regular expression that matches all other regular expressions except for re.
Sometimes a regular expression might need to include a literal character like
*
or +
, which is also a regular expression operator.
Use some simple convention to distinguish between the two. We suggest underlining
literal characters, like * to distinguish them from operators like *.
Please do not use backslash characters as escape characters, they are
\v\e\r\y\ \h\a\r\d\ \t\o\ \r\e\a\d.
When drawing DFAs, you should not include an explicit "error state" or show transitions to this state for situations where the DFA rejects its input. We will assume that if there is no transition defined from a state on a particular input character then it corresponds to a transition to an error state, and the input is rejected. Diagrams are much simpler if we do not include this extra state and its many, many incoming transitions.
To help make DFA diagrams more readable, it's fine to label a transition arrow from one state to another with more than one symbol from the alphabet, instead of drawing multiple transition arrows each labeled with a single symbol. The easiest way to do this is to list the symbols in brackets, like [0-9] to label a transition on a single decimal digit , or [abcxyz] for a transition on one of those six characters. However, remember that transitions correspond to matching a single character in the input (or no character if it is an ε-transition), and cannot be labeled with a more complex regular expression.
Feel free to submit clear, hand-written diagrams for DFAs. However, if you want to use computer diagraming tools and don't already have a favorite, here are a couple of suggestions that students have found useful in the past:
You should do this homework individually.
a
|xy
)*b
(oz
)+o
0
)1
)*(Note: this description is not quite precise about the rules for possible suffixes: is more than one ofAn integer constant consisting of a sequence of digits is taken to be octal if it begins with
0
(digit zero), decimal otherwise. Octal constants do not contain the digits8
or9
. A sequence of digits preceded by0x
or0X
(digit zero) is taken to be a hexadecimal integer. The hexadecimal digits includea
orA
throughf
orF
with values 10 through 15.An integer constant may be suffixed with the letter
u
orU
, to specify that it is unsigned. It may also be suffixed by the letterl
orL
to specify that it is long.
u
or U
allowed, or is there
a restriction on the order of U
and L
, etc.?
The C language standard specifies
that at most one of u
or U
can appear,
at most one of l
or L
,
and the suffix letters can appear in any order.)
<!--
and ends with
the separate 3-character sequence -->
is treated
as a comment, and a comment can span multiple lines. Examples:
<!-- this is a comment --> this is not a comment because it isn't surrounded by comment brackets <!-- but this is a multi-line comment -->
-
, !
,
<
, and >
. Remember that
these comments do not nest (i.e., the sequence -->
marks the end of a comment
no matter how many times <!--
appears before it).
Any of the characters -
, !
, <
,
and >
can appear inside a comment in any combination,
except for the three character
sequence -->
that terminates a comment. That three-character
sequence cannot appear inside a comment since it would terminate it.<!--
and -->
.
As with the previous problem, you may find it helpful to alternate between the
regular expressions and DFA as you work on the problem.
<!--
to appear in the middle of a comment. We omitted that restriction here to simplify
the problem.