CSE 142 Spring 2002
Finite State Automata

This is a brief description of Finite State Automata (FSA's) as used in the class this quarter.  This is by no means a complete explanation of the topic.  However, it is a complete explanation of everything we expect you to understand (say, for the final).

States and Transitions

An FSA consists of a (finite) set of states and a set of labeled transitions between states.  (In the diagrams below, the circles are the states and the arrows are the transitions.)  An FSA has a single start state and one or more accepting states (explained below).  In our diagrams, the start state is the sole state that has no transitions coming into it.  (The arrowheads indicate the direction of transitions).  The accepting states are those that have numbers in them.

At any moment in time, an FSA is in a single state. It starts in the start state. As it is presented with input tokens, it takes the transition out of the current state whose label matches the input token.  If this were a printed piece of paper lying on the table in front of you, you could use a penny to represent the current state.  Initially, the penny would be placed so it covered the start state.  As you read each input token, you move the token along the transition whose label matches that input token, so that it covers the destination state of that transition.  This process continues until either:

If the FSA reaches an accepting state, we say that it accepts the sequence of input tokens that got it from the start state to the accepting state it reached.  If it encounters an input token that does not match any transition out of the current state, it is an error, and the FSA rejects the sequence of tokens it has been presented.  (The FSA then stops looking at input tokens, at least until it is directed to reinitialize itself to the start state and begin again.)  The example FSA above accepts the two inputs "I like candy" and "I hate peas", and rejects all other input sequences.

Any set of strings is called a language.  We say that an FSA recognizes (or is a recognizer for) the set of strings that it accepts.  The FSA above recognizes the language {"I like candy", "I hate peas"}.

In the example above, each transition is labeled with a literal string value.  We also allow labels that indicate a type of string.  For instance, we use "<int>" to mean "any string that is a valid representation of an integer literal" (that is, any string consisting of only the digits, '0' through '9').  We use the transition label "<string>" to mean any string whatsoever.  This allows us to give simple FSAs that represent more complicated languages.  For instance, here is an FSA that recognizes the PL142 language:

 

How We've Used FSAs

We've used FSAs as a convenient way to recognize command languages - the set of valid commands that the program should understand. To use one this way we set it in its start state and then present it with each input token, one by one.  If an input token causes the sequence of tokens to be rejected, the FSA indicates that an input error has occurred (e.g., it throws an Exception).  If an input token causes the FSA to reach an accepting state, it indicates that to whoever invoked it.  If neither of these apply, it simply consumes the input token and moves to the next state.

Why is this convenient?  The code for a program that implements any FSA of the type we've described is simpler than the ad hoc code one is apt to write to recognize any non-trivial language.  The FSA program simply keeps track of what state it is in.  Associated with each state is an ArrayList of transitions.  As the FSA is presented with input tokens it compares them against the label of each transition of the current state.  If there's a match, the FSA is moved to the next state.  Otherwise, it's an error.  No matter how complicated the language is, the code is simple: compare an input token against an ArrayList of transition labels.

The FSA itself only verifies the correctness of the syntax of the input; it cannot check the semantics.  For instance, the PL142 recognizer will accept "n;" whether or not any variable n has been declared in the PL142 program.  To be useful, then, the FSA must supply information to another part of the program that checks the semantics and then performs whatever action was requested.

The FSA aids the implementation of the semantics in two ways.  First, it can provide the integer associated with the accepting state it reached.  This tells the program what kind of sentence was recognized.  For instance, the PL142 recognizer says 0 for declarations, 1 for print statements, and 2 for assignment statements.  Additionally, as the FSA is accepting input tokens it adds them to an ArrayList.  When an input is accepted, that ArrayList contains the entire sequence of input tokens that led from the start state to the accepting state, which is then used by the semantics portion of the application.  For instance, accepting state 2 would be reached by both "n = n + 1;" and "n = 0 + 0;".  It's important for the application to know which has actually occurred.

What You Need to Know and Be Able to Do for the Final Exam

You should understand the construction and operation of FSAs to the level of this description.  The idea of what they do is more important than any particular implementation details - for instance, it doesn't matter whether they throw Exceptions or return some error code when the input is rejected, it just matters that you understand that they operate token by token, and respond with "accept", "reject", or "ok" at each step.  (We will definitely not be asking questions about the particular FSA implementation in the GraphicsComposer application.)

With that in mind, reasonable exam questions might be "Draw an FSA that recognizes the following language" or "What language does the following FSA recognize?". 

NOT REQUIRED -  For the Interested

FSAs cannot recognize all languages.  For instance, they cannot recognize valid Java integer expressions.  Why not?  Consider expressions of this form:

((((....(n)+n)+n)+...+n)

where there is an arbitrary number of occurrences of "+n)".  To be a valid Java expression, the number of left parentheses must equal the number of right parentheses.  It is impossible to keep track of this using only a finite number of states, and so an FSA is not sufficient. (So,  more powerful recognizers than FSAs are used to compile Java.)

It turns out that FSAs can recognize exactly the set of languages that can be expressed by regular expressions.  It is beyond the scope of this beyond-the-scope section to describe what they are, but some of you may have encountered them - they are commonly used in editors to specify patterns to be matched for string substitution.  For instance, "(.*)" means (to most editors that implement regular expressions) "all strings starting with a '(' and ending with a ')'" (because '.' means "any character" and '*' means "zero or more occurrences of the preceding character").