Turing intro - Computer Science



Chapter 14

The finite state control structure

Analogy

How are you feeling right now? Maybe you are happy or sleepy, dopey or grumpy, hungry or angry. Maybe you feel several of these at once, but for simplicity, let’s assume that just one word describes your current state. How did you come to be in this state? Clearly your state is affected by things that happen to you — the inputs you receive. But input alone does not determine how you feel. For example, receiving a 90% on an exam can leave you either delighted or disappointed, depending on your expectations. Hence, your current state is really a function of two factors: your state a little while ago, and the input you have received since then. In other words, your state at time t+1 is determined by your state at time t and the input you encountered between t and t+1.

The notion of state and of transition from one state to another as determined by a combination of previous state and input is the basis for a set of computational models called finite state machines (FSMs). The FSM models in turn leads us in a very natural way to a powerful and broadly applicable programming strategy which we will examine in this chapter.

Introduction

Computational models, or models of computation, are abstractions of devices that produce outputs (answers) from inputs (data). For simplicity, we'll assume a basic model of a computational device as a black box that has a single input channel and a single output channel. One simple physical form of a such a computational device would be a black box with a set of buttons on the front, exactly one of which can be pressed at any time, and a set of lights on the top, exactly one of which is lit at any time. An input value is specified by pressing one of the buttons; the output value is specified by the single light that is lit. We can easily think of this device as producing a single output (light) in response to a single input (button). But we can also think of the device as being given a sequence of inputs (by punching one button after another) and producing a sequence of outputs (by one light after another being lit). The most recent input is often referred to as the current input, and the light currently lit is the current output. Note that what goes on inside the box can be very complicated or even random; this simple model can be modified to accommodate any computational task we like. But we are interested only in a small set of the possible behaviors of the box, and we are interested only in behaviors that are deterministic.

The simplest computational model is one in which the box computes a function of the input value. Such a box will produce an output value (turn on a light) solely on the basis of the current input value (the last button pressed). Because the output value is determined only by the most recent input value, we say that the computation requires no memory; this means that no information about previous inputs has to be stored to perform the computation that determines which light shall be lit. Furthermore, given a particular input, say the ith button, the output is always the same.

We are interested in a more complex computational model that uses some information about past inputs in determining its output. Conceptually, we can imagine a device that keeps track of every input it has processed, that is, the entire input history. Simpler devices might keep track of less information, such as how many times the leftmost button was pressed, but even this information is unbounded in the sense that the number that must be stored may become arbitrarily large, and require arbitrarily much storage to represent it. We're interested in a simpler class of machines called Finite State Machines (FSM), which can store only a finite amount of information, and give outputs that depend on that information and the input.

A Finite State Machine is a device that can be in any one of a specified finite set of states and which can change state as a result of an input. Thus, each time an FSM gets an input, we consider it to change states, although it may enter the same state it was in before.

Finite state machines are useful for programming because they provide an alternative model for controlling program execution. Recall that program control is what determines which program statement (or block of statements) is to be executed next. The most common control structure is the default sequential structure; this causes the statements of a program to be executed sequentially, one after another, just as they appear in the program. The other control structures are

1. Alternative selection. This is usually embodied as an if, if...else, or switch statement. This performs one or more tests, and based on the result of the tests, chooses one block of code from a collection of blocks and executes it. The block of code may of course be empty, as it is in the false branch of the if statement.

2. Iteration. This is any loop structure; it causes some loop body to be executed a number of times with exit from the loop based on a test.

3. Subroutine. This causes the current action to be suspended. Control then branches to the subroutine code; upon completion of the subroutine, the original action resumes where it left off. Often recursion (which occurs when a subroutine calls itself) is treated as a separate control structure, but we choose to include it under the subroutine structure.

To this collection we now add the finite state control structure, which chooses a statement to execute (or a block of statements or a subroutine to call) based on the state of a finite state machine and the most recent input.

The remainder of this chapter will define finite state machines and work through several simple examples to develop the concept and to give you practice in working with them. Then, we will give some examples to show how the finite state machine can be used to solve a more complex problem.

The Basic Finite State Machine Model

A finite state machine (also called a finite state automaton, or simply a finite automaton) is a device whose input is a sequence of symbols. The automaton is always in some identifiable state. Each time an input symbol is received, the machine enters a new state, although the new state may be the same as the state it was in before. At each point, the current state of the machine is determined completely by:

1. the state it was in prior to the last input, and

2. the value of the last input symbol.

Formally, a finite state machine consists of five components:

S: a finite set of states. A FSM is always in exactly one of its states.

s0: a particular state called the start state that the machine is in before it has seen any input.

I: a finite set of input symbols

O: a finite set of outputs.

λ: the next state function which maps a current state and current input to a new state. λ(SxI) -> S

δ: the output function which maps the current state and the current input symbol to an output. δ(SxI) -> O

The machine starts out in s0 and looking at the first symbol of a sequence of input symbols. It then issues the output appropriate to this state-symbol combination and goes to the appropriate next state. It then goes on to the next input symbol and the process repeats until the input sequence is exhausted; the machine then stops. Note that the machine is always in some state. It starts out in s0 and is left in some state when operation stops.

Example: Our first example of an FSM is one that takes as input a sequence of binary digits and produces as output the same sequence except alternate ones have been changed to zeros. Thus the input 011010111 becomes 010010010. The machines has two states s0 and s1, with s0 being the start state. Both the input and output sets are {0,1}. The next state and output functions are shown in the tables below.

[pic]

Figure 1

An alternate but equivalent representation for a FSM is shown in Figure 2 below and is called a state diagram. States are shown as circles; the start state is indicated by the bold incoming arrow. The next state function and output functions are shown using directed arrows from one state to another. Each arrow is labeled with one element of I and one element of O. If the machine is in some state s and the current input symbol is x, then we follow the arc labeled x/y from s to a new state and produce output y.

[pic]

Figure 2

The machine shown above produces a stream of output symbols that is exactly as long as the input stream. By allowing “nothing” to be an element of the output symbol set, we can specify machines whose output stream is shorter than the input. For example, the machine in Figure 3 has a two symbol input set {a,b}, and produces outputs of {a, b, “nothing”}. It reads in strings of a’s and b’s and collapses substrings of a’s into a single a; and substrings of b’s into a single b. Hence the input string “aaaabaabbbbbabbab” will produce the output “abababab”, and the input “aaaaaaaaaaaaaab” will produce “ab”.

[pic]

Figure 3

1 Example: the stamp machine

Consider a simple vending machine that dispenses 25-cent stamps, one at a time[1]. The set of inputs are nickels, dimes and quarters. If, for the sake of simplicity, we assume no coin return mechanism, there are only two outputs: one is a single stamp from the roll, the other is nothing; that is, no stamp from the roll. The machine must keep track of how much money has been put in so far, and if the amount is 25 cents or more, dispense one stamp and reduce the customer's "credit" by 25 cents. Note that the machine need not remember exactly how much money has been put in it; it must only remember the outstanding credit. Therefore, since it has no other form of memory, it must have one state for each possible amount of credit. There will be one state indicating that the credit is currently zero; one state indicating a credit of 5 cents, plus states for 10, 15 and 20 cents. There is no need for a state with a credit amount of 25 cents or more, because when the credit amount reaches 25 cents the machine will produce a stamp and decrease the credit amount by 25 cents, all at once. The list of states and possible actions is described in a table in Figure 4. This table merges the output and next state functions; each rectangle contains an output (top line) and a new state (bottom line).

[pic]

Figure 4

In the above table, every possibility has been accounted for. In other words, no matter what state the machine is in and which input it receives, there is exactly one output to produce and one next state. For example, suppose the machine is in the state "10 cents". If the customer puts in a nickel, it adds five cents to the amount put in so far and hence goes to state "15 cents" and produces the output "nothing". If another nickel is added, the machine goes to the “20 cents” state and again produces “nothing”. If the customer then adds a dime, the machine now has enough money to dispense a stamp and have 5 cents credit left over. Hence the machine produces the output “one stamp” and goes to the “5 cents” state. From any state in the stamp machine, adding a quarter will result in the output “one stamp” with no change in the credit balance. And so the new state is in fact the same as the old.

The state diagram for the stamp machine, shown in Figure 5, provides an easy way to visualize what the stamp machine does. It starts out in the initial state (credit = 0 cents). An input can be a nickel, a dime, or a quarter. When the input is received the machine goes from the current state into a new state, following the arrow which is labeled with the type of coin that has been inserted. Some arrows are also labeled to denote that a stamp is produced as output when these paths are taken. If the arrow the machine follows does not say to output a stamp, then the output is "nothing". Notice that new state doesn’t always imply different state. For any credit balance, adding a quarter produces a stamp and leaves the credit balance unchanged. Thus from any state, when the input is a quarter, the machine always enters the same state it was in before and dispenses a stamp.

[pic]

Figure 5

As it is currently specified, the stamp machine will cheat the user out of some money if he or she runs out of coins while the machine is in some state other than 0. We could make the machine more realistic and more humane by adding another input: “I am done” and adding four new outputs corresponding to giving change of 5, 10, 15, and 20 cents. We would then add a new arc from each state to the 0 state with the input symbol being “I am done”, and the output being the appropriate amount of change. For example the arc from the 15 cents state would dispense 15 cents in change. The arc from the 0 state back to itself would produce no output.

Implementing a FSM

A FSM can be implemented with a simple loop. Each time through the loop we get one input symbol, produce the appropriate output symbol, and then go to the next state. When the input stream is exhausted, the loop terminates and the machine stops.

state = start state;

while (there is more input)

{

x = next input symbol;

output(λ(state,x)); // Generate appropriate output.

state = δ(state,x); // Move to next state.

}

The body of the loop contains three operations. First, we must get the next input symbol. This could be done with a read statement or perhaps by getting the next element from an array or linked list. The second and third statements contain function calls to generate the appropriate output and go to the next state, respectively. This could be done by hard wiring the output and transition information into the code of the functions or by using a more general table look-up scheme.

Final output machines

The machines we have seen so far all take a sequence of input symbols and produce a sequence of output symbols. The first machine took in one binary integer and produced another of the same length. The ab reducer machine took an input stream and produced a possibly shorter stream with duplicates eliminated. The stamp machine took in a sequence of coins and produced a sequence of stamps and possibly some change. While a FSM always produces a sequence of outputs in response to a sequence of inputs, we can choose to ignore all the outputs except for the last one produced. That is, the output symbol that was associated with the last symbol of the input sequence.

As an example, let’s build a FSM that will take a string of a’s as input and tell us whether the input contained an odd or even number of a’s. The input set will consists of just the single symbol “a”; the output set will consist of E for even and O for odd. How many states should the machine have? To figure this out, remember that the states of a finite state machine correspond roughly to its memory. It is not necessary to remember the entire number. In fact, there are only two possibilities: that we have seen an odd number of a’s so far or that we have seen an even number of a’s so far. If we have seen an odd number so far and then see another a, then we have now seen an even number of a’s. Conversely, if we have seen an even number so far and see another a, then we have now seen an odd number. Hence we will need two states: s0 for even so far, and s1 for odd so far. The even state, s0, will be the state since initially we have seen zero a’s and zero is an even number.

The next state and output functions will be as follows. If we are in the even state and see an a, then we go to the odd state and output an O since we have now seen an odd number of a’s. If we are in the odd state and see an a, then we go to the even state and output an E since we have now seen an even number of a’s. The last output symbol produced by the machine just before it stops, gives the parity of the input. The state diagram for this machine is shown below. Notice one unusual thing about this machine. Since output is associated with state transitions, there can be no output produced by the empty string even though the string of zero a’s is of even length.

[pic]

Figure 6

Figure 7 shows a FSM that determines whether a binary integer is evenly divisible by 2, by three, by both 2 and 3, or by neither. The input is the sequence of binary digits that constitute the number (reading left to right). The output is 2, 3, or b (for both) and n (for neither). The sequence of output symbols has no real significance, although the individual symbols give the division property of that portion of the input seen so far. The last output symbol gives the property for the entire number. We can think of such machines as implementing a function mapping input strings onto a single element of the output set.

[pic]

Figure 7

How was the state diagram of Figure 7 designed? The divisibility of an integer n by 2 and 3 is determined by the value of n mod 6, or the remainder when n is divided by 6. Thus, if n is divisible by 6, then it is divisible by both 2 and 3, and if the remainder of n divided by 6 is 4, then n is divisible by 2 (because 4 is) and not by 3 (because 4 is not). The state subscript on the states in Figure 7 represents the value of n mod 6. When the binary representation of n is extended by adding a 0, the result is the binary representation of 2n. When the binary representation of n is extended by adding a 1, the result is the binary representation of 2n + 1. With those facts in hand, constructing the state diagram of Figure 7 is straightforward.

Figure 8 shows a FSM whose input is an arbitrary string of letters, digits, and blanks. The final output indicates whether the string is blank (consisting solely of blanks) or numeric (digits and blanks) or alphabetic (letters and blanks) or alphanumeric (letters, digits, and blanks). Rather than labeling the arcs with all possible inputs, we use 'L' for letter, 'D' for digit, and 'B' for blank. The outputs are 'B' for blank, 'N' for numeric, 'Ab' for alphabetic, and 'An' for alphanumeric. Note that once a string is found to contain both a letter and a digit, then it is certainly alphanumeric regardless of what else is in the string. Hence state 4, which is where we go when we find a string to be alphanumeric, is what is called a sink state. It is to finite state machines what a Roach Motel is to a roach: once you get there, you can never leave.

[pic]

Figure 8

Acceptor machines

In some cases, we can eliminate the need for the separate output function altogether and instead incorporate the output into the states. Shown below is a modified version of the “even/odd a” machine from Figure 6 But this machine has the output symbols associated with the states rather than along the transitions. The interpretation is that if the machine is in a particular state, then it produces the output associated with that state. And the final output is the output associated with the state in which the machine stops rather than the output associated with the last transition. The machine below produces the same output as does the machine in Figure 6. As a bonus, this machine correctly produces the output E for the empty string.

[pic]

Figure 9

The second example is derived from the FSM in Figure 7. We can determine the division property of the input simply by observing the state that the machine is in when it stops after having read the entire input sequence. If it ends up in states 2 or 4, then the input is evenly divisible by only 2; if it ends up in state 3, then it is divisible by 3 only. If the machine stops in state 0, then the input is divisible by both 2 and 3. And if it ends up in states 1 or 5, then the input is divisible by neither 2 or 3. Hence we can modify this machine to incorporate the output into the states as is shown below.

[pic]

Figure 10

We can do the same thing with the FSM in Figure 8. State 0 indicates blanks only; state 1 indicates alphabetic; state 2 indicates numeric; and state 3 indicates alphanumeric. The revised state diagrams are shown in Figures 5 and 6 below. Note that we have been able to add a new output, e for empty, associated with s0.

[pic]

Figure 11

A special case of such a machine is one in which the output set contains only two elements. All input sequences are thus mapped onto one or the other output symbol. We can think of the output symbols as the binary digits 0 and 1 and associate the notion of “accept” with 1 and “reject” with 0. Then we can think of the machine as either accepting or rejecting an input depending on whether that input causes the machine to stop in an accepting or rejecting state. Such machines are called acceptor automata.

The first three examples below are acceptor automata made from examples we have seen already. These figures also show one further shorthand notation. Instead of writing the output values 0 or 1 in the each state, we use a double circle to indicate accepting states (output of 1), and a single circle to indicate rejecting states (output of 0).

The machine in Figure 12 accepts strings of a’s that are of even length and rejects strings of odd length. Notice that this machine correctly accepts the empty string. The machine in Figure 13 accepts binary integers that are evenly divisible by 2 or by 3 or by both. The machine in Figure 14 accepts strings that are either alphabetic or numeric and rejects strings that are blank or alphanumeric.

[pic]

Figure 12

[pic]

Figure 13

[pic]

Figure 14

The machine in Figure 15 might be used in the lexical analysis phase of a compiler. It takes strings of characters and accepts those that are valid identifiers (for example, names of variables or procedures). To be accepted, a string must begin with a letter (indicated by the generic 'L') followed by letters and digits ('D'). We denote by 'S' any character such as ‘$’ or ‘?’ that is neither a letter nor digit. Note that encountering any 'S' character takes us immediately to the sink state S3 which is a rejecting state and from which there is no exit. Note also that this FSM imposes no limit on the length of the input. Pascal, for example, allows names of up to 255 characters. However, there is no easy way to impose such a limit on a FSM. This is a limitation that is inherent to the FSM and one we will consider in more detail below.

[pic]

Figure 15

The FSM in Figure 16 reads strings of characters and accepts only those strings that contain a single unsigned integer, possibly preceded and followed by blanks. For the sake of simplicity, it is customary not to show the sink rejecting state nor the arrows leading to it. Instead, you can assume that if the machine is in some state s and looking at input symbol x, and if there is no arrow labeled x leading from s, then you go to the sink rejecting state and stay there. Thus, for example, if the machine is in state 0 and see the letter ‘a’, then you next go a rejecting sink state.

The five states in this machine can be thought of as being associated with the five different classes of strings. In state 0 we have seen nothing yet. If the machine stops in s0 then we know the input was empty. State 1 indicates that we have seen only blanks so far; ending there indicates a string made up of blanks only. State 2 is associated with strings that have zero or more blanks followed by a contiguous substring of digits. In state 3 we know that we have seen zero or more leading blanks followed by a contiguous substring of digits followed by one or more blanks. And state 4 (not shown) is the sink state where we go if we encounter a character other than a digit or blank or if we see more than one contiguous string of digits. States 2 and 3 are accepting; the others are rejecting.

[pic]

Figure 16

We can take our integer acceptor one step further by accepting a string of characters that contains one real number optionally preceded and followed by blanks. The real number can be represented either in standard notation or in scientific notation. The construction of this machine is left as an exercise. Try the machine on the following real numbers as well as on some strings that do not contain valid reals.

100

-100

3.1415

6.02E24

-8.8E-11

String Searching

Another practical use of acceptors is in string searching. String searching problems are very common problems, most notably in text editing. The usual statement of the problem is: "determine whether string X occurs in string Y." (For this problem string X will be referred to as the pattern and string Y as the target.) The naive method of doing this would be to write a loop that goes through the target string a character at a time and checks to see if the pattern occurs beginning at that character. After the complete pattern has been compared to the target, go to the next character in the target string and start again. This simple algorithm, (which could be simplified somewhat by using the substring facility of Java), is implemented by the following code, which we will call algorithm A:

1 Naive String Search (Algorithm A)

// Find the first occurrence of pattern in target,

// beginning at position start.

// Determine whether one string occurs in another string

// starting at a specified point.

// Does s1 match a substring of s2 starting in position pos?

public boolean match(String s1, String s2, int pos)

{

// pre: true

// post: Returned value is true iff s1 matches a substring

// of s2 starting in position pos.

// Trivial case: not enough room in s2 for a match.

if (s1.length()>s2.length()-pos) return false;

int i;

for (i=0; i ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download