0:03

Next we're going to look at

simple abstract machine called deterministicfFinite automaton.

And remarkably, this simple machine is going to help us with the problem

of determining whether or not a given string

is in the language described by a given regular expression.

So let's describe and define a deterministic finite automaton.

It's an abstract machine that actually is

built for the purpose of solving a pattern matching problem.

That is, there's a string specified on an input tape.

No limit on the length of that input tape,

string can be any length whatever.

And what the DFA can do,

is to read a character on the input tape exactly once,

moving from left to right.

It gets to read the character and then move on to the next character.

After it's read the whole string,

it's going to light up the 'Yes' light if it recognizes a string,

the 'No' light otherwise.

So, it has internally a program.

The idea of what set of strings it's going to recognize,

and it reads with the whole string and then

lights up the light depending on whether or not it recognizes it.

So with that, each DFA defines the language.

It reads a string,

it can read any string at all.

And for some of them, it lights up the 'Yes' light,

and those are the ones that are in the language it defines.

And for other ones,

it lights up the No light,

those are the ones that are not in the language that it defines.

So, complete correspondence between DFAs and languages are sets of strings.

So let's look at how to make a DFA,

and look in more detail.

So here's an example of a DFA,

where we talk about actually what's inside.

So the first thing is,

it's got a finite number of states.

So those are the circles in the diagram inside.

And each one of the states is labelled a Y for yes, or N for no.

Also, there are arrows or transitions between the states,

and each one of those transitions is labelled with a symbol.

The other thing is that one of the states is the start state,

and that's the one in this case,

at the left with an arrow from nowhere pointing to it.

So the way the rules that govern the operation of the DFA are listed as the bullet point.

We begin in the start state,

read one of the next input symbol,

and move to the state that it indicates from our start state.

So every state has a transition leaving it labeled with each character.

So whichever character it happens to be,

that says which state to go to next.

And we just keep going until we've read all the input,

and then we're going to be in one of the states after that process.

And if we're in a 'Yes',

we light the ''Yes' light,

and if we're in an N, we light the 'No' light.

So let's look at an example for how this DFA operates for this string.

Is this string in the language defined by this DFA or not?

So, we're in the start state, and we'll read to B.

So it says that what we should do is,

read that B and go to the middle state.

Now we're in that state, and we read a B,

and again, we go to the right.

Now in that state, we see an A.

And the state at the right, we see an A,

and the transition says just read that A and stay in that state.

And again, read that A and stay in that state.

Now we have a B, so we go back over to the first state on the left.

Read a B, go to the middle state.

Read an A, stay there.

Read a B, go to the state on the right.

And read a B, and go back to the start state.

Now we've read all the characters,

and we're in a state labelled Y,

so we light the 'Yes' light.

So we say that, that DFA recognizes that string,

or the string is in the language defined by the DFA,

the set of strings defined by the DFA.

Here's another example, another string, same DFA.

Read the B, go to the right.

Read an A, stay there.

Read another A, stay there.

Read a B, go back to the left.

Read a B, go to the middle.

Read an A, stay there. Read a B.

Now we've read all the characters and we're in a state labeled N,

so we light the 'No' light.

That string is not in the language defined by this DFA.

Very simple set of rules.

And every DFA as long as it has

states with transition for every character that it might read for every state,

can follow these rules and eventually either light up 'Yes' or 'No'.

DFA operates according to very simple rules.

One thing that we can do to better understand it,

is to write a Java Program that simulates the operation of any DFA.

This is what we do to get our examples and quiz questions and so forth.

So, what's that going to look like?

Well, there has to be the start state.

And then we have a boolean array that says whether it's a Yes or a No,

or does it recognize the string or not.

And then we have an array of symbol tables for each state.

We have to know for each character,

what state we go to next.

For example, for state one,

the one in the middle, if we get an A,

we're going to go to state one.

If we get a B, we're going to go to state two.

So we have a little simple table mapping A to one and B to two.

So, those represent those tables and represent

the data structures that we need in order to represent the state in the DFA.

So our constructor takes a file,

and we have a file format that basically has the same information,

and the constructor will just read that in.

So, maybe this is our file format which is the number of states, the alphabet,

the letters that we have to worry about seeing on the input,

the start state, and then the values needed for these tables.

And so, any of us can write a program

to read that file and fill in those data structures.

So, what's interesting is the routine recognizes,

which takes the string on standard input and simulates

the operation of the DFA returning true if it recognizes the string,

and false if it doesn't.

So we start at the start state, and then sorry,

we take the string as an argument to this method.

And then we want to print out whether or not it's in the language recognized by the DFA.

So, for every character in the string,

all we do is get that character.

So, it's input that charAt(i).

And then we get the state that in the symbol table.

So next of state is the symbol table,.get gets the entry corresponding to that character,

and that's a state and we just reset the state to that.

And then when we're done,

we've read all the characters,

then we just return true or false depending on which state we're in.

And so, our test client can make a new DFA from a filename given as an argument.

And then it can read through standard input.

It can read a string and then print out

"Yes" if the DFA recognize it and "No" if it doesn't.

So, for example, we can run it and say,

well, it's "ba, ba,

ba" in there and it says, "Yes."

What about "bb?" It says "No."

And when we type in any string we want,

and have a simulation of the result of running that DFA to recognize that string.

Pretty interesting idea that we can,

with really a one-liner,

simulate the operation of this abstract machine.

It's an abstraction, but we can write a program to try to understand what it does.

And that turns out to be a key concept in

theoretical computer science that we'll come back to again and again.

So, here's just, again,

like the pop quizzes on irregular expressions,

you have the same kind of thing on DFA.

And these get to be pretty complicated.

We actually really need these programs to check these kinds of questions.

They make good quiz questions and you'll certainly encounter them if

you have any kind of quiz in this kind of material.

So, which of the following strings does this DFA accept?

Does it accept a set of bit strings that end in one?

Well, no, because it accepts 0 1 1 0.

That one doesn't end in one.

There's lots of strings that don't end in one that accepts.

But to answer the question,

we just have to find one.

That's kind of easy to say no than yes for this kind of question.

Equal number occurrences of 0 1 and 1 0.

Well, no, there's plenty.

Again, there's plenty of examples.

Bit strings with more ones than zeros.

No, you could do lots of zeros in

the first state and then the one with lots more zeros than one.

Equal number of occurrences of zero and one.

No, this one that we just gave disproves that.

Bit strings with at least 1 1. Yep, that's the one.

And there, you have to reason that it

accepts all those bit strings and only those bit strings.

That's why these types of questions are challenging to answer often.

Here's another example.

So, what following sets of strings does this one accept?

Well, does it accept the set of bit strings with at least 1 1?

Well, no, in this case,

here's a bit string that's got 1 1 but it doesn't accept it.

The first transition stays in the node state.

Then the one takes it over the yes, but then two zeros,

it winds up in the no state so in that language,

but this one doesn't accept it.

Equal number of occurrences of 0 1 and 1 0.

Again, same argument, this string will wind up in the no state,

so it doesn't accept it.

More ones than zeroes, well, no,

that's not an either or you could reason yourself why that one is not it.

Equal number of occurrences of zero and one.

No, that's usually a tough one when you've got

a loop because you can just spin in the loop as long as you want.

Actually, this one's a kind of complicated way to recognize a pretty simple language,

just the bit strings that end in one.

The only thing you can say about a bit string in

this language recognized by the DFA is that it ends in one.

Again, studying these types of questions and looking for others in the book

or in the book site is a worthwhile activity to better understand DFAs.

Now, why don't we talk about DFA in the context of regular expressions?

Well, there's a very important reason a theorem going back 50 years or more,

now more than 50 years, called Kleene's theorem,

and it's an observation based on the idea that we've just seen,

two different ways to define languages.

We've seen regular expressions.

You give a regular expression that defines a set of strings.

And we've seen DFAs that also define a set of strings.

So, for example, the set of all strings made up

the characters A and B with the number of occurrences of B as a multiple of three,

that's the DFA, the one that we've looked at,

and that's the regular expression.

Two different ways to define sets of strings.

Remarkable fact is that these two ways of specifying sets of strings are equivalent.

So that is, given any regular expression,

there's a DFA that accepts the same set of strings.

And given any DFA, there's an RE,

regular expression, that matches that same set of strings.

So, they're equivalently powerful.

Now, there are sets of strings that you can't specify with either one.

And we'll talk about that later on.

But the idea that these two models of

specifying languages are equivalent is really a profound idea.

And for now, the consequence of this is there's a way

to solve the regular expression pattern matching problem.

What we can do is, in Kleene's theorem,

gave a constructive way to build the deterministic finite time,

time corresponding to the given RE.

That's how we prove there exists an automaton as a way to build it.

And then use our program to simulate the operation of the DFA.

So, that's a fine way to solve the RE power matching problem from Kleene's theorem,

and that's what we're going to examine next.