0:03

A concept I want to begin with in introducing

Theoretical Computer Science is the idea of limitations on the power of computing.

So let's see where we are.

So for programmers, regular expressions are really a powerful pattern matching tool.

We have a paradigm based on a theory that really facilitates

the implementation and the combination of the two are very important in the real world.

For theoreticians, there's the idea that we have two different ways

to describe sets of strings and we know they're equivalent,

but it's interesting to think about other sets of strings,

other languages and machines that can describe sets

of strings that we can't do with REs and DFAs.

I alluded to the fact that there are,

but what are they?

How do we address the idea of characterizing those types of languages and machines?

Certainly, the DFA is

an extremely simple machine and we imagine there must be machines that

are more powerful in terms of having

the ability to describe a broader class of sets of strings,

but what are those machines like?

For you, as we mentioned,

you've got the core principles of computer science introduced,

the basic idea of abstract machines and languages,

and we're going to use those things to get started

on really addressing fundamental questions in theoretical computer science.

So here's the basic idea.

So, are there sets of strings that can't be described by any regular expression?

And the answer is yes, there are.

And we're going to do a mathematical proof that shows that you

can't have an RE that describes the bitstrings with equal number of zeros and ones,

and that's just one example.

There's many other examples of strings that represent legal regular expressions.

There is no regular expression for describing those.

Or prime numbers or another example from biology,

complimented palindrome or palindromes.

There's a long list of sets of strings that we'd like to

work with that regular expressions are not powerful enough to describe.

And then you might say the same thing about DFAs.

Are there sets of strings that can't be described by any DFAs?

And of course, by Kleene's theorem,

that's the same question.

You can't build a deterministic finite automaton

to recognize say bitstrings of equal number of zeros and ones.

And if you prove that, then you've proved that you can't

have a regular expression that specifies that language.

So that's a basic question,

and let's first be sure that we have a good answer to that first question.

Why is it that we say that

deterministic finite automaton are not sufficiently powerful to do what we want?

We have a limit on the power of these abstractions.

So, a proposition is there exists a set of strings that can't be describe any RE or DFA.

We're going to actually do a proof.

We're trying not to do too many proofs in

this course but a couple of them are really important,

and this one, you'll be able to understand.

So what we're going to do is prove this by showing that for this specific language,

this set of bitstrings with equal number of zeros and ones,

no DFA can recognize that set.

So our proof is going to be by contradiction.

So let's assume that we do have such a DFA.

It's a finite automaton,

so that means the number of states is finite so we can just say there's a number N,

which is a number of states. So that's our first thing.

Let's assume that we can do that.

So it's supposed to recognize strings that equal number zeros and ones,

so it's going to recognize the string that has N + one zeros followed by N + one ones.

So for 10, so we give it 11 zeros and

11 ones and it's going to go ahead and recognize that.

So we don't know what sequence of states that it goes

through but it's going to go through some sequence of its states.

It doesn't matter what it is as it reads each zero,

whatever complex construction it is,

it's going to go to some state.

And each time, it's completely defined what state it goes to next.

Whatever state it's in,

follow the zero transition out.

Now, the critical thing is that since there's only N states and we have N + one zeros,

some states going to get revisited.

It's not enough different possibilities

to get through that without having some state getting revisited.

But that's crucial because what happens next only depends on being in that state.

So what that means is if you take the zeros between the visits of that state,

the machine can't tell the difference.

Whatever it does on that string with equal numbers of zeros and ones,

it's going to do on this other string.

It's going to start out zero, three, five,

and once we're in position five,

it's going to do precisely the same thing for those two strings.

So there's a string with equal numbers of zeros and ones that it may recognize but

there's another one that has fewer zeros than ones that it's going to recognize too.

That string does not have equal numbers zeros

and ones so that's a proof by contradiction.

It's recognizing a string that we said it didn't recognize,

so the assumption that we started with that such a DFA exists must be false.

That's a proof that there is a set of strings and pretty simple one that

can't be described by any DFA or regular expression.

So that's our first result in theoretical computer science.

There's a limit on what we can do with this kind of machine.

So it's a natural question to ask is,

so is there a more powerful machine?

Like for example, is there one that can

recognize this language equal numbers of zeros and ones?

And the answer is yes.

What we can do is add a stack to the deterministic finite automaton.

I won't give the details,

but I'll give a demo and you can understand how this machine works.

So with this machine,

you get to if you want in a state put the character that you just read on a stack.

So let's look at how that one works in this example.

So we read as zero,

we put it on the stack,

read another zero, put it on the stack.

When we read a one and we have zero on the stack,

we pop, but if the stack's empty,

we put the ones on.

And now, you can see that whichever character has appeared more is going to be on

the stack and it essentially counts the imbalance between the zeros and the ones.

If the stack is empty and we get to the end then it lights the yes.

So that's a more powerful machine that can recognize this other language.

It's interesting that just by adding a stack to a DFA,

we can make it recognize more machine, more larger language.

There's also a form of language more complicated than

regular expressions that defines the same set of strings.

There's a duality between machines and

languages but we're going to focus on machines just for now.

So what about more powerful than a one stack DFA?

Yeah, actually, there's languages out there that if we add a second stack,

we can recognize prime numbers,

we can recognize strings that are legal JAVA programs,

many, many other things.

And again, I won't go to the details of that just now,

that's really about the next lecture.

This is the way a theoretician thinks.

I've got the simplest machine I could possibly imagine and if I had a stack,

I can do more than with the simplest one.

If I had a second stack,

I could do even more.

And the question is where does it stop?

So we'll talk about this one next lecture, but the thing is,

is there a machine that's got more power than a two-stack DFA?

Can we add a third stack and then recognize even more languages?

And the amazing answer is that no.

Not even a roomful of supercomputers is more powerful in this sense than a two-stack DFA.

That simple abstract machine,

from this standpoint, is equivalent to a roomful of supercomputers.

It's a basic limitation on

computation that we're going to talk about in the next lecture.