0:00

In this session we come back to the problem of proving properties about

programs. Where before we covered just lists, we

will now look at more general data structures, namely trees.

Remember in the intro to this course, I told you that functional programming is

important because it's very close to the mathematical theories of data structures.

We will put that to the test now: we will develop such a theory for

integer sets and prove an implementation correct with respect to the theory.

As was the case in the previous proving sessions, the material in this session is

optional for the online class. If you're a student of the live EPFL

class, you should follow the material, because it might be relevant for the exam.

We can generalize the structural induction principle for lists to arbitrary

tree structures. The principle then becomes the following.

We want to prove a property P(t) for all trees t of a certain type. What we need to do

is show that P(l) holds for all leaves l of the tree,

and for each type of internal node t with subtrees s1, ..., sn, we

need to show that, under the assumption that P(s1), ..., P(sn) hold, that is, that all the

subtrees satisfy the predicate, P(t) holds.
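Written as an inference rule, the principle just stated might look as follows. This is only a notational sketch: the symbols l, t and s_1, ..., s_n follow the spoken text, and the layout as a rule is an assumption made here for illustration.

```latex
\[
\frac{
  \forall\, \text{leaves } l:\; P(l)
  \qquad
  \forall\, \text{internal nodes } t \text{ with subtrees } s_1, \dots, s_n:\;
  \bigl(P(s_1) \wedge \dots \wedge P(s_n)\bigr) \Rightarrow P(t)
}{
  \forall\, \text{trees } t:\; P(t)
}
\]
```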

So let's use this proof technique to show some interesting facts about IntSet.

Recall our definition of IntSet from previous sessions.

We had an abstract class IntSet with operations include and contains, and then

we had two different implementations. One was an object Empty and the

other was the class NonEmpty, and there was an invariant.

We assumed that the elements in a tree were ordered.

That means that the left subtree of any NonEmpty tree contained elements that were

smaller than the current element.

And the right subtree contained elements that were larger.

And our implementations of contains and include made use of that invariant.
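As a reminder, the hierarchy described above could be sketched in Scala roughly like this. The method name include follows the spoken text (the course code may abbreviate it), and making NonEmpty a case class with fields elem, left and right is an assumption made here so that trees can be built and compared without extra boilerplate.

```scala
// A sketch of the IntSet hierarchy; names are taken from the spoken text
// where possible, the field names elem/left/right are assumptions.
abstract class IntSet {
  def include(x: Int): IntSet
  def contains(x: Int): Boolean
}

// The empty set contains nothing; including x yields a one-element tree.
object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def include(x: Int): IntSet = NonEmpty(x, Empty, Empty)
}

// A binary search tree node. Invariant: every element of `left` is smaller
// than `elem`, and every element of `right` is larger.
case class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  // Follow the ordering invariant down to the only subtree that can hold x.
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (x > elem) right.contains(x)
    else true

  // Rebuild the path to the insertion point; the tree is unchanged if x is the root.
  def include(x: Int): IntSet =
    if (x < elem) NonEmpty(elem, left.include(x), right)
    else if (x > elem) NonEmpty(elem, left, right.include(x))
    else this
}

// Usage: build a small set by repeated inclusion.
val sample: IntSet = Empty.include(5).include(3).include(8)
```

Both contains and include only ever descend into one subtree, which is exactly where the ordering invariant is used.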

So we would like to prove that implementation of IntSet correct.

But what does it even mean? What do we mean by proving the

correctness of an IntSet implementation? Well, one way to define correctness would

be to define some laws that our implementation should

satisfy, and then prove that the implementation indeed does that.

So in the case of IntSet, what laws could we come up with?

The first law says the empty set does not contain any element, so Empty contains

x is always false. The second law says that if we add an

element x to a set, an arbitrary set and then ask whether the set contains x, then

we are certain that we will get back true. And the third law says that if we add an

element x to a set and then ask whether the set contains some other element y,

then the answer is the same as simply s contains y.

So it doesn't matter whether we added x or not.

The answer will be invariant under that. In fact, one can show that these three

laws completely characterize what it is to be an IntSet.

So we now have an algebraic specification of IntSet which is complete.
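For reference, the three laws can be written down as equations. The infix notation and the side condition on the third law ("some other element y", read here as x different from y) are a transcription of the spoken text, not a quotation of the slides.

```latex
\begin{align*}
\text{(1)}\quad & \mathit{Empty}\ \mathit{contains}\ x = \mathit{false} \\
\text{(2)}\quad & (s\ \mathit{include}\ x)\ \mathit{contains}\ x = \mathit{true} \\
\text{(3)}\quad & (s\ \mathit{include}\ x)\ \mathit{contains}\ y = s\ \mathit{contains}\ y
                  \qquad \text{if } x \neq y
\end{align*}
```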

But it still remains to prove these laws.

So, let's start with the first one: Empty contains x equals false.

Well, that one is actually easy because that's a direct consequence of the

definition of contains in Empty. Let's have a quick look at it: here you see that

asking whether Empty contains any element gives us false.

The second proposition says that if we include x in s and then ask whether the

set contains x, we get true. And that we can prove by structural

induction on the set s. The base case would be that the set s is Empty,

so we are left with the expression (Empty include x) contains x.

Now, Empty include x, we know what that is, by the rule of Empty.include.

That would give us NonEmpty(x, Empty, Empty), a nonempty set with element x and two empty subtrees, and we ask whether that

one contains x. And the answer here is true because of

the clause of contains in a nonempty set, where we know that if we ask for the

element at the top of the tree then the answer is true.

We can compare to the implementation of NonEmpty to verify that.

So that was the base case. What about the induction step?

So the induction step would be that we have a tree, call it NonEmpty,

4:52

with three elements z, l and r, and we have to prove the

proposition that (s include x) contains x is true for each of these trees.

We actually have two cases here. We could have the case that z is the

same as x, or that z, the element of the NonEmpty, is different from x.

Let's take these two cases in turn. For the first case, I assume that z

equals x, so I'm left with the tree NonEmpty(x, l, r), and have to show that (NonEmpty(x, l, r) include x)

contains x equals true. So what can we do in this case?

Well, we can look at the definition of include.

If we look that up, then we find that including an element in a tree that

already has that element at the root gives back the original tree.

So NonEmpty(x, l, r) include x simplifies to NonEmpty(x, l, r).

And then, looking up the contains operation, we find that asking contains

on a tree that has that element at the root gives you back true.

So the whole expression simplifies to true.

So that handles the case where we're left with a nonempty tree, and the root

element x of the tree is the same element as the one we include and then check with

contains. What if the root element is different?

There again we have two choices. Either the root element is smaller than

our element x or it's larger. So let's look at the case where it is

smaller. So we have a tree

NonEmpty(y, l, r), we include x, we ask whether the result contains x, and we would like it to return

true.

By the definition of NonEmpty.include, we can rewrite this term to NonEmpty(y, l, r include x).

Why? Well, because we know that x is greater

than the root element y, so we would have a recursive include on the right

hand side of the tree. Okay, let's look at contains now.

Again by the same reasoning, we would have that a contains test on a tree like that

would translate into a contains test on its right subtree.

So that would be (r include x) contains x.

And now we can apply the induction hypothesis, which says that for all subtrees I

assume that the property is proven, so I am left with true.
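The steps of this proof can be laid out as equational chains. This is a sketch reconstructed from the spoken argument, not a copy of the slides; the justifications on the right name the definitions used, and the induction hypothesis is that (r include x) contains x = true.

```latex
% Base case: s = Empty
\begin{align*}
  & (\mathit{Empty}\ \mathit{include}\ x)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(x, \mathit{Empty}, \mathit{Empty})\ \mathit{contains}\ x
      && \text{by definition of Empty.include} \\
  ={}& \mathit{true}
      && \text{by definition of NonEmpty.contains}
\end{align*}

% Induction step, root equal to x: s = NonEmpty(x, l, r)
\begin{align*}
  & (\mathit{NonEmpty}(x, l, r)\ \mathit{include}\ x)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(x, l, r)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.include} \\
  ={}& \mathit{true}
      && \text{by definition of NonEmpty.contains}
\end{align*}

% Induction step, root smaller than x: s = NonEmpty(y, l, r) with y < x
\begin{align*}
  & (\mathit{NonEmpty}(y, l, r)\ \mathit{include}\ x)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(y, l, r\ \mathit{include}\ x)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.include, since } x > y \\
  ={}& (r\ \mathit{include}\ x)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, since } x > y \\
  ={}& \mathit{true}
      && \text{by the induction hypothesis}
\end{align*}
```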

7:24

There's a third induction step to do where now the root element of the tree y is

greater than x, but this one is completely analogous to the previous one so I'm going

to omit it. Now, let's prove the third proposition.

That proposition reads: (xs include y) contains x is the same as xs

contains x, provided x and y are different. So, if x and y are different, it makes no

difference whether I add y to the set and then ask whether it contains a given

element x, or whether I ask the set directly.

And the proof again would be by structural induction on xs.

So assume first that the element y that we add is smaller than the element x that we test

for. The dual case where the element we add is

larger is completely analogous so we don't need to do both cases.

The base case, then, would be that the set is empty, so we include an element y into

an empty set, and then we ask whether it contains x.

And what we have to show is that that's actually the same as asking the empty set whether it

contains x directly. So, Empty include y gives us NonEmpty(y,

Empty, Empty). Asking whether that contains x gives us

Empty contains x. More precisely, we go into the right

subtree, because x is bigger than y, and that's the Empty here.

8:58

And that concludes the base case. That's what we needed to show.
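Spelled out as an equational chain, this base case might read as follows. It is a reconstruction from the spoken argument under the stated assumption y < x.

```latex
% Base case: xs = Empty, assuming y < x
\begin{align*}
  & (\mathit{Empty}\ \mathit{include}\ y)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(y, \mathit{Empty}, \mathit{Empty})\ \mathit{contains}\ x
      && \text{by definition of Empty.include} \\
  ={}& \mathit{Empty}\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, since } x > y
\end{align*}
```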

Now we have to do the inductive step. So the inductive step is a tree

NonEmpty, with some root element z, and a subtree l,

and a subtree r. And unfortunately, there are five different

cases to consider. So the first case is that the root of the

tree is the same as the element x. The second one is, it's the same as y.

The third one is, it's smaller than both Y and X.

The fourth one is, it's between Y and X. And the fifth one is, it's larger than

both Y and X. So let's look at some of these cases in

turn. The first two cases are easy.

Let's first assume that the root of our tree is x.

So we have this expression here: (NonEmpty(x, l, r) include y) contains x.

So if we include y in a tree like that, then what happens is that we actually go

to the left subtree and include y there, because by assumption y is smaller than x.

So we ask whether that tree contains x, and here the answer is obviously yes

because the tree already contains x at the root.

So by the definition of NonEmpty.contains we get back true,

which is what we wanted. And if we look at

the right-hand side, NonEmpty(x, l, r)

contains x, then by the same reasoning that one is

also true. So the equation is established.

The second easy case is where the root of the tree is the same as y.

So now we include y in a tree that already has root y, and that of course is the same

as the original tree. That doesn't change anything, and that

again is what we wanted. So now we come to the more difficult

cases. The first case is that we are left with

the tree NonEmpty(z, l, r), where z is smaller than y and x.

And in that case we need to show again that (NonEmpty(z, l, r) include y) contains x is the same as

just NonEmpty(z, l, r) contains x. So what can we do here?

Well, again we apply the law of NonEmpty.include to conclude that yes, we have to

include the element y in the right subtree, because y is greater than z.

11:22

Then we apply the definition of contains to conclude in turn that, yes, we have to

look at the right subtree, because x is also greater than z.

And then we can apply the induction hypothesis to say that (r include y) contains x

is the same as r contains x, because we assume the theorem to be

already proven for r. And that, in fact, is the same as

NonEmpty(z, l, r) contains x, because if we simplify that expression, we see that

because x is greater than z, we look again at the right subtree r. So again we have

established the equality. The next case is where z is now between y

and x, so we have the same situation as before, but the value of z now is between y

and x. So what we do in this case is that,

including y into the tree, we go to the left subtree because y is smaller than

z. Asking contains,

we go to the right subtree because x is larger than z, so we actually

include and test in different subtrees. So we're left with r contains x.

And that actually is already the same as NonEmpty(z, l, r) contains x, by the

definition of NonEmpty.contains worked backwards,

because again, for this tree, we look in the right subtree.

So we see that, in this case, we've established the equality

without resorting to the induction hypothesis,

because the inclusion and the test fell into different subtrees.
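The two harder cases just discussed can be summarized as equational chains. This is a sketch reconstructed from the spoken argument, assuming y < x throughout; in the first chain the induction hypothesis is (r include y) contains x = r contains x.

```latex
% Case z < y < x
\begin{align*}
  & (\mathit{NonEmpty}(z, l, r)\ \mathit{include}\ y)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(z, l, r\ \mathit{include}\ y)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.include, since } y > z \\
  ={}& (r\ \mathit{include}\ y)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, since } x > z \\
  ={}& r\ \mathit{contains}\ x
      && \text{by the induction hypothesis} \\
  ={}& \mathit{NonEmpty}(z, l, r)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, read backwards}
\end{align*}

% Case y < z < x
\begin{align*}
  & (\mathit{NonEmpty}(z, l, r)\ \mathit{include}\ y)\ \mathit{contains}\ x \\
  ={}& \mathit{NonEmpty}(z, l\ \mathit{include}\ y, r)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.include, since } y < z \\
  ={}& r\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, since } x > z \\
  ={}& \mathit{NonEmpty}(z, l, r)\ \mathit{contains}\ x
      && \text{by definition of NonEmpty.contains, read backwards}
\end{align*}
```

Note that the second chain never mentions the induction hypothesis: the include lands in l while the test only looks at r.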

So the last case is where z is larger than both y and x.

And that's actually the complete dual of the third case, where z was smaller than

both y and x. So I have written down the proof here, but

I will not go into the details one by one. These are all the cases, so the

proposition is established. This proof was quite involved, but on the other hand

we were also showing something quite significant,

namely the correctness of a nontrivial implementation of sets as binary trees.

I would argue that the complexity of purely functional equational proofs often

compares favorably with what you would have to do in an imperative language.

If you haven't had enough of proving yet, here's an exercise for you which is, in

fact, quite hard. I come back to the question of adding

union to IntSet. So here's a way to do it, which is actually a bit more efficient

than the first solution that I've shown you in the worksheet.

So we would have that the union of the empty set with another set is, of course, the other set

that we pass to union. And then union of a nonempty set would be

defined like this. We take the left

subtree, we union it with the right subtree unioned with the other set, and

finally include the root element x into the resulting tree at the end.

So what I would like you to do is to prove the correctness of union, which is

translated into the following law. What you would like to have is that if we

take the union of two sets xs and ys, and we then ask whether it contains an arbitrary

element x, then this is equivalent to asking whether either xs contains x or ys

contains x. So both sides should be true or false for the same sets and for the

same elements. The task then is to show this proposition

by using structural induction on xs.
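Before attempting the proof, it can help to see the union definition and the law side by side in code. The following is a minimal sketch under a few assumptions: the method name include follows the spoken text, NonEmpty is made a case class with fields elem, left and right for convenience, and the helpers fromList and unionLawHolds are hypothetical names introduced here only for illustration. Checking the law on a few sample sets is of course no substitute for the inductive proof.

```scala
// IntSet with the union operation described in the exercise.
abstract class IntSet {
  def include(x: Int): IntSet
  def contains(x: Int): Boolean
  def union(other: IntSet): IntSet
}

object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def include(x: Int): IntSet = NonEmpty(x, Empty, Empty)
  // The union of the empty set with another set is the other set.
  def union(other: IntSet): IntSet = other
}

case class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (x > elem) right.contains(x)
    else true

  def include(x: Int): IntSet =
    if (x < elem) NonEmpty(elem, left.include(x), right)
    else if (x > elem) NonEmpty(elem, left, right.include(x))
    else this

  // Left subtree, unioned with (right subtree unioned with other),
  // and finally the root element is included in the result.
  def union(other: IntSet): IntSet =
    left.union(right.union(other)).include(elem)
}

// Hypothetical helper: build a set by including the elements one by one.
def fromList(elems: List[Int]): IntSet =
  elems.foldLeft[IntSet](Empty)((s, x) => s.include(x))

// Hypothetical helper: test the union law for every x in a finite range.
def unionLawHolds(xs: IntSet, ys: IntSet, candidates: Range): Boolean =
  candidates.forall { x =>
    xs.union(ys).contains(x) == (xs.contains(x) || ys.contains(x))
  }
```

For example, unionLawHolds(fromList(List(5, 3, 8)), fromList(List(4, 8, 1)), -2 to 10) evaluates both sides of the law for each candidate element and compares them.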