abbabcabbabc, these strings are the same.

Obviously, you can see that the cross-parsing distance between P and Q, in this case,

is just one because I start at this position,

I cross-parse P relative to Q,

I find P and Q exactly it is.

So, cPQ is equal to one.

When I cross-parse Q relative to P,

again I don't need to do any restart,

I start at this position and I find my string Q in P exactly.

So, cQP is also equal to one.

The distance between them is defined by minus one for both one of them,

zero, and taking their average;

zero by two which is zero.

So in this case, when the two strings are identical to each other,

this algorithm says that the cross-parsing distance

between the two strings is zero, as you would expect.

What if the two strings are very different from each other?

Say basically one of the strings is abc,

and the other thing is xyz, what happens then?

Well, let's basically see that.

So, this is our string P,

this is our string Q. I start at the beginning

when I'm completing cP relative to Q. I start at the beginning.

I don't find it, empty set.

So I go to next position.

I don't find it, empty.

Next position, I don't find it, empty.

Then I finish, for I had to start at least three times.

What about when I compared Q to xyz?

In this case, it's the same thing.

So, I basically start at this position, I don't find it.

Then I start at the next position, again I don't find it.

Then I start at the next position,

I don't find it, and I finish.

So, see QP is also three,

and the average here will be that I will compute,

both of them will be two and the average will be two.

That is, the cPQ will be,

if the length is N and this length is N,

if they are different from each other,

completely different from each other,

so the cross-parsing distance will be N minus one,

because in both cases,

I have to start for every single position of the string.

So, when the strings are similar,

too identical because cross-parsing distance becomes zero,

when the strings are completely different,

the cross-parsing distance becomes N minus one.

So this is actually the idea.

Cross-parsing distance doesn't tell whether the number of edits are zero,

whether the number of edits are N minus one, it doesn't say that,

but it essentially gives us some sense about whether P is similar to Q or not.

So it can be used as an approximation of the distance measured.

What about the cost of this?

Can this be computed efficiently?

Right? So we're doing it, it turns out,

and we will not learn about this algorithm here,

but it turns out that since it relies on the idea of prefix search,

and as we have learned in the past,

the prefix search can be implemented efficiently using

the using the suffix tree data structure.

There is actually a very efficient algorithm to

compute cross-parsing distance using the suffix tree algorithm.

So instead of going and paying the quadratic cost of edit distance,

we can actually do the linear time cross-parsing of the two strings,

which is very efficient.

So this is the first approximate algorithm that we

learn to approximate the edit distance cost.