0:00

So we've talked about this problem of global alignment where we

have two sequences and we align them in their entirety.

So I have sequence X and sequence Y, and I want to align them so

that all the letters in X match some letters and dashes in Y and vice versa.

But sometimes in biology something else might happen that is interesting,

which is the following.

You might be presented with two sequences like sequence X is of the form AGA and

sequence Y is, let's say, TTT AGGA TTTTT.

So if you look at these two sequences, their lengths are very different.

0:41

There if you look at them you know, even not that carefully, you'll

see that there is this region here that seems to be similar to this region here.

Okay?

But what happens if I do global alignment on these two sequences?

For example, here's one that if I were to guess here's a good alignment on

these two.

[SOUND] This would be, for example,

a good alignment here that would capture

this similarity within this region.

Okay?

Now, what if we had a scoring matrix, for

example, M, that on the diagonal here,

you know A, C, G, T, A, C, G, T.

You know, for identity we have scores of 10 and for

dashes we have minus 10 minus 10 and so on.

Okay?

And suppose for this, for this similarity we have it say 4 again.

1:59

What would be the score of this?

And this is the, this is going to be

the issue with aligning these two sequences globally.

This is minus 10 for this one, minus 10 for this one, minus 10,

then we have 10, 10, minus 10, minus 10, minus 10 for each one of these.

2:20

Okay?

So we have one, two, three, four, five, six, seven, eight, nine, ten.

This is minus 100, plus 20, score of minus 80.

Okay?

2:31

This is a very negative score for an alignment.

Okay?

Now, there, these two sequences.

It makes sense for the score to be very negative.

This has only three letters.

This has many more letters.

And this is basically reflecting what we are seeing in

these two sequences given this matrix.

But this kind of alignment, globally aligning them,

forcing every letter in Y to go with something in X and every letter in X to go

with something in Y, basically masked this region of similarity here,

that if I ignored, all these parts here would have gotten a very good score.

What do I mean?

Suppose that I define a, a version of the problem that will ignore these.

3:16

Okay?

So these are regions that are basically in A, I have nothing to match in Y.

And this region here.

Okay?

Suppose I ignore this, what happens now?

We have 10, 10, 10 minus 10.

Now, the score becomes 20, right?

3:34

And now the score is much higher than this, and

it does reflect that there is a very good similarity within this local region.

Okay?

When this happens, which happens a lot in biology, that we have two sequences,

that we are looking only for local regions within them that align well.

Okay? This is where we want to

solve a different version of the problem that's called the local alignment problem.

4:05

Okay?

So now the local alignment problem is not, is different from the global alignment

problem, in the sense that I am looking for an alignment of substrings of X and Y.

In this case the best alignment is obtained by aligning X itself

4:33

So if I took this substring of X, which is X, all of X, and

if I took this substring from Y, which is very small part of Y and

I did global alignment on them, what I would get is, for

example, AG-A, AGGA, okay?

In this case, again I would get that score of 20 and I have very good alignment.

Okay?

So local alignment is basically try every substring of,

of X, try every substring of Y, do global alignment on them and

then out of all these possibilities,

out of all these combinations of substring from X, sub string from Y and

global alignment, find me the one that has the highest score, okay?

This is the local alignment problem.

It basically says you know try every substituting of X,

try substituting of Y and globally align them using this scoring matrix.

Okay?

In this case, for example, it might be that this is the best alignment.

Indeed if you try to take any other letter from Y, which would still be

a substring of Y and you try to add it, it is going to add dash with T for example,

and it is going to keep lowering the score because we have a negative score here.

So it makes no sense to add them here or it wouldn't be optimal, okay?

So again, to redefine or to restate that, the local alignment problem of X and

Y is looking at every possible substring of X,

every possible substring of Y, align them globally.

And now out of every possible combination of substrings from these two strings,

finding the one that gives us the highest possible score.

6:15

And this, in this case, will solve us, will give us this solution.

As I mentioned for global alignment, we can,

we have a very efficient dynamic programming algorithm for solving it.

And it turns out a very small modification to that

algorithm will solve the local alignment problem for us, okay?

One last word I want to say about both global alignment and local alignment.

6:36

Once we define the scoring metrics, and once we define the two strings that we

are interested in our lining whether globally or

locally, the score of an optimal alignment is unique.

Okay? In this case it's 20.

There is no optimal alignment that has score of 22.

Because if there's an alignment of score 22, it means this is not optimal.

Okay? The score is unique of the optimal one.

However the alignment that has the highest score is not unique.

For example, in this case, I could have this case,

I could have aligned the sequences as A-GA.

AGGA.

This alignment differs from this alignment, right?

This, this alignment is saying that there were something interesting happening at

the second position.

A deletion here or an insertion here.

This is saying what happened was, in the position, the third position.

However, the score of this is also 20.

Okay?

So this score of the optimal alignment, 20, is unique.

The same thing for global alignment.

However, the alignment itself might be, more than one, okay?

The optimal alignment might be not unique.

If that's the case, if we have more than one possible optimal alignment in general

we define the problem as returning and arbitrary one, okay?

If there is more than one optimal global alignment we are interested in

an arbitrarily chosen one.

If there is more than one optimal local alignment we are interested in

arbitrary one, okay?