0:00

So we've talked about this problem of global alignment where we

Â have two sequences and we align them in their entirety.

Â So I have sequence X and sequence Y, and I want to align them so

Â that all the letters in X match some letters and dashes in Y and vice versa.

Â But sometimes in biology something else might happen that is interesting,

Â which is the following.

Â You might be presented with two sequences like sequence X is of the form AGA and

Â sequence Y is, let's say, TTT AGGA TTTTT.

Â So if you look at these two sequences, their lengths are very different.

Â 0:41

There if you look at them you know, even not that carefully, you'll

Â see that there is this region here that seems to be similar to this region here.

Â Okay?

Â But what happens if I do global alignment on these two sequences?

Â For example, here's one that if I were to guess here's a good alignment on

Â these two.

Â [SOUND] This would be, for example,

Â a good alignment here that would capture

Â this similarity within this region.

Â Okay?

Â Now, what if we had a scoring matrix, for

Â example, M, that on the diagonal here,

Â you know A, C, G, T, A, C, G, T.

Â You know, for identity we have scores of 10 and for

Â dashes we have minus 10 minus 10 and so on.

Â Okay?

Â And suppose for this, for this similarity we have it say 4 again.

Â 1:59

What would be the score of this?

Â And this is the, this is going to be

Â the issue with aligning these two sequences globally.

Â This is minus 10 for this one, minus 10 for this one, minus 10,

Â then we have 10, 10, minus 10, minus 10, minus 10 for each one of these.

Â 2:20

Okay?

Â So we have one, two, three, four, five, six, seven, eight, nine, ten.

Â This is minus 100, plus 20, score of minus 80.

Â Okay?

Â 2:31

This is a very negative score for an alignment.

Â Okay?

Â Now, there, these two sequences.

Â It makes sense for the score to be very negative.

Â This has only three letters.

Â This has many more letters.

Â And this is basically reflecting what we are seeing in

Â these two sequences given this matrix.

Â But this kind of alignment, globally aligning them,

Â forcing every letter in Y to go with something in X and every letter in X to go

Â with something in Y, basically masked this region of similarity here,

Â that if I ignored, all these parts here would have gotten a very good score.

Â What do I mean?

Â Suppose that I define a, a version of the problem that will ignore these.

Â 3:16

Okay?

Â So these are regions that are basically in A, I have nothing to match in Y.

Â And this region here.

Â Okay?

Â Suppose I ignore this, what happens now?

Â We have 10, 10, 10 minus 10.

Â Now, the score becomes 20, right?

Â 3:34

And now the score is much higher than this, and

Â it does reflect that there is a very good similarity within this local region.

Â Okay?

Â When this happens, which happens a lot in biology, that we have two sequences,

Â that we are looking only for local regions within them that align well.

Â Okay? This is where we want to

Â solve a different version of the problem that's called the local alignment problem.

Â 4:05

Okay?

Â So now the local alignment problem is not, is different from the global alignment

Â problem, in the sense that I am looking for an alignment of substrings of X and Y.

Â In this case the best alignment is obtained by aligning X itself

Â 4:33

So if I took this substring of X, which is X, all of X, and

Â if I took this substring from Y, which is very small part of Y and

Â I did global alignment on them, what I would get is, for

Â example, AG-A, AGGA, okay?

Â In this case, again I would get that score of 20 and I have very good alignment.

Â Okay?

Â So local alignment is basically try every substring of,

Â of X, try every substring of Y, do global alignment on them and

Â then out of all these possibilities,

Â out of all these combinations of substring from X, sub string from Y and

Â global alignment, find me the one that has the highest score, okay?

Â This is the local alignment problem.

Â It basically says you know try every substituting of X,

Â try substituting of Y and globally align them using this scoring matrix.

Â Okay?

Â In this case, for example, it might be that this is the best alignment.

Â Indeed if you try to take any other letter from Y, which would still be

Â a substring of Y and you try to add it, it is going to add dash with T for example,

Â and it is going to keep lowering the score because we have a negative score here.

Â So it makes no sense to add them here or it wouldn't be optimal, okay?

Â So again, to redefine or to restate that, the local alignment problem of X and

Â Y is looking at every possible substring of X,

Â every possible substring of Y, align them globally.

Â And now out of every possible combination of substrings from these two strings,

Â finding the one that gives us the highest possible score.

Â 6:15

And this, in this case, will solve us, will give us this solution.

Â As I mentioned for global alignment, we can,

Â we have a very efficient dynamic programming algorithm for solving it.

Â And it turns out a very small modification to that

Â algorithm will solve the local alignment problem for us, okay?

Â One last word I want to say about both global alignment and local alignment.

Â 6:36

Once we define the scoring metrics, and once we define the two strings that we

Â are interested in our lining whether globally or

Â locally, the score of an optimal alignment is unique.

Â Okay? In this case it's 20.

Â There is no optimal alignment that has score of 22.

Â Because if there's an alignment of score 22, it means this is not optimal.

Â Okay? The score is unique of the optimal one.

Â However the alignment that has the highest score is not unique.

Â For example, in this case, I could have this case,

Â I could have aligned the sequences as A-GA.

Â AGGA.

Â This alignment differs from this alignment, right?

Â This, this alignment is saying that there were something interesting happening at

Â the second position.

Â A deletion here or an insertion here.

Â This is saying what happened was, in the position, the third position.

Â However, the score of this is also 20.

Â Okay?

Â So this score of the optimal alignment, 20, is unique.

Â The same thing for global alignment.

Â However, the alignment itself might be, more than one, okay?

Â The optimal alignment might be not unique.

Â If that's the case, if we have more than one possible optimal alignment in general

Â we define the problem as returning and arbitrary one, okay?

Â If there is more than one optimal global alignment we are interested in

Â an arbitrarily chosen one.

Â If there is more than one optimal local alignment we are interested in

Â arbitrary one, okay?

Â