0:00

We'll now return to the question that we asked at the very beginning of this lesson: starting from genomes, how can we transform them into synteny blocks?

The first thing I want you to learn is the notion of a genomic dot plot. Let's arrange the genome along the x coordinate and along the y coordinate, and let's represent every k-mer that is shared at positions x and y in the genome as a dot at the point with coordinates (x, y). In this case, I show you the genomic dot plot for k-mer size equal to 3, and it consists only of the diagonal. Of course, the diagonal is always present in a genomic dot plot, because the k-mers at positions (x, x) are always the same.

But here's the genomic dot plot for k = 2, and you can see that in this case the number of dots has increased compared to k = 3. For example, these four dots correspond to repeated occurrences of the 2-mer CT in the genome.
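As a minimal sketch (single strand only), the dot plot can be computed by recording every position at which each k-mer starts and emitting a dot for every pair of positions that share a k-mer. The function name `kmer_dot_plot` and the toy genome `ACTGCT` are hypothetical; they are not the genome shown on the slide.

```python
def kmer_dot_plot(genome: str, k: int) -> set[tuple[int, int]]:
    """Return every (x, y) such that the k-mers starting at x and y are identical.

    Single-strand version: reverse complements are not considered.
    """
    # Map each k-mer to the list of positions where it starts.
    positions: dict[str, list[int]] = {}
    for i in range(len(genome) - k + 1):
        positions.setdefault(genome[i:i + k], []).append(i)

    # Every pair of positions sharing a k-mer becomes a dot,
    # including (x, x), which is why the diagonal is always present.
    dots = set()
    for starts in positions.values():
        for x in starts:
            for y in starts:
                dots.add((x, y))
    return dots


# Hypothetical toy genome (not the one on the slide).
print(sorted(kmer_dot_plot("ACTGCT", 3)))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(sorted(kmer_dot_plot("ACTGCT", 2)))  # adds (1, 4) and (4, 1) from the repeated CT
```

With k = 3 every 3-mer of the toy string is distinct, so only the diagonal appears; with k = 2 the two occurrences of CT contribute four dots: (1, 1), (1, 4), (4, 1), and (4, 4).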

1:20

So far, we have talked about a genome consisting of one strand, but in reality, of course, a genome consists of two strands. Therefore, we should take care of both identical and reverse complementary k-mers when constructing genomic dot plots, as shown here: red points correspond to identical k-mers, and blue points correspond to reverse complementary k-mers. This is what a genomic dot plot looks like when we compare a genome against itself.
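Taking both strands into account, a sketch might label a dot "red" when the two k-mers are identical and "blue" when one is the reverse complement of the other. The names below are hypothetical, and the all-pairs comparison is quadratic in genome length, which is fine for toy strings only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return kmer.translate(COMPLEMENT)[::-1]


def two_strand_dot_plot(genome: str, k: int) -> dict[tuple[int, int], str]:
    """Compare a genome against itself on both strands.

    A dot is "red" if the k-mers at x and y are identical and "blue" if one is
    the reverse complement of the other. A k-mer equal to its own reverse
    complement (possible only for even k) is labeled red here.
    """
    n = len(genome) - k + 1
    dots: dict[tuple[int, int], str] = {}
    for x in range(n):
        kmer_x = genome[x:x + k]
        rc_x = reverse_complement(kmer_x)
        for y in range(n):
            kmer_y = genome[y:y + k]
            if kmer_x == kmer_y:
                dots[(x, y)] = "red"
            elif rc_x == kmer_y:
                dots[(x, y)] = "blue"
    return dots


# Hypothetical toy genome that happens to equal its own reverse complement.
plot = two_strand_dot_plot("AACCGGTT", 3)
print(sorted(p for p, color in plot.items() if color == "blue"))
# [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]
```

For this toy string the red dots fill the main diagonal and the blue dots fill the anti-diagonal.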

But can we also construct genomic dot plots when comparing two different genomes? Here's an example comparing one genome with another one that is obtained from the first by a reversal.
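To see why a reversal shows up as a -45 degree diagonal, one can model a reversal on DNA as replacing a segment with its reverse complement and then compare the two genomes, with the first genome on the x-axis and the second on the y-axis. The sketch assumes that model; `apply_reversal`, `two_genome_dot_plot`, and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def apply_reversal(genome: str, start: int, length: int) -> str:
    """Replace genome[start:start+length] by its reverse complement."""
    end = start + length
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


def two_genome_dot_plot(g1: str, g2: str, k: int) -> dict[tuple[int, int], str]:
    """Dots at (x, y) for k-mers of g1 (x-axis) versus k-mers of g2 (y-axis).

    "red": identical k-mers; "blue": reverse-complementary k-mers.
    """
    dots: dict[tuple[int, int], str] = {}
    for x in range(len(g1) - k + 1):
        a = g1[x:x + k]
        rc_a = reverse_complement(a)
        for y in range(len(g2) - k + 1):
            b = g2[y:y + k]
            if a == b:
                dots[(x, y)] = "red"
            elif rc_a == b:
                dots[(x, y)] = "blue"
    return dots


# Hypothetical example: reverse the 7-letter segment starting at position 4.
g1 = "ATCGGATTACACCTA"
g2 = apply_reversal(g1, start=4, length=7)
plot = two_genome_dot_plot(g1, g2, 3)
# Inside the reversed segment, the k-mer at x matches (as a reverse
# complement) the k-mer at y = 2*start + length - k - x, so x + y is constant.
print([(x, 12 - x) for x in range(4, 9) if plot.get((x, 12 - x)) == "blue"])
# [(4, 8), (5, 7), (6, 6), (7, 5), (8, 4)]
```

The unchanged prefix and suffix produce red dots on the main diagonal. A few stray dots also appear (for example, a blue dot at (4, 0)), which are the kind of short, noisy matches that are ignored later on.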

In this case, you can see that the reversal corresponds to the -45 degree diagonal formed by blue points in this genomic dot plot. Now, moving on from this small string, let's construct the genomic dot plot for the entire E. coli and Salmonella genomes.

The human eye immediately sees synteny blocks in this genomic dot plot: they simply correspond to the various diagonals. We only pay attention to sufficiently long diagonals, because short diagonals or small clusters of points may represent noise. So we see five synteny blocks, five diagonals corresponding to the blocks marked a, b, c, d, and e. If we project these blocks onto the x-axis, we get the arrangement of synteny blocks in E. coli.

And if we project them onto the y-axis, then we get

3:27

the arrangement of synteny blocks in Salmonella.
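Reading block orders off the axes can be sketched by storing each diagonal's x-extent and the y-coordinates at its two ends: sorting by x gives the order in the first genome, sorting by y gives the order in the second, and a descending (-45 degree) diagonal contributes a minus sign. The `Diagonal` type, the `project` function, and the coordinates are made up for illustration; they are not the E. coli/Salmonella data.

```python
from typing import NamedTuple


class Diagonal(NamedTuple):
    """A synteny block drawn as a straight segment in the dot plot."""
    name: str
    x_start: int  # left end on the x-axis (first genome)
    x_end: int    # right end on the x-axis
    y_start: int  # y-coordinate at x_start (second genome)
    y_end: int    # y-coordinate at x_end


def project(blocks: list[Diagonal]) -> tuple[list[str], list[str]]:
    """Return signed block orders along the x-axis and along the y-axis."""
    # Projection onto x: order by left end; every block reads left to right.
    x_order = ["+" + b.name for b in sorted(blocks, key=lambda d: d.x_start)]
    # Projection onto y: order by the lower end; a descending (-45 degree)
    # diagonal means the block is reversed in the second genome.
    y_order = [
        ("+" if b.y_end > b.y_start else "-") + b.name
        for b in sorted(blocks, key=lambda d: min(d.y_start, d.y_end))
    ]
    return x_order, y_order


# Made-up coordinates for five blocks a..e.
blocks = [
    Diagonal("a", 0, 10, 0, 10),
    Diagonal("b", 12, 20, 30, 22),
    Diagonal("c", 22, 30, 12, 20),
    Diagonal("d", 32, 40, 32, 40),
    Diagonal("e", 42, 50, 42, 50),
]
print(project(blocks))
# (['+a', '+b', '+c', '+d', '+e'], ['+a', '+c', '-b', '+d', '+e'])
```

In this toy layout block b runs downward, so in the second genome it appears reversed (-b) and moved between c and d.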

And now we are ready to analyze the 2-break distance between these two genomes. Construction of synteny blocks thus essentially amounts to constructing diagonals in the genomic dot plot.

Here is the Finding Synteny Blocks Problem. The input is a set of points in a 2-D genomic dot plot, and the output is a set of diagonals in DotPlot representing synteny blocks. Does it make sense?

4:03

Of course it doesn't make sense, because I have not defined what a diagonal is. Look at this genomic dot plot: you can see that the diagonals are not perfect, because there are many insertions, deletions, mismatches, and other artifacts. The human eye somehow has the ability to process this genomic dot plot into five diagonals, but we really do not know what our brain is doing while making this transformation. Can we come up with an algorithm for constructing these diagonals?

Let's try. Let's look at the genomic dot plot between two different genomes that we constructed before and represent every dot in the dot plot as a node in a graph. We form edges in this graph by simply connecting closely located points, for example, points that are located at a distance less than maxDistance, where maxDistance is a parameter.

What will happen? We will connect certain points by edges, and you can see that the diagonals actually correspond to connected components in this graph. Moreover, we should probably ignore very small connected components, for example, the components in the upper-right corner (upper-left from the viewer's side) and in the lower-left corner (lower-right from the viewer's side), but the three remaining components should be accepted as synteny blocks in our genomes.

Therefore, I offer you the following algorithm for solving the problem. Form a graph whose node set is the set of points in DotPlot, and connect two nodes by an edge if the corresponding points are located close to one another. The connected components in the resulting graph define synteny blocks. Afterwards, delete small synteny blocks.
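The algorithm above can be sketched in a few lines: treat each dot as a node, join two nodes when they lie closer than `max_distance`, collect connected components with a union-find structure, and discard components with fewer than `min_size` points. Euclidean distance and the `min_size` cutoff are assumptions made for this sketch, and the all-pairs loop is quadratic; for plots with tens of thousands of dots, a grid or k-d tree would avoid comparing every pair.

```python
import math


def synteny_blocks(points, max_distance, min_size):
    """Connected components of the 'closeness' graph on dot-plot points.

    points: list of (x, y) dots.
    max_distance: two dots are joined by an edge if their Euclidean distance
        is below this value (Euclidean distance is an assumption here).
    min_size: hypothetical cutoff; components with fewer dots are deleted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (a union) for every pair of close points: O(n^2) pairs.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < max_distance:
                parent[find(i)] = find(j)

    # Group points by component root.
    components = {}
    for i, p in enumerate(points):
        components.setdefault(find(i), []).append(p)

    # Delete small components; the rest are the synteny blocks.
    return [sorted(c) for c in components.values() if len(c) >= min_size]


# Toy dots: an ascending diagonal, a descending diagonal, and one stray dot.
pts = [(0, 0), (1, 1), (2, 2), (3, 3), (10, 20), (11, 19), (12, 18), (30, 5)]
print(sorted(synteny_blocks(pts, max_distance=2, min_size=3)))
```

With these settings the stray dot (30, 5) forms a component of size 1 and is deleted, leaving two blocks.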

6:06

Of course, this is not the only algorithm that tries to model what the human eye is doing while clustering points in 2-D, and our brain is very good at clustering. I can give you another algorithm, for example this one, which I call Amalgamate. It is actually different, but it looks like it's doing almost the same thing. First, define each point in DotPlot as a separate block, and then iteratively amalgamate the resulting blocks. How do we amalgamate blocks? We combine two blocks into a single block if they contain two points that are separated by a small distance in one of the genomes, and, as before, we delete small synteny blocks.
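A sketch of Amalgamate follows. It assumes that "a small distance" is measured in a single genome, that is, along the x-axis alone or along the y-axis alone; the function names and thresholds are hypothetical. Under that reading, two diagonals that are neighbors in one genome but far apart in the other get merged, which illustrates how a rule that looks similar to the connected-components approach can produce very different blocks.

```python
def close_in_one_genome(p, q, max_distance):
    """ASSUMPTION: 'small distance in one of the genomes' means the two dots are
    close along the x-axis (first genome) OR along the y-axis (second genome)."""
    return abs(p[0] - q[0]) < max_distance or abs(p[1] - q[1]) < max_distance


def amalgamate(points, max_distance, min_size):
    """Hypothetical sketch of the Amalgamate heuristic."""
    # Step 1: every dot starts as its own block.
    blocks = [[p] for p in points]
    # Step 2: repeatedly merge any two blocks that contain a close pair of dots,
    # until no such pair remains.
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if any(close_in_one_genome(p, q, max_distance)
                       for p in blocks[i] for q in blocks[j]):
                    blocks[i].extend(blocks.pop(j))
                    merged = True
                    break
            if merged:
                break
    # Step 3: as before, delete small blocks.
    return [sorted(b) for b in blocks if len(b) >= min_size]


# Two short diagonals, adjacent along x but far apart along y.
pts = [(0, 0), (1, 1), (2, 2), (3, 50), (4, 51), (5, 52)]
print(amalgamate(pts, max_distance=2, min_size=3))
```

Here the two diagonals are roughly 48 units apart in the plane, yet they end up in a single block because their x-coordinates are adjacent.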

7:02

Instead of answering this question, you should ask me what problem I actually want you to solve, because I have never defined the algorithmic problem that I want you to solve. You should have refused to come up with an algorithm before actually seeing the problem I want to address. Unfortunately, in bioinformatics, it's not that simple. I have been working on the problem of synteny block generation for 10 years now, and I have failed to formulate it as a rigorous algorithmic problem. That's why we are at the mercy of our intuition.

Some people may like the Synteny algorithm better; other people may like the Amalgamate algorithm better. How would we choose?

7:52

The choice should be made based on practical results. If you implement these algorithms and run them, you will see that the Synteny algorithm gives a reasonable and biologically adequate result. But the Amalgamate algorithm, although it also looks reasonable and seems to model what the human eye may be doing, gives completely wrong results.

Having developed an algorithm for constructing synteny blocks, we are now ready to find out what the synteny blocks between the human and mouse X chromosomes are. Here's a genomic dot plot for the X chromosome consisting of 25,000 dots.

8:32

We form a synteny graph on these 25,000 nodes, and on this slide you can see the resulting connected components. We further process these connected components and represent them as perfect +45 degree or -45 degree diagonals in 2-D.
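One simple way to turn a connected component into a straight diagonal is to take the diagonal of the component's bounding box, choosing the ascending or descending direction from the sign of the covariance between x and y. This is an assumption for illustration, not necessarily how the slide's diagonals were produced, and the resulting segment is exactly +45 or -45 degrees only when the component's x- and y-extents are equal.

```python
def fit_diagonal(component):
    """Replace a cluster of (x, y) dots by one straight segment.

    Sketch under assumptions: the segment is the diagonal of the component's
    bounding box, ascending if x and y are positively correlated and
    descending otherwise. It is exactly +45 or -45 degrees only when the
    x- and y-extents are equal.
    Returns (start_point, end_point, sign) with start_point on the left.
    """
    xs = [x for x, _ in component]
    ys = [y for _, y in component]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sign of the (unnormalized) covariance decides the direction.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in component)
    if cov >= 0:
        return (min(xs), min(ys)), (max(xs), max(ys)), "+"
    return (min(xs), max(ys)), (max(xs), min(ys)), "-"


# Hypothetical noisy components: one ascending, one descending.
print(fit_diagonal([(0, 0), (1, 2), (2, 1), (3, 3)]))      # ((0, 0), (3, 3), '+')
print(fit_diagonal([(10, 9), (11, 8), (12, 8), (13, 6)]))  # ((10, 9), (13, 6), '-')
```

The small wiggles inside each component (such as (1, 2) followed by (2, 1)) do not change the overall direction, which is what lets a noisy cluster be drawn as a single clean diagonal.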

Where are the 11 synteny blocks between human and mouse that we have been considering in this lecture? Well, if you project these 11 blocks in 2-D onto the x-axis, you get the arrangement of human synteny blocks, and if you project them onto the y-axis, you get the arrangement of mouse synteny blocks.

Moreover, we are not particularly interested in the scale of these blocks: even if some blocks appear small, there may be hundreds of genes in each of them. It's just an issue of scale. So let us now represent each block by a diagonal of the same size. Once again, if we project these blocks onto the x-axis, we get the arrangement of synteny blocks in human, and if we project them onto the y-axis, we get the arrangement of synteny blocks in mouse.

Now that we have constructed synteny blocks for two genomes, how about constructing synteny blocks for three genomes? You can think about generalizing the synteny block construction approach to more than two genomes, but what about constructing synteny blocks for the 100 genomes that are being sequenced now? This actually turns out to be a very difficult problem, and nobody in the world today is able to construct synteny blocks for 100 genomes. Maybe after this course you will suggest a reasonable approach.

But an interesting thing is that if we learn how to construct synteny blocks for many genomes, we will be able to come up with a very accurate picture of chromosomal organization in our distant ancestors. So there is a lot of work ahead in developing genome rearrangement and synteny block construction algorithms, but this is already beyond the scope of this course.