Optional: Where in the Genome Does DNA Replication Begin? (Part 3)
A course from University of California San Diego
Department of Computer Science and Engineering
Since failure is not an option, we need to figure out what to do next, and we will try
to do it by learning a little bit of biology.
And biology will give us a hint on how we should implement our algorithms.
Some of you may find the description of somewhat convoluted biological concepts that I am about
to present difficult to digest.
If you feel like this, don't worry; if you believe me that DNA replication is an asymmetric
process, then you can simply skip this part and go to the point where we derive from this
biological knowledge how to design efficient algorithms.
The first thing to remind ourselves is that DNA strands have directions, and the two strands
of DNA run in opposite directions.
Here on this slide, the blue strand runs clockwise, and the green strand runs counterclockwise.
If you were a DNA polymerase, how would you replicate a genome?
If I were a DNA polymerase, I would do something very simple.
I would wait until DNA unwinds a little bit, recruit 4 DNA polymerases, and I would just
move them along the genome, trying to replicate.
It looks like just 4 DNA polymerases are enough to replicate the whole genome.
And when the replication fork enlarges, I continue and continue the replication process.
This is simple but completely wrong.
And if there are biology professors attending this lecture, they are probably already writing
a petition to fire me and send me to a "Biology 101" camp.
The reason why it does not work is that DNA polymerases are unidirectional.
They can only copy DNA in the direction that is opposite to the direction of DNA, which
means that when we want to recruit four DNA polymerases, two of them, this one and this
one, will be working just fine, but the two others won't be able to move because they
cannot move in the same direction as the direction of DNA.
Then we can classify DNA strands into four half-strands.
The strand that I showed you (the blue strand that goes from origin to terminus) is the
reverse half-strand, and I have no problem replicating it because moving from the origin
to terminus goes in the opposite direction to DNA.
Likewise, this thick green line that I show right now also does not present any problem
replicating -- one DNA polymerase can accomplish it.
But the two other half-strands present a big, big problem because we cannot move in the
same direction that they go.
So, if you were a unidirectional DNA polymerase, how would you replicate a genome?
Here is a potential solution.
Wait until the fork enlarges, and when it enlarges, start replicating in the direction
opposite to that of the DNA.
When the fork enlarges a little bit more, you put in another DNA polymerase and continue.
Four DNA polymerases won't be enough because each of these DNA polymerases that I recruited
copies only about 3000 nucleotides, so we need a huge number of DNA polymerases
to proceed this way.
You can see that the resulting fragments, many fragments that are being built (called
Okazaki fragments) complicate our life a little bit.
But, in the end, after this process is over, we will have the genome copied in many,
many little Okazaki fragments.
And the only thing we need to do afterwards is to repair the gaps between adjacent Okazaki
fragments to finish replicating the genome.
So the only thing we need to learn about this process to proceed with our algorithms afterwards
is that the reverse and forward half-strands (the thick half-strands and thin half-strands)
have very different lifestyles.
The reverse half-strand lives a double-stranded life because it is constantly replicated;
there is hardly a moment when it is single-stranded.
But the forward half-strands spend a large portion of their lives single-stranded because
they have to wait until the replication fork opens before their replication can start.
I hope this is clear, but there is one looming question: "Why would a computer scientist
care about this?"
And we will learn in the next segment why a computer scientist should care about this.
So, asymmetry of replication affects nucleotide frequencies. Why? Let's think about this.
Single-stranded DNA has a much higher mutation rate than double-stranded DNA.
That's why, if one nucleotide has a greater mutation rate, then we should observe a shortage
of this nucleotide on the forward half-strand because it lives a single-stranded life.
Which nucleotide, A, C, G, or T, has the highest mutation rate? And why?
It turns out that there is a peculiar statistic of mutation rates of nucleotides: cytosine
(C) rapidly mutates into thymine (T).
What is quite amazing is that the rate of these mutations, which happen through deamination,
rises 100-fold when DNA is single-stranded, which means that the strand that lives the
single-stranded life very quickly gets depleted of C.
What does it mean for us as algorithm designers?
Forward half-strands that live the single-stranded life have a shortage of C and normal G.
Reverse half-strands that live the double-stranded life have a shortage of G and normal C.
Now, keeping this in mind, let's take a walk along the genome.
We start at the terminus and, let's say we move, according to the red line, from the
terminus to the origin.
In this case, we move along the strand in which C is high and G is low, which means
that #G -- #C is decreasing as we walk.
But when we walk along this half-strand, C is low and G is high, which means that #G
-- #C, the total number of nucleotides G minus the total number of nucleotides C, is increasing
as we walk.
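The walk just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `running_skew` and the toy sequence are assumptions made up for this example. It records the difference #G -- #C after each nucleotide, so a C-rich stretch shows up as falling values and a G-rich stretch as rising ones.

```python
from typing import List


def running_skew(genome: str) -> List[int]:
    """Return the running value of #G - #C along the genome.

    The first entry is 0 (nothing read yet); entry i is the difference
    after reading the first i nucleotides.
    """
    skew = [0]
    for nucleotide in genome:
        if nucleotide == "G":
            step = 1
        elif nucleotide == "C":
            step = -1
        else:
            step = 0  # A and T leave the difference unchanged
        skew.append(skew[-1] + step)
    return skew


# Toy sequence invented for illustration: a C-rich stretch, then a G-rich one.
toy_genome = "CCACCTGGAGGT"
print(running_skew(toy_genome))
```

On this toy sequence the values fall to -4 over the C-rich prefix and then climb back to 0 over the G-rich suffix.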
Again, this sounds like a peculiar and not very important thing. Why do we care?
I ask one more question:
If you walk along a genome and you count the number of G minus the number of C that you
saw, and you have been seeing that #G -- #C has been decreasing, and suddenly starts increasing...
Imagine, just imagine, you walk through the genome, you count the difference #G -- #C,
and it has been decreasing, and suddenly starts increasing.
My question is: "Where in the genome are you?"
And to figure out where in the genome you are, we need once again to look at peculiar
The only place in the genome where the behavior of #G -- #C switches from decreasing to increasing
is the origin of replication.
Which means that if you walk along the genome and see that #G -- #C has been decreasing
and suddenly starts increasing, it means you just passed the origin of replication.
And this is the hint for our algorithm.
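One way to turn this hint into code, offered as a hedged sketch rather than the lecture's own implementation: keep a running value of #G -- #C while scanning the genome, and report every position where that value reaches its minimum, since that is where the difference stops decreasing and starts increasing. The function name `minimum_skew_positions` and the toy input are hypothetical. Real genomes are noisy, so the positions found this way are only candidates for the origin of replication.

```python
from typing import List


def minimum_skew_positions(genome: str) -> List[int]:
    """Return all positions where the running #G - #C reaches its minimum.

    Position i means "after reading the first i nucleotides";
    position 0 is the start of the sequence.
    """
    skew = 0
    min_skew = 0
    positions = [0]
    for i, nucleotide in enumerate(genome, start=1):
        if nucleotide == "G":
            skew += 1
        elif nucleotide == "C":
            skew -= 1
        # A and T do not change the difference.
        if skew < min_skew:
            min_skew = skew
            positions = [i]  # new minimum: discard earlier candidates
        elif skew == min_skew:
            positions.append(i)  # tie with the current minimum
    return positions


# Toy sequence invented for illustration: C-rich first, then G-rich.
print(minimum_skew_positions("CCACCTGGAGGT"))  # → [5, 6]
```

The scan makes a single pass and stores only the current minimum and its positions, so it needs no memory beyond the list of candidates even for a genome millions of nucleotides long.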