0:06
Okay, everybody,
so in this lecture we're going to be focusing on next-generation sequencing.
And the first of those is so-called second-generation sequencing.
So what was the main innovation that allowed second-generation sequencing?
As you remember from what we talked about before, we really needed a new way
of doing the sequencing chemistry, such that we didn't end up with
a chain termination every time we wanted to determine a base pair.
We needed some way to avoid ending up with a whole range of lengths of oligos
in order to determine the sequence of a particular template strand.
And one of the main innovations that allowed this is so-called
sequencing by synthesis, rather than sequencing by termination.
So what sequencing by synthesis means is that each time
a base pair is incorporated into a new strand that's
being copied from your template by DNA polymerase,
1:19
you can put that base pair in, you can figure out what that base pair was,
and then you can continue to grow the chain in real time.
So, it's a fundamentally different way of sequencing,
as opposed to the first generation sequencing, which terminated the entire
sequencing reaction each time a base pair was determined.
So, how does this actually work?
It can actually work in quite a few ways, if we ask ourselves
what is changing when DNA polymerase takes a base pair and
adds it to a growing chain that is copying a template.
So first of all, we can look at the event
where DNA polymerase actually integrates a nucleotide into the backbone.
So you have a dNTP floating around.
It comes in and it hybridizes, base pairing according
to the base on the template strand.
2:31
When the reaction actually happens to incorporate that nucleotide
into the growing strand, there's a couple of products released as well.
There's a hydrogen ion that's released, and
there's also pyrophosphate that's released.
So the nucleotide triphosphate gets hydrolyzed
to produce the phosphate link into the backbone,
but it releases these two phosphates off the end.
So we can detect any number of those in order to do this
so-called sequencing by synthesis reaction.
So there's actually a whole range of commercialized second or next generation
sequencing technologies, that utilize all of these different methods.
So I'll go into a little bit more detail on them in the next few slides,
but the thing to remember when comparing all these different technologies is,
what is used most commonly today?
And that would be the Illumina platform,
which we're going to be going into in more depth this week.
On this platform you can get 30-fold coverage of the human genome,
meaning that on average you're reading every base pair 30 times, but
the reads are randomly spread out across the genome.
So you're not actually reading each base pair exactly 30 times.
Some of them are read much more than 30 times, some of them much less,
but the average is 30-fold.
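To make the point about average coverage concrete, here is a minimal sketch of reads landing at random positions on a toy genome. The genome length, read length, and function name are assumptions chosen for illustration, not values from any real run.

```python
import random


def simulate_coverage(genome_length, read_length, mean_coverage, seed=0):
    """Drop reads at uniformly random start positions and count depth per base."""
    rng = random.Random(seed)
    # Number of reads needed so that total sequenced bases = genome_length * mean_coverage.
    n_reads = genome_length * mean_coverage // read_length
    depth = [0] * genome_length
    for _ in range(n_reads):
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            depth[pos] += 1
    return depth


# Toy numbers (assumed): a 10,000-base "genome" and 100-base reads at 30-fold coverage.
depth = simulate_coverage(genome_length=10_000, read_length=100, mean_coverage=30)
mean_depth = sum(depth) / len(depth)
print(f"mean depth: {mean_depth:.1f}")
print(f"lowest depth: {min(depth)}, highest depth: {max(depth)}")
```

The mean comes out at 30 by construction, while individual positions end up well above or below it, which is the spread described above.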
That 30-fold coverage costs about 5,000 to 10,000 dollars, and actually this number may be even lower now
that Illumina has just recently started sending out some of their newer-generation machines.
4:16
But we'll see how that goes.
The first sequencing-by-synthesis technology is so-called 454 sequencing, by Roche.
And this was the first next-generation sequencer on the market.
The way that this works is similar to what I was talking about in the previous lecture,
with these very tiny microfabricated wells, these kind of nano or micro wells.
And they were able to design a system where you had a DNA polymerase
that would be copying the DNA template, which would be attached to small beads.
And close to these beads, you could then couple the release of pyrophosphate,
these two phosphates that get cleaved when the base pair gets
incorporated into the growing strand, to a signal that can be detected in each well.
5:38
Another type of next-generation technology,
second-generation sequencing technology, is the Ion Torrent
machine from Life Technologies, now Thermo Fisher.
So again, it uses this kind of nano- or microfabrication technology.
And essentially they've built a very sensitive, massively parallel pH meter,
which detects the hydronium ion, the hydrogen ion,
which is released every time a nucleotide is incorporated into the DNA.
So similar to the Roche system,
you have some kind of signal that is spatially resolved in each well.
And by changing the nucleotide that you have in the solution
available to the polymerase at any given time,
you can tell in which well which base was incorporated.
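The idea of flowing in one nucleotide at a time and watching for a signal can be sketched for a single well as follows. This is a simplified illustration under stated assumptions: the flow order "TACG", the function names, and the idea that the signal scales with the number of identical bases incorporated in a row are all assumptions here, and strand orientation is ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flow_sequence(template, flow_order="TACG", max_flows=40):
    """Simulate one well: offer one nucleotide type per flow and record the signal.

    The signal is the number of bases incorporated during that flow
    (0 if the offered nucleotide does not pair with the next template base).
    """
    pos = 0
    flowgram = []
    for i in range(max_flows):
        if pos >= len(template):
            break  # template fully copied
        nucleotide = flow_order[i % len(flow_order)]
        signal = 0
        # Keep incorporating while the offered nucleotide pairs with the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            signal += 1
            pos += 1
        flowgram.append((nucleotide, signal))
    return flowgram


def call_bases(flowgram):
    """Turn the per-flow signals back into the sequence of the new strand."""
    return "".join(nucleotide * signal for nucleotide, signal in flowgram)


flowgram = flow_sequence("AAGCT")
print(flowgram)
print(call_bases(flowgram))
```

Flows with a zero signal tell you the offered nucleotide was not the next base in that well, so the order of flows plus the signal in each well is enough to reconstruct the sequence.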
6:40
Okay, so the Illumina method, or other methods that depend on
detecting the incorporation of the nucleotide itself,
really depend on several novel aspects of chemistry,
in the nucleotide itself and in the DNA polymerase reaction.
So it consists of several steps.
The first step is to introduce a particular
nucleotide which is labeled with a fluorophore that's unique for that nucleotide.
So you have four unique fluorescent labels, one for each nucleotide.
So, you put one labeled nucleotide into the machine.
7:25
You see which ones hybridize to your particular strand, okay?
So, it will hybridize to the leading strand based on base pairing,
and then you can measure what fluorescent signal is there
at the particular spot where your DNA strand is growing.
Then, through a series of unique chemical reactions, of which there are
several different variants (we'll talk about what the Illumina system is),
you can then cleave off the fluorophore.
So the fluorescent signal goes away.
And then you can do what's called removing a block.
So these nucleotides that are introduced into the machine are so-called blocked,
meaning that they have some functional group on them which does not allow them to
10:21
Just to reiterate, the main aspects of second-generation sequencing are,
one, the ability to do sequencing in a massively parallel format.
So, just like Sanger sequencing was scaled up to 96 and 384 wells, or
even higher spatial resolution of sequencing,
the same is done with second-generation sequencing,
sequencing millions and millions of strands all at once.
12:13
GC rich sequences behave differently in PCR protocols.
And this kind of amplification bias, where certain strands are amplified more or
better than others, is inherently a problem for quantitation.
It's going to distort the levels of transcripts in the original sample.
I mentioned a technique in week one,
something called unique molecular identifiers, which can potentially
solve this problem of having to amplify things in the sample.
So, I won't go over it again here, but that's one potential solution
to get around this amplification problem.
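As a reminder of why unique molecular identifiers help, here is a minimal sketch. It assumes each original molecule was tagged with its own short random barcode before amplification; the gene names, barcodes, and function name are made up for illustration.

```python
from collections import defaultdict


def count_reads_and_molecules(reads):
    """reads: iterable of (gene, umi) pairs observed after amplification.

    Returns two dicts: raw read counts per gene, and the number of
    distinct UMIs per gene (an estimate of original molecules).
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis_seen[gene].add(umi)
    molecule_counts = {gene: len(umis) for gene, umis in umis_seen.items()}
    return dict(read_counts), molecule_counts


# Hypothetical data: both genes started with 2 molecules,
# but one geneA molecule was amplified much more than the others.
reads = (
    [("geneA", "AACG")] * 5
    + [("geneA", "TTGC")]
    + [("geneB", "GGAT"), ("geneB", "CCTA")]
)
read_counts, molecule_counts = count_reads_and_molecules(reads)
print("reads per gene:    ", read_counts)
print("molecules per gene:", molecule_counts)
```

Counting raw reads makes geneA look three times more abundant than geneB, while counting distinct identifiers recovers the same two starting molecules for each.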
13:08
detect what base pair is being added in a slow way.
Where you need to detect some product of the reaction and
then feed in more nucleotides one at a time.
This really limits the length of time of the assay and,
therefore, how many base pairs can be read off.
Typical Illumina runs are 50 to 100 base pairs.
So you can have a lot of problems then in uniquely mapping long repeat
regions of the genome, because there are not many distinguishing differences
within reads of 50 to 100 base pairs.
So if you have gene duplication events,
or if you have pseudogenes that look like other genes, it can
be very difficult to uniquely identify what those are based on the short reads.
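The mapping problem with short reads can be sketched with a toy genome that contains the same stretch twice. The sequences and the exact-match search below are assumptions for illustration only; real aligners tolerate mismatches and work on far longer sequences.

```python
def find_all(genome, read):
    """Return every position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome (assumed): one repeated element sitting between unique flanks.
repeat = "GATTACAGGCTA"
genome = "ACGTTGCA" + repeat + "CCGGTTAA" + repeat + "TTGGCCAT"

short_read = repeat[2:10]   # lies entirely inside the repeat
long_read = genome[4:18]    # spans unique flank plus part of the repeat

print("short read hits:", find_all(genome, short_read))
print("long read hits: ", find_all(genome, long_read))
```

The read that sits entirely inside the repeat matches both copies, while the read that reaches into unique flanking sequence has only one possible origin.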
So in order to solve those problems,
so-called third-generation sequencing technologies are currently starting to mature a bit.
And there are not many of them now.
Pacific Biosciences is currently the market leader in
third-generation sequencing technologies.
And the way that this works is through real-time sequencing.
14:27
As I was describing for the second generation technologies,
although these were sequencing by synthesis, they required
a discrete step at each base pair incorporation in order to detect
what had just happened, and then allow the next addition of a base pair to happen.
In the Pacific Biosciences machine,
they actually monitor the copying of
a DNA template strand in real time.
The way that it works is that they immobilize DNA polymerases
onto a surface, and
then the template DNA can bind to that DNA polymerase, which
then incorporates fluorescently labeled nucleotides in real time.
So essentially the way that this works is just by having very, very good and
fast optics, which allow the detection of the fluorescently labeled nucleotide
that happens to have come onto the DNA polymerase at that time.
And then, looking at that small pulse of fluorescence,
when a proper base pair occurs,
you have a longer lifetime of the fluorescence being there.
And then the polymerase incorporates nucleotides on the order of base pairs per second.
So this is much, much faster than the second-generation technologies.
So a little bit more detail on how this actually works is that
you have DNA polymerases in each of these little spots.
And you have a type of fluorescence detection which is only
looking at 100 nanometers above where these DNA polymerases are attached.
16:28
So then what happens is that
this DNA polymerase is copying the template DNA which is bound to it.
And they have special nucleotides in solution which,
instead of having three phosphates, have six phosphates.
And the fluorescent tag is on the very end of all these phosphates.
So the DNA polymerase is copying the template,
it needs the next base pair, which is diffusing from the solution.
And when the proper base comes down, the base pairing interaction,
the hybridization between the two bases,
is relatively stable compared to incorrect ones.
So the stable one then has a much longer average lifetime of being on that strand,
which can then be picked up by this machine
as a long pulse of fluorescence in real time.
And depending on the color of that fluorescence pulse,
you can tell which base pair was spending a lot of time there.
And therefore, which one was much more likely to have been the correct base.
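The logic of calling a base from pulse color and pulse duration can be sketched like this. Everything specific here is an assumption for illustration: the dye-to-base mapping, the 20 ms duration cutoff, and the pulse values are made up, and a real base caller is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    color: str          # which dye was seen (hypothetical labels)
    start_ms: float     # when the pulse began
    duration_ms: float  # how long the fluorescence stayed in view


# Hypothetical assignment of dye colors to bases.
COLOR_TO_BASE = {"red": "A", "green": "C", "yellow": "G", "blue": "T"}


def call_bases(pulses, min_duration_ms=20.0):
    """Keep only pulses that lasted long enough, in time order, and map color to base."""
    calls = []
    for pulse in sorted(pulses, key=lambda p: p.start_ms):
        if pulse.duration_ms >= min_duration_ms:
            calls.append(COLOR_TO_BASE[pulse.color])
    return "".join(calls)


pulses = [
    Pulse("red", 0.0, 45.0),
    Pulse("blue", 60.0, 4.0),     # brief visit, treated as a nucleotide diffusing by
    Pulse("green", 80.0, 38.0),
    Pulse("yellow", 130.0, 52.0),
]
print(call_bases(pulses))
```

The short blue pulse is dropped because it did not stay long enough to look like a stable base pairing, which is the same reasoning used above to decide which base was most likely the correct one.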
So this slide is an example of what some actual data might look like coming off this
machine.
And as you might expect, because it's detecting single molecules,
there's actually a lot of noise in the intensity and
in the length of time of each fluorescence pulse.
But one really powerful
thing about this technology is that you can actually detect base modifications.
So sometimes, DNA bases are methylated, for example.
And in that case, with the DNA polymerases that are used in the PacBio machine,
you can detect these types of methylation because it takes a lot longer for
the DNA polymerase to incorporate a base at that position in the growing chain.
So for example here, there's some data shown on
the right where, if you have a methylated adenine,
the incorporation takes a lot longer than if it's not methylated.
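One way to picture how slower incorporation could flag a modified base is to compare timing at each position against an unmodified control. This is only a sketch under assumptions: the timing values, the threefold cutoff, and the function name are all hypothetical, and this is not presented as the vendor's actual method.

```python
from statistics import mean


def flag_slow_positions(sample_times, control_times, min_ratio=3.0):
    """Flag positions where incorporation is much slower than in the control.

    sample_times / control_times: dict mapping template position to a list of
    observed incorporation times in seconds (hypothetical measurements).
    """
    flagged = []
    for position in sorted(sample_times):
        ratio = mean(sample_times[position]) / mean(control_times[position])
        if ratio >= min_ratio:
            flagged.append(position)
    return flagged


# Made-up timings: position 2 is much slower in the sample than in the control.
sample_times = {
    1: [0.21, 0.19, 0.20],
    2: [1.10, 0.95, 1.25],
    3: [0.18, 0.22, 0.20],
}
control_times = {
    1: [0.20, 0.20, 0.20],
    2: [0.20, 0.22, 0.18],
    3: [0.19, 0.21, 0.20],
}
print(flag_slow_positions(sample_times, control_times))
```

Averaging over several observations of the same position matters here, since any single-molecule timing is noisy on its own.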
But also here, looking at some of this data, you can see that with this machine,
because it's single-molecule, real-time detection,
sometimes there might be quite a bit of difficulty in calling the right base.
For example, here if we look at this bottom plot, and
look at the cytosine residue that was called here,
it's actually a very, very small pulse with a very short duration.
So, even though there might be a cytosine residue being incorporated here,
it may just be kind of a random pulse, where the cytosine happened to come into
the field of view of the imager for just a moment and then went back out.
So it's because of these noisy single-molecule events
that this third-generation technology, this single-molecule real-time
method, is actually quite noisy and has a very high error rate.
In the next lecture we're going to be talking in much more detail about
a second-generation technology, one that is used very commonly,
based on the protocol and machine that's built by Illumina.