0:48
Earlier in this MOOC, we learned about the exciting new developments
in next-generation sequencing technologies.
We can now sequence a person's whole genome for about $3,000 in about a day.
These personal genomes hold great promise for the future of personalized medicine.
However, each person's genome has about three million single nucleotide
variations, as well as many other types of genetic variations.
So, how do we predict the functional effects of these genetic variations?
This is the subject of this week's lectures.
1:29
In the first unit, let's take a close look at this problem.
Let's first take a look at an example in real life.
On May 14, 2013,
The New York Times published an article titled "My Medical Choice."
In it, movie star Angelina Jolie revealed that she carries a mutation in her BRCA1 gene and
that her mother died early from cancer.
2:13
So do you think Angelina made the right decision to remove her breasts?
Please take a moment to really think about this question, as her decision is
a complicated one that touches on many core issues in human genetics.
We have created a short online survey about this.
So please fill in the survey with your honest opinion and
later we will share the anonymous survey results with all of you.
The core of Angelina Jolie's decision touches upon an important
bioinformatics question.
Given that she has a genetic mutation in BRCA1,
what is the conditional probability that she will develop breast cancer?
Even with her mutation, there is a chance that she will remain cancer-free.
These two conditional probabilities add up to one.
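The relationship between the two outcomes can be sketched in a few lines of Python. The carrier counts below are hypothetical, invented only to show the arithmetic; they are not real BRCA1 penetrance estimates. The helper name `conditional_probability` is likewise illustrative, not part of any library.

```python
from fractions import Fraction


def conditional_probability(joint_count: int, condition_count: int) -> Fraction:
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count <= 0:
        raise ValueError("need at least one observation satisfying the condition")
    return Fraction(joint_count, condition_count)


# HYPOTHETICAL cohort: these counts are invented for illustration only.
carriers = 200               # people who carry the mutation (event B)
carriers_with_cancer = 120   # carriers who develop breast cancer (A and B)
carriers_cancer_free = carriers - carriers_with_cancer

p_cancer = conditional_probability(carriers_with_cancer, carriers)
p_cancer_free = conditional_probability(carriers_cancer_free, carriers)

print(p_cancer, p_cancer_free)  # 3/5 2/5
# Given the mutation, the two outcomes are complementary, so they sum to one.
assert p_cancer + p_cancer_free == 1
```

Using exact fractions rather than floats makes the "adds up to one" check exact.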
4:53
We never hear about trisomy 1 or trisomy 2 in humans,
because these chromosomes are so large and contain so
many genes that having an abnormal number of copies of them is embryonically lethal.
Another type of microscopic or submicroscopic genetic
variation in the human genome is structural variation (SV).
SVs include deletions, where a segment of a chromosome is missing,
and duplications. Duplications include tandem duplications, where
a segment of a chromosome is duplicated right next to the original copy,
as well as interspersed duplications, where a segment of a
chromosome is duplicated somewhere else in the genome.
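To make these copy-number-changing SV types concrete, here is a minimal Python sketch that applies each one to a toy 16-bp sequence. The function names and the toy sequence are hypothetical; real SV callers work on read alignments against a reference genome, not on raw strings.

```python
def delete(seq: str, start: int, end: int) -> str:
    """Deletion: the segment seq[start:end] is missing."""
    return seq[:start] + seq[end:]


def tandem_duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: a copy of seq[start:end] sits right after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def interspersed_duplicate(seq: str, start: int, end: int, target: int) -> str:
    """Interspersed duplication: a copy of seq[start:end] is inserted at `target`."""
    return seq[:target] + seq[start:end] + seq[target:]


ref = "AAAACCCCGGGGTTTT"
print(delete(ref, 4, 8))                      # AAAAGGGGTTTT
print(tandem_duplicate(ref, 4, 8))            # AAAACCCCCCCCGGGGTTTT
print(interspersed_duplicate(ref, 4, 8, 12))  # AAAACCCCGGGGCCCCTTTT
```

All three operations change the length of the sequence, that is, the amount of DNA present.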
5:45
There are two additional types of insertions.
First, many mobile elements are inserted in each of our genomes.
Second, novel sequences, such as viral genomes, are sometimes inserted into our genomes.
Finally, two other types of structural
variation disrupt the genome without changing its genomic
balance: inversions, where a segment of a chromosome is inverted,
and translocations, where a segment of a chromosome
is moved somewhere else in the genome.
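The balanced rearrangements can be sketched in the same toy style. This minimal Python example assumes a simplified model: an inversion replaces the segment with its reverse complement (which is how a flipped piece of double-stranded DNA reads on the original strand), and a translocation moves a segment from one toy chromosome string into another. The names are hypothetical. What the sketch checks is that neither operation changes the total amount of DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: seq[start:end] is flipped in place (reads as its reverse complement)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def translocate(donor: str, recipient: str, start: int, end: int,
                target: int) -> tuple[str, str]:
    """Toy one-way translocation: move donor[start:end] into recipient at `target`."""
    segment = donor[start:end]
    new_donor = donor[:start] + donor[end:]
    new_recipient = recipient[:target] + segment + recipient[target:]
    return new_donor, new_recipient


ref = "AAAACCGTGGGG"
print(invert(ref, 4, 8))  # AAAAACGGGGGG

chr_a, chr_b = translocate("AAAACCCC", "GGGGTTTT", 4, 8, 4)
print(chr_a, chr_b)       # AAAA GGGGCCCCTTTT

# Both rearrangements keep the total length unchanged (genomic balance).
assert len(invert(ref, 4, 8)) == len(ref)
assert len(chr_a) + len(chr_b) == 16
```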
6:22
Short insertions and deletions, ranging from one or a few nucleotides
up to a thousand or several thousand nucleotides, are sometimes called indels.
Indels may happen in intergenic or intronic regions, or
they may happen within protein coding regions.
Within protein-coding regions, if an indel involves three nucleotides or
a multiple of three, it will only add or
delete one or several codons without causing a frameshift.
6:53
Otherwise, it causes a frameshift
that may result in a drastic change in the protein sequence.
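The multiple-of-three rule reduces to a single modulo check. A minimal Python sketch follows; the function name is hypothetical, and it looks only at the indel length, ignoring other effects an in-frame indel can have, such as creating a stop codon at the junction.

```python
def causes_frameshift(indel_length: int) -> bool:
    """True if a coding-region indel of this many nucleotides shifts the reading frame."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    # Codons are 3 nt long, so only lengths that are multiples of 3 keep the frame.
    return indel_length % 3 != 0


for n in (1, 2, 3, 4, 6):
    print(n, "frameshift" if causes_frameshift(n) else "in-frame")
# 1 frameshift
# 2 frameshift
# 3 in-frame
# 4 frameshift
# 6 in-frame
```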
For example, here is an original DNA sequence, and
the amino acid sequence it encodes.
The transcriptional and translational machinery will
read the first three nucleotides, CAT, to make a histidine,
the next three nucleotides, TCA, to make a serine, and
then CAC to make a histidine.
However, if the first C is deleted, it causes a frameshift.
Now the translational machinery reads ATT to make an isoleucine,
CAC to make a histidine, and ACG to make a threonine.
You see, the frameshift completely changes the amino acid sequence.
Often it also creates a premature stop codon, and
the new protein may not be stable enough to exist at all.
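The frameshift example can be reproduced with a small translator. This is a minimal Python sketch, assuming the standard genetic code and reading the coding (sense) DNA strand directly, so T appears where the mRNA would have U. The 10-nucleotide sequence is reconstructed from the codons named in the lecture (CAT TCA CAC, then ACG after the deletion); the full sequence on the original slide is not available here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons in reading frame 0, stopping after a stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


original = "CATTCACACG"
mutant = original[1:]  # delete the leading C

print(translate(original))  # HSH  (His-Ser-His)
print(translate(mutant))    # IHT  (Ile-His-Thr)
```

Deleting a single nucleotide shifts every downstream codon boundary, which is why all three amino acids change.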
7:54
At a smaller scale, but
with much higher frequency, there are single nucleotide variations (SNVs).
On average, a person's genome contains about 3 million SNVs,
roughly equivalent to 1 SNV in every 1,000 nucleotides.
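The density figure is simple arithmetic, assuming a haploid human genome of roughly 3 billion base pairs:

```latex
\frac{3 \times 10^{6}\ \text{SNVs}}{\approx 3 \times 10^{9}\ \text{bp}}
  \approx \frac{1\ \text{SNV}}{10^{3}\ \text{bp}}
```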
8:30
Most variants are located in intergenic regions between genes.
Some of them fall in non-coding RNA genes that are transcribed but not translated.
SNVs within coding regions tend to have larger effects than
other variations, and they have been studied the most.
8:52
In the more severe cases,
an SNV can cause a premature stop codon that terminates a protein early.
In the example shown here, a cytosine nucleotide is changed to thymine.
As a result, the codon CAG, which used to encode the amino acid glutamine,
becomes TAG, which is a stop codon, resulting in premature
termination of the protein.
This is called a nonsense SNV.
9:21
In the second example, an adenine nucleotide is changed to cytosine.
As a result, the codon CAT, which used to encode the amino acid
histidine, becomes CCT, which encodes proline.
This is called a non-synonymous, or missense, SNV.
9:58
Some SNVs at or near splice junctions may affect splicing.
And finally, some SNVs change a stop codon to a codon encoding an amino acid,
resulting in a lengthening of the protein, which may alter or
disrupt its stability, structure, and function.
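The coding-SNV categories (synonymous, missense, nonsense, stop-loss) can be told apart by comparing the amino acid encoded before and after the change. A minimal Python sketch, assuming the standard genetic code and codons written on the coding DNA strand; it deliberately ignores splicing effects, which cannot be judged from a single codon. The function name is hypothetical, and real variant annotation tools handle many more cases.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or a stop kept as a stop)
    if alt_aa == "*":
        return "nonsense"    # amino acid -> premature stop codon
    if ref_aa == "*":
        return "stop-loss"   # stop codon -> amino acid, protein is lengthened
    return "missense"        # one amino acid replaced by another


examples = [("CAG", "TAG"), ("CAT", "CCT"), ("TAA", "CAA"), ("CAT", "CAC")]
for ref, alt in examples:
    print(ref, "->", alt, classify_codon_change(ref, alt))
# CAG -> TAG nonsense
# CAT -> CCT missense
# TAA -> CAA stop-loss
# CAT -> CAC synonymous
```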
10:19
Because a nonsense SNV causes premature termination of a protein,
it is usually predicted to be damaging,
even though there are exceptions where paralogous proteins or
alternative pathways can compensate for the loss of a protein.
10:37
Synonymous, intronic, and intergenic variations are often ignored.
However, according to genome-wide association studies (GWAS),
88% of trait-associated variations of weak effect are non-coding.
Although their individual functional effects may not be as obvious,
these regions are so large that
their total effects cannot be neglected, especially for mild traits.
Yet they remain under-studied, and better methods are still needed.
11:36
It is important to note, however,
that there might be ascertainment bias, because
an important discovery tends to attract more research in the same direction.
Even so, many missense mutations clearly have important functional roles.
They are the focus of this week's lectures.
11:59
However, not all missense SNVs cause phenotypic changes.
For instance, BRCA1 was the first gene associated with breast
cancer, in 1990, based on linkage analysis of large pedigrees
with early-onset familial breast cancer.
BRCA1 has a total of 238 known missense mutations:
163 are present only in patients,
62 are present only in healthy persons, and
13 in both patients and healthy persons.
Furthermore, even missense variations seen only in patients are not all causal.
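The BRCA1 counts are consistent with one another, and a quick inclusion-exclusion check shows how many of these missense variants have been seen in each group. A minimal Python sketch using only the numbers quoted above:

```python
only_patients = 163
only_healthy = 62
both_groups = 13

total = only_patients + only_healthy + both_groups
seen_in_patients = only_patients + both_groups
seen_in_healthy = only_healthy + both_groups

# Inclusion-exclusion: |patients OR healthy| = |patients| + |healthy| - |both|
assert seen_in_patients + seen_in_healthy - both_groups == total

print(total, seen_in_patients, seen_in_healthy)  # 238 176 75
print(f"{seen_in_healthy / total:.0%} seen in at least one healthy person")
# 32% seen in at least one healthy person
```

So roughly a third of the known BRCA1 missense variants have been observed in at least one healthy person.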
12:45
If we look more broadly, analysis of the whole genomes of over 1,000
healthy individuals in the 1000 Genomes Project revealed that,
on average, a healthy individual carries over 3 million SNPs,
over 361,000 indels, almost 16,000 deletions,
over 400 duplications, and almost 5,000 mobile element insertions.
Within protein-coding regions, on average,
a healthy individual carries large deletions that disrupt
about 150 genes, over 1,000 stop-gain SNPs,
77 stop losses, over 900 small frameshift indels,
over 700 small in-frame indels, nearly 70,000
non-synonymous SNPs, and 60,000 synonymous SNPs.
So the questions are: what
features differentiate disease-causing variants from neutral ones?
How can we predict whether a variation is disease-causing?
Unlike sequence alignment and sequence database search,
these questions remain largely unsolved,
and there is still a lot of active research going on.
14:18
Let's use the last two slides of unit one to look at the nomenclature.
First, when the minor allele has a frequency of less
than 1% in the general population, we usually call it a mutation.
Otherwise, it is usually called a polymorphism.
Sometimes a cutoff of 5% is used, but you get the idea.
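The naming rule depends only on the minor allele frequency (MAF) and the chosen cutoff. A minimal Python sketch; the function names are hypothetical, and the example assumes a biallelic site in a diploid sample of 1,000 people, that is, 2,000 alleles.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Frequency of the less common of the two alleles at a biallelic site."""
    alt_freq = alt_allele_count / total_alleles
    return min(alt_freq, 1 - alt_freq)


def label_variant(maf: float, cutoff: float = 0.01) -> str:
    """Call a variant a 'mutation' if its MAF is below the cutoff, else a 'polymorphism'."""
    return "mutation" if maf < cutoff else "polymorphism"


# 1,000 diploid people -> 2,000 alleles; 30 of them carry the alternative allele.
maf = minor_allele_frequency(30, 2000)  # 0.015
print(label_variant(maf))               # polymorphism (1% cutoff)
print(label_variant(maf, cutoff=0.05))  # mutation (5% cutoff)
```

Note that the same site gets a different label under the 1% and 5% cutoffs, so the label is a convention rather than a property of the variant itself.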
14:49
People may have different things in mind when they talk about the functional or
phenotypic effects of human genetic variations.
Often people are referring to disease-causing versus normal.
In evolutionary terms,
they may be thinking about deleterious, meaning causing a reduction in
fitness, versus neutral, meaning causing no change in fitness.
Sometimes the phenotypic differences are personal trait differences, such as height,
curliness of hair, etc.
16:01
Observed protein functional and structural changes or
cellular and animal model changes do not always lead to phenotypic changes.
On the other hand, please keep in mind that
your experimental studies give you observations, not the truth.
For instance, if you do not observe functional changes
for a genetic variation in your experiments,
it does not necessarily mean that it has no phenotypic effect.
No functional assay is 100% comprehensive or accurate.
16:38
So we have to look at this question from a statistical perspective.
Genetic variations that change protein structure are more likely
to cause protein function changes, which are more likely to cause cellular and
animal phenotypic changes, which are more likely to be associated with
diseases that reduce fitness or change personal traits.
Finally, I'd like to mention that, in these lectures,
we focus on human genetic variations.
However, most of the concepts and methods can also be applied to other organisms.