In this lecture, we are going to talk about integrating
many of the components covered in the entire course
for applications in personalized medicine and systems pharmacology.
An area where you can get a lot of data to analyze, and
apply many of the methods you are learning in this course, is
the NCI's, the National Cancer Institute's, TCGA,
which is The Cancer Genome Atlas.
This NIH program collects genome-wide molecular data
from thousands of patients with various types of cancer.
This includes mRNA expression profiles of tumors,
data about mutations and SNPs, as well as clinical
parameters including patient recurrence and survival outcome,
which is critical for identifying risk groups
by correlating molecular markers with outcome.
So let's take a quick live look at the TCGA data portal.
So this is the TCGA website.
You click on launch data portal.
And here you have all the various cancers.
We're going to look at breast cancer.
So we're gonna click on this link.
And this tells you the number of various different types of
datasets that are available.
So if we are interested in the mRNA expression datasets,
there are almost a thousand patients that have that data.
And then we are getting into this dataset interface,
where you can make a certain selection.
For example, those tumors were profiled on three different platforms.
There is RNA-seq data and then microarray data, so we can select,
let's say, only this Agilent microarray data, and then we build an archive,
and a link to that archive is then gonna be emailed to us.
So, after you've registered, you will receive in your email the link to the archive
that contains that dataset, and there are various levels of the data.
You can get the raw data or you can get the processed data, at various levels.
So the first application of analyzing the data from TCGA that I'm
going to show you is visualizing networks and grids of patients.
So here we combine the microarray gene expression data
with some clinical data for
those patients to visualize those networks and grids.
So we processed the gene expression data from TCGA for
breast cancer to create those networks and grids of
patients based on their gene expression profile similarity.
This approach identified patients with similar tumors,
initially using PCA, which we learned about.
Then we overlaid information about the clinical outcome and ER status,
the estrogen receptor status, for
each of those patients, which is considered a marker for treatment decisions,
and we wanted to see if we can find clusters of patients that have
similar outcome or similar estrogen receptor classification.
And this is useful because, if you have new patients, you can place them within those
networks or grids, and this can be used directly to predict their risk outcome.
And predicting the risk outcome for a patient is important for
physicians to make decisions regarding the treatment they should prescribe.
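To make the idea of placing a new patient among similar patients concrete, here is a minimal sketch, assuming a simple nearest-neighbor vote on Pearson correlation between expression profiles. The lecture does not specify how new patients are placed, so the function names, the toy profiles, and the choice of k are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def predict_label(new_profile, profiles, labels, k=3):
    """Majority label among the k known patients most correlated with the new one."""
    ranked = sorted(profiles,
                    key=lambda pid: pearson(new_profile, profiles[pid]),
                    reverse=True)
    votes = {}
    for pid in ranked[:k]:
        votes[labels[pid]] = votes.get(labels[pid], 0) + 1
    return max(votes, key=votes.get)

# Toy data (made up): four genes per patient, ER status as the clinical label.
profiles = {
    "P1": [9.0, 8.5, 1.0, 1.2],
    "P2": [8.8, 9.1, 0.9, 1.5],
    "P3": [1.1, 0.8, 9.2, 8.7],
    "P4": [0.9, 1.3, 8.9, 9.4],
    "P5": [9.2, 8.0, 1.4, 0.7],
}
labels = {"P1": "ER+", "P2": "ER+", "P3": "ER-", "P4": "ER-", "P5": "ER+"}

predicted = predict_label([8.7, 9.0, 1.1, 1.0], profiles, labels, k=3)
print(predicted)
```

The same pattern would apply to an outcome label instead of ER status; only the `labels` dictionary changes.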
For the rest of this lecture,
I'm going to talk to you about a paper that we recently published.
And the title of the paper is Metasignatures Identify Two Major Subtypes
of Breast Cancer.
So in this paper we found a new way to analyze the gene expression
mRNA data from TCGA using enrichment analysis.
So you should all be familiar now with the enrichment analysis idea, and
I'll show you how we use that for classifying breast cancer patients.
So this is a summary slide of the entire study.
The first step we took was to process the data from a publication that
tested the effect of 77 drugs on 33 breast cancer cell lines.
The goal of that study, which was published in PNAS by Heiser et al., was to identify
which breast cancer cell lines are sensitive to which drugs, and
this resulted in a bipartite network that connected drugs to cell lines.
They also had mRNA expression data from those cell lines,
from those 33 cell lines.
So we performed hierarchical clustering and PCA
to identify groups of cell lines that show similar expression patterns.
We also downloaded the cancer mRNA microarray data from TCGA,
and performed hierarchical clustering and
PCA to identify groups of patients that show similar expression patterns.
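As a rough illustration of the clustering step, here is a minimal sketch of agglomerative hierarchical clustering with average linkage and correlation distance, written in plain Python. The linkage, the distance, and the toy profiles are assumptions for illustration, since the lecture does not state which settings were used, and PCA is not covered by this sketch.

```python
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation, so similar profiles have a distance near 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    r = cov / (sx * sy) if sx and sy else 0.0
    return 1.0 - r

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(correlation_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Toy expression profiles (made up): two obvious groups of tumors.
profiles = {
    "T1": [9.0, 8.5, 1.0, 1.2],
    "T2": [8.8, 9.1, 0.9, 1.5],
    "T3": [1.1, 0.8, 9.2, 8.7],
    "T4": [0.9, 1.3, 8.9, 9.4],
}
groups = average_linkage(profiles, n_clusters=2)
print(groups)
```

For real datasets with hundreds of tumors, a library implementation would be the practical choice; this quadratic search is only meant to show the logic.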
So far, this was a standard type of analysis.
The special thing that we did next was to perform enrichment analysis
on the genes that are highly expressed in each patient or cell line,
using the ChEA and histone modification gene set libraries, which
I mentioned to you in the Enrichr lecture.
This resulted in a vector of enriched terms for each patient or cell line.
We call these vectors metasignatures,
because they are inferred from the mRNA signatures.
We then clustered the patients based on their mRNA signatures or
based on the ChEA and histone modification metasignatures.
This resulted in different groupings of patients compared to what you get
when you do the standard classification of patients based on mRNA expression.
We also applied the same metasignature approach to all the cell lines.
Finally, we connected patient clusters to cell line clusters, and
this indirectly connected the best drugs
to clusters of patients that are most likely going to benefit from those drugs.
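One way to picture this last step is sketched below, under stated assumptions: each cluster is summarized by the average of its metasignature vectors, a patient cluster is linked to a cell line cluster when the cosine similarity of those averages passes a threshold, and drugs are then read off the drug-to-cell-line network. The lecture does not say how the clusters were actually connected, so the similarity measure, the threshold, and every name and number here are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def link_patients_to_drugs(patient_clusters, cell_line_clusters,
                           sensitive_drugs, threshold=0.8):
    """Map each patient cluster to drugs via sufficiently similar cell line clusters."""
    links = {}
    for p_name, p_vectors in patient_clusters.items():
        p_center = centroid(p_vectors)
        drugs = set()
        for members in cell_line_clusters.values():
            c_center = centroid(list(members.values()))
            if cosine(p_center, c_center) >= threshold:
                # Follow the bipartite network: cell line -> drugs it is sensitive to.
                for line in members:
                    drugs |= sensitive_drugs.get(line, set())
        links[p_name] = sorted(drugs)
    return links

# Toy metasignature vectors (made up), three enrichment terms each.
patient_clusters = {
    "patient_cluster_1": [[3.0, 0.2, 0.1], [2.5, 0.4, 0.0]],
    "patient_cluster_2": [[0.1, 0.3, 2.8], [0.0, 0.2, 3.1]],
}
cell_line_clusters = {
    "cell_cluster_1": {"LineA": [2.8, 0.1, 0.2], "LineB": [3.1, 0.3, 0.1]},
}
sensitive_drugs = {"LineA": {"drug_x"}, "LineB": {"drug_x", "drug_y"}}

links = link_patients_to_drugs(patient_clusters, cell_line_clusters, sensitive_drugs)
print(links)
```

In this toy example the second patient cluster ends up with no candidate drugs, because no cell line cluster resembles it.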
So this is what the ChEA meta-signature hierarchical clustering result looks like.
The columns are 536 breast cancer patient tumors and
the rows are transcription factors from the ChEA database.
So the ChEA database contains, for each transcription factor,
a list of genes that that transcription factor is putatively regulating.
And this is based on [INAUDIBLE].
Each patient/tumor has a set of genes that are highly
expressed in the tumor, compared to all other tumors within this group.
So the brown spots are those entries that have high overlap, or
low p-value, since we use the Fisher exact test to assess the overlap.
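The overlap test just described can be sketched as follows: a one-sided Fisher exact test, computed from the hypergeometric tail, applied to one patient's highly expressed genes against each gene set in a library. The library terms `TF_A` and `TF_B`, the gene names, the universe of 20 genes, and the use of -log10(p) as the vector entry are assumptions for illustration, not the actual ChEA library or the published scoring.

```python
from math import comb, log10

def fisher_exact_greater(overlap, n_up, n_targets, universe):
    """P(X >= overlap) when n_up genes are drawn from a universe that
    contains n_targets target genes (hypergeometric upper tail)."""
    total = comb(universe, n_up)
    upper = min(n_up, n_targets)
    tail = sum(comb(n_targets, k) * comb(universe - n_targets, n_up - k)
               for k in range(overlap, upper + 1))
    return tail / total

def metasignature(up_genes, library, universe_size):
    """One enrichment score per library term: -log10 of the overlap p-value."""
    up = set(up_genes)
    scores = {}
    for term, targets in library.items():
        k = len(up & targets)
        p = fisher_exact_greater(k, len(up), len(targets), universe_size)
        scores[term] = -log10(p)
    return scores

# Hypothetical library: two transcription factors with five target genes each.
library = {
    "TF_A": {"g1", "g2", "g3", "g4", "g5"},
    "TF_B": {"g11", "g12", "g13", "g14", "g15"},
}
# Genes highly expressed in one tumor relative to the others (made up).
up_genes = ["g1", "g2", "g3", "g4", "g12"]

p_a = fisher_exact_greater(4, 5, 5, 20)
scores = metasignature(up_genes, library, universe_size=20)
print(p_a, scores)
```

Repeating `metasignature` for every patient gives one vector per patient, and those vectors are what get clustered.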
So the results point to two big clusters of patients and
a few small clusters of patients.
So the histone modification [INAUDIBLE] of
the meta-signatures is showing a similar result.
We get two big clusters and a few small clusters.
The clusters from the ChEA and histone modifications match each other.
So the two big groups of patients fall in exactly the same two groups
of patients in the ChEA meta-signatures and the histone modification meta-signatures.
The color bar on top of the enrichment result is the clustering that you would
get if you used the regular mRNA-based patient classification.
And this is to show that we get very different clustering
when we apply this meta-signature hierarchical clustering approach.
However, if you look carefully at those colored bars on top of those two
hierarchical clustering images,
you can see the classification is not completely random.
So now we can ask whether the clusters that we identified
using the meta-signature approach correlate with clinical outcome, and
we did this with Kaplan–Meier survival curves.
These curves visualize the percent of patients
that survive over time within each cluster.
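For reference, the Kaplan–Meier estimate behind such a curve can be computed in a few lines. This is a minimal sketch with made-up follow-up times; an `events` flag of True means the death was observed, and False means the patient was censored (still alive at last follow-up).

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve

# Toy cluster of five patients (made-up months of follow-up).
times = [5, 8, 12, 12, 20]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
print(curve)
```

Running this once per cluster gives one curve per cluster. A cluster with no observed deaths returns an empty list, meaning its estimated survival stays at 100% for the whole follow-up period.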
So we can see that for the two big clusters, there is a big difference.
One cluster, the red one, shows better outcome than the other,
the blue one.
We also identified a cluster of 27 patients
that shows 100% survival.
The labels near each line are the top enriched terms for each cluster.
So we are not only getting a classification, but we also see a potential molecular
explanation: the functions of the groups of genes within those clusters.
So those patients that have highly expressed GATA and
STAT genes, and also correlate with genes that are regulated
by the H3K36me histone modification, those patients have a 100% survival rate.
Finally, we can connect the clusters of patients
that we found with the meta-signatures to potential drugs
through the expression data from the cell lines.
We see that there are many cell lines that can be linked to the groups of patients
with the better outcome, but only one cell line
is weakly connected to the cluster of patients with a bad outcome.
And this suggests that maybe we need to develop more cell lines that are similar
to the patients with the bad outcome, and once we have such cell lines,
we can potentially develop drugs that would kill these cells, and
hopefully, will be effective in people with this type of cancer.
[MUSIC]