Expression quantitative trait loci, or
eQTL, analysis is one of the most common integrative analyses performed in genomics.
So an eQTL analysis is one where you're trying to identify
variations in DNA that correlate with variations in RNA.
So basically what you do is you measure the abundance of different RNA molecules
and measure the DNA in those same samples, and
then you try to correlate the variation in DNA with the variation in RNA.
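As a minimal sketch of what "correlating" means here, assume each sample's genotype at one variant is coded as 0 or 1 (which allele it carries) and its expression for one gene is a single number. The toy data and the helper name `pearson` are made up for illustration; real analyses typically fit a regression or a similar test for each pair.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one variant and one gene measured in the same 8 samples.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]                     # allele carried by each sample
expression = [1.1, 0.9, 1.0, 1.2, 2.1, 1.9, 2.0, 2.2]   # expression of one gene

r = pearson(genotype, expression)
print(f"correlation between genotype and expression: {r:.3f}")
```

A correlation near zero would mean the variant tells you little about that gene's expression; a value near 1 or -1 suggests a candidate association that still has to be tested formally.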
This is representative of a whole class of problems that are associated with
combining different genomic data types.
Whether it's measuring proteomic data and RNA data, or
DNA data and RNA data, or RNA and methylation.
And then trying to integrate those data together to identify the
cross-regulation between these different measurements.
So one of the first examples of an eQTL study was this study by Brem et al. in 2002
in Science.
And they basically crossed two strains of yeast and
they created 112 random segregants.
And so once they had those yeast segregants, they measured mRNA expression,
which at the time meant using gene expression microarrays, and
then they measured genotypes using a microarray genotyping tool.
And the goal was to identify associations between the expression levels and
the genotypes.
And so you can think of this as basically having two components.
One is the SNP data, so that's the set of markers or
SNPs measured across the genome.
In this case, it's the yeast genome.
And so you have the position of the particular SNP that you're measuring.
The other is the expression data, so that's information about a particular gene,
like how much that gene is turned on or expressed.
And then you have the information on where that gene is located in
the genome as well.
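One way to picture these two components is as two small tables that share the same samples. The sketch below is only an assumed layout: the class names `Marker` and `Gene`, the field names, and all the values are hypothetical, and it uses 4 segregants rather than 112 to stay readable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Marker:
    """One genotyped SNP/marker (hypothetical layout)."""
    marker_id: str
    chrom: int
    position: int         # base-pair position of the SNP
    genotypes: List[str]  # one call per segregant: "BY" or "RM"


@dataclass
class Gene:
    """One gene with its measured expression (hypothetical layout)."""
    gene_id: str
    chrom: int
    position: int            # base-pair position of the gene
    expression: List[float]  # one value per segregant, same order as genotypes


# Made-up example data.
markers = [
    Marker("marker_1", 1, 15_000, ["BY", "RM", "RM", "BY"]),
    Marker("marker_2", 2, 48_000, ["RM", "RM", "BY", "BY"]),
]
genes = [
    Gene("gene_A", 1, 16_200, [0.8, 1.9, 2.1, 1.0]),
    Gene("gene_B", 3, 90_500, [1.4, 1.5, 1.3, 1.6]),
]

# Both components must describe the same segregants in the same order.
n_segregants = len(markers[0].genotypes)
same_samples = all(len(m.genotypes) == n_segregants for m in markers) and all(
    len(g.expression) == n_segregants for g in genes
)
print(f"{len(markers)} markers, {len(genes)} genes, "
      f"{n_segregants} segregants, aligned: {same_samples}")
```

Keeping the position next to each measurement is what later lets you ask where a SNP sits relative to the gene it is associated with.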
So, you're basically trying to test for an association between every gene's
expression level and every SNP's genotype.
So, this obviously complicates the issue of multiple testing because you're
doing all possible SNPs versus all possible gene expression values.
So you can think about it this way: for every single SNP, you're basically performing
a full gene expression microarray analysis.
And if you have thousands or hundreds of thousands of SNPs, that's thousands or
hundreds of thousands of microarray experiments.
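To get a feel for the scale, here is a small arithmetic sketch. The SNP and gene counts are hypothetical round numbers, not figures from the study, and Bonferroni is used only as one familiar illustration of a correction, not as the method any particular eQTL study applied.

```python
def n_pairwise_tests(n_snps, n_genes):
    """Number of SNP-by-gene association tests when every pair is tested."""
    return n_snps * n_genes


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical round numbers chosen only for illustration.
n_snps = 3_000
n_genes = 6_000

total = n_pairwise_tests(n_snps, n_genes)
cutoff = bonferroni_threshold(0.05, total)
print(f"{total:,} tests; Bonferroni per-test cutoff at alpha=0.05: {cutoff:.2e}")
```

Even with these modest counts the number of tests runs into the millions, which is why an uncorrected p-value of 0.05 is not meaningful on its own in this setting.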
And what you're basically looking for is cases like this. So in this case,
there are the two strains,
the BY and RM strains, and those are surrogates for
the genotypes in this case.
And so here, you're looking for differences in expression.
So here you don't see any difference or
not a very strong difference in expression between the BY and RM strains for
this particular gene, for this particular variant.
Here, for this other variant and this other gene,
you do see differences in the mean level of expression between the two genotypes.
And so that would be classified as an eQTL if it
passed the significance threshold.
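The comparison of mean expression between the two genotypes can be sketched with a two-sample statistic. This is an assumption-laden illustration: Welch's t statistic is one common choice for comparing two group means, not necessarily the test used in the study, and the helper names and expression values are made up.

```python
import math
from statistics import mean, variance


def split_by_genotype(genotypes, expression):
    """Group expression values by the strain allele each segregant carries."""
    by = [e for g, e in zip(genotypes, expression) if g == "BY"]
    rm = [e for g, e in zip(genotypes, expression) if g == "RM"]
    return by, rm


def welch_t(group_a, group_b):
    """Welch's t statistic for a difference in means between two groups."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical toy data: 8 segregants, two genes tested against one variant.
genotypes = ["BY", "BY", "BY", "BY", "RM", "RM", "RM", "RM"]
gene_flat = [1.0, 1.2, 0.9, 1.1, 1.1, 0.9, 1.0, 1.2]     # no shift between strains
gene_shifted = [1.0, 1.2, 0.9, 1.1, 2.1, 1.9, 2.0, 2.2]  # higher in RM

t_flat = welch_t(*split_by_genotype(genotypes, gene_flat))
t_shifted = welch_t(*split_by_genotype(genotypes, gene_shifted))
print(f"t (no difference): {t_flat:.2f}, t (shifted mean): {t_shifted:.2f}")
```

The statistic for the shifted gene is far from zero while the other is not, which mirrors the two pictures described above. Turning the statistic into a p-value and comparing it to a multiple-testing threshold is a separate step not shown here.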
And so this is typically the kind of plot that you can make
when you do an eQTL analysis. On the x-axis here,
we've got the position of the marker or the genotype.
So again, that's where that SNP is positioned in the genome, and
then you also have the trait position.
So that's where the gene whose expression is being measured is located.
So basically you can imagine where's the gene that codes for
the mRNA that is being measured and where is the SNP that's being measured.
So then you just line up the chromosomes on each axis, and this circled component
right here, this diagonal line, represents what are typically called cis-eQTL.
So cis-eQTL are often defined as eQTL where the SNP position is
close to the position of the gene whose expression is measured.
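The "close to" rule can be written down as a simple position check. This is a sketch under stated assumptions: the 10,000 bp window, the function name `is_cis`, and the example pairs are all hypothetical, and what counts as "close" varies from study to study.

```python
def is_cis(snp_chrom, snp_pos, gene_chrom, gene_pos, window=10_000):
    """Return True if the SNP is on the gene's chromosome and within `window` bp.

    The default window is an arbitrary illustrative value, not a standard.
    """
    return snp_chrom == gene_chrom and abs(snp_pos - gene_pos) <= window


# Made-up significant pairs: (SNP id, chrom, pos, gene id, chrom, pos).
pairs = [
    ("marker_1", 1, 15_000, "gene_A", 1, 16_200),    # same chromosome, nearby
    ("marker_1", 1, 15_000, "gene_B", 3, 90_500),    # different chromosome
    ("marker_2", 2, 48_000, "gene_C", 2, 400_000),   # same chromosome, far away
]

labels = {}
for snp_id, s_chr, s_pos, gene_id, g_chr, g_pos in pairs:
    label = "cis" if is_cis(s_chr, s_pos, g_chr, g_pos) else "not cis"
    labels[(snp_id, gene_id)] = label
    print(f"{snp_id} -> {gene_id}: {label}")
```

Pairs that pass this check are the ones that fall on the diagonal when marker position is plotted against gene position.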