Enrichment analysis can be useful for summarizing statistically significant results for a number of different things, not just for gene sets. So here I'm going to talk about an example where we compare two particular sets of results and see if they're enriched for one another. As an example, we're going to be looking at SNPs, single nucleotide polymorphisms, that have been labeled in two different analyses. The first set of labels is eQTLs, expression quantitative trait loci. We'll talk about what those are later in the class, but for now just consider that one analysis has been done and it gives one set of labels. The second is the set of SNPs that have been implicated in genome-wide association studies (GWAS). So we want to know if the SNPs implicated in GWAS and the SNPs implicated as eQTLs are enriched with respect to one another.

So one thing that you could do is count the number of eQTLs among the SNPs in a particular set of GWAS hits. And so here that's that number, at a particular p-value cutoff of 1 × 10⁻⁴. Then you could take a randomly selected subset of SNPs of the same size that aren't GWAS hits and count the number of eQTLs in it. If you do that for many different random selections, you get this distribution here. And you can see that the random sets in general have fewer eQTLs than the observed set of GWAS hits. So again, you're using a permutation scheme to try to identify whether there's an enrichment of eQTLs among GWAS hits.

So another thing that you can do is make this two-by-two table. We count the number of SNPs that are eQTLs and are GWAS hits.
And then we can count the number of SNPs that aren't eQTLs but are GWAS hits. We can also look at the SNPs that are not GWAS hits and count the number that are eQTLs, and then the number that are neither GWAS hits nor eQTLs. And then we ask, are these two things independent of each other? In other words, is being an eQTL independent of being a GWAS hit? You can do this with Fisher's exact test or a chi-square test. You could also use the logistic regression technique that we learned previously in the class.

But what people typically do is, again, use permutations. Here they're not necessarily permuting the samples. They're permuting the set of SNPs that are labeled as GWAS hits, or they resample from the total possible set of SNPs, and then they compare the enrichment they observed to what they get from the resampled sets. When they do that, it's actually quite complicated, because they have to account for the fact that, for example, it's much more likely to get a GWAS hit if the SNP has a higher minor allele frequency, because you'll have more power to detect it. So they have to do this permutation or resampling within levels of minor allele frequency.

This is a common problem that you run into when you're doing this sort of enrichment along the genome. You find that genomic features are usually clustered together or they have common properties, and so you have to take those into account when you're doing this analysis. This is one example of a package that attempts to take into account some of that spatial structure that's involved in genomic enrichment. But there are also a number of other issues that you have to deal with, and so getting the null right is really hard in this case. Here you want to know: in the case where we expect to see no enrichment at all, what does that look like?
That's very hard to simulate from, given the spatial structure in the genome and given other things, like GC content or minor allele frequency, that might influence the way that the null looks. And again, it's easy to tell stories if you aren't careful. This also incurs a second multiple testing problem, because you might be testing multiple genomic features to see if they're enriched, and every time you do that you have a new test. So again, to be able to do this really well, you have to be very careful, in the very specific circumstance you're looking at, to account for all the potential variability due to genomic features, genomic structure, and so forth when doing this type of enrichment.
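
The two approaches described above, the two-by-two table test and resampling within levels of minor allele frequency, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the method of any particular package. The function names (`fisher_exact_greater`, `maf_matched_null`, `empirical_p`), the 0.05-wide MAF bins, and the toy SNP table are all assumptions made up for the example, and the sketch deliberately ignores spatial clustering, GC content, and the other complications mentioned above.

```python
import math
import random
from collections import defaultdict


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table
        [[a, b],    a = GWAS hit and eQTL,   b = GWAS hit, not eQTL
         [c, d]]    c = not GWAS hit, eQTL,  d = neither
    Returns P(count >= a) under independence (hypergeometric upper tail)."""
    n_gwas, n_eqtl, total = a + b, a + c, a + b + c + d
    upper = min(n_gwas, n_eqtl)
    tail = sum(
        math.comb(n_eqtl, k) * math.comb(total - n_eqtl, n_gwas - k)
        for k in range(a, upper + 1)
    )
    return tail / math.comb(total, n_gwas)


def maf_matched_null(snps, gwas_ids, n_draws=1000, bin_width=0.05, seed=0):
    """Resample non-GWAS SNPs within minor allele frequency (MAF) bins.

    snps: dict mapping SNP id -> (maf, is_eqtl).
    gwas_ids: set of SNP ids labeled as GWAS hits.
    Returns (observed eQTL count among GWAS hits, list of null counts).
    Assumes every MAF bin holds at least as many non-GWAS SNPs as GWAS hits;
    otherwise random.sample raises ValueError."""
    rng = random.Random(seed)
    pool = defaultdict(list)   # eQTL labels of non-GWAS SNPs, per MAF bin
    need = defaultdict(int)    # number of GWAS hits, per MAF bin
    for snp_id, (maf, is_eqtl) in snps.items():
        maf_bin = int(maf / bin_width)
        if snp_id in gwas_ids:
            need[maf_bin] += 1
        else:
            pool[maf_bin].append(is_eqtl)
    observed = sum(snps[s][1] for s in gwas_ids)
    null_counts = []
    for _ in range(n_draws):
        # Draw, in each bin, as many non-GWAS SNPs as there are GWAS hits.
        count = sum(sum(rng.sample(pool[b], k)) for b, k in need.items())
        null_counts.append(count)
    return observed, null_counts


def empirical_p(observed, null_counts):
    """Empirical p-value with the usual +1 correction."""
    hits = sum(c >= observed for c in null_counts)
    return (1 + hits) / (1 + len(null_counts))


if __name__ == "__main__":
    # Synthetic toy data, for illustration only (not real SNPs).
    rng = random.Random(42)
    snps, gwas_ids = {}, set()
    for i in range(2000):
        maf = rng.uniform(0.01, 0.5)
        is_gwas = rng.random() < 0.1 * maf / 0.5      # higher MAF, more hits
        is_eqtl = rng.random() < (0.3 if is_gwas else 0.1)
        snps[f"snp{i}"] = (maf, is_eqtl)
        if is_gwas:
            gwas_ids.add(f"snp{i}")

    a = sum(snps[s][1] for s in gwas_ids)
    b = len(gwas_ids) - a
    c = sum(e for s, (_, e) in snps.items() if s not in gwas_ids)
    d = len(snps) - len(gwas_ids) - c
    print("Fisher one-sided p:", fisher_exact_greater(a, b, c, d))

    obs, null = maf_matched_null(snps, gwas_ids)
    print("observed:", obs, "MAF-matched empirical p:", empirical_p(obs, null))
```

Matching on MAF bins is only one of the adjustments a careful analysis would need. The bin width here is an arbitrary choice, and a real null would likely also have to respect clustering of features along the genome.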