So far I've talked in pretty general terms about statistical modeling strategies, and now I want to talk about a few very specific pipelines. The first one I'm going to talk about is the set of steps in an RNA sequencing analysis. So recall that the central dogma of molecular biology says that information is passed from DNA to RNA to protein. And if we want to measure this RNA, we use a technology called RNA sequencing. Again, this is just review from the genomic technology class. Imagine we have a fragmented RNA molecule like this, and we want to capture that RNA molecule using its poly(A) tail. That's one of the ways you can capture, or extract, the mature RNA. Then you can reverse transcribe it into complementary DNA, and then you sequence that using a sequencing machine. Then you can get that sequence, and you can turn it back into the RNA sequence. And so you can identify the RNA transcripts by using the RNA sequencing reads.

Once you have these RNA sequencing reads, there are a number of different steps involved in analyzing an RNA sequencing data set. The first step is that you need to align those reads to the genome or transcriptome. This is assuming that a genome or transcriptome is known. For some organisms that's not necessarily true. Here, I'm going to assume that you do have such a genome. And there you can use different software: HISAT, Rail-RNA, STAR, and TopHat2 all align the reads to the genome, accounting for the potential splicing between exonic fragments from the genome.

Then you need to count the reads that correspond to a particular feature. You can either count them corresponding to a particular gene, or you can count them corresponding to a particular transcript. Kallisto is one exception among the software tools that do this basic counting.
In this case, Kallisto doesn't require alignment. It does a pseudo-alignment, so it does the counting for transcripts relatively quickly.

You can also assemble and quantify transcripts in these samples rather than just counting the genes. Here, the software is basically going to take the reads and estimate which transcripts actually exist in the sample, rather than just counting the annotated genes. Once it does that, it's going to try to estimate abundances for each of the different transcripts. StringTie and Cufflinks do this in the case where there's a known reference genome. If there isn't a known reference genome, you can use Trinity. And RSEM will take a transcriptome and then calculate the abundance for each transcript.

The next step is that you need to normalize the data. Just like with any kind of genomic data, you do the normalization and preprocessing. EDASeq and cqn are R packages, specifically Bioconductor packages, that you can use to normalize for GC content. DESeq2 and edgeR are actually differential expression packages that have some normalization built in. So do Ballgown and derfinder: Ballgown is a back end for the Cufflinks and RSEM pipelines, and derfinder does single-base resolution differential expression analysis. Both of these have their own built-in normalization. To remove batch effects in these RNA sequencing data, you can use the sva and svaseq functions, or RUVSeq.

Then you need to perform statistical tests and statistical modeling. You can do that with edgeR and DESeq2 in the case where you have count data. In the case where you have transcript quantification, you can use the Ballgown package as a back end to RSEM or as a back end to Cufflinks. And if you want to do single-base resolution analysis, you can use the derfinder package to perform statistical tests or statistical modeling of the RNA sequencing data.
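
To make the normalization step a bit more concrete, here's a minimal sketch in plain Python of one common idea, median-of-ratios size factors, which is the general approach behind the normalization built into DESeq-style methods. This is an illustration under assumptions, not the actual code of any of the packages named above: the function names `size_factors` and `normalize` and the toy count matrix are made up for this example, and rows are assumed to be genes with columns as samples.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample from a genes-by-samples count table.

    counts: list of rows (one per gene), each a list of per-sample read counts.
    """
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample are usable,
    # because we take logs below.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        # The median ratio is robust to a handful of strongly changed genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 0]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy data the second sample has exactly twice the counts of the first, so the second size factor comes out twice the first, and after dividing by the factors each gene has the same normalized value in both samples. Real packages layer a lot more on top of this, such as dispersion estimation and GC-content correction, so treat this only as a picture of the basic idea.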
Finally, you need to do some sort of gene set enrichment analysis to identify gene sets or categories that are enriched within the set of genes that are differentially expressed. You can do that with the goseq or the SeqGSEA packages in Bioconductor. So that's a quick rundown of the different steps in an RNA sequencing analysis and the software that can be used for each of them.
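
As a rough picture of what an enrichment test is asking, here's a minimal sketch of a plain hypergeometric over-representation test in Python. To be clear, this is a simpler stand-in and not what goseq or SeqGSEA do internally (goseq, for example, also adjusts for gene length bias). The function name `enrichment_p_value` and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_de, n_overlap):
    """Probability of seeing at least n_overlap gene-set members among the
    differentially expressed (DE) genes if DE genes were drawn at random.

    n_universe: total number of genes tested
    n_in_set:   number of those genes that belong to the gene set
    n_de:       number of genes called differentially expressed
    n_overlap:  number of DE genes that belong to the gene set
    """
    total = comb(n_universe, n_de)
    upper = min(n_in_set, n_de)
    # Sum the hypergeometric probabilities for overlaps of n_overlap or more.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes tested, 3 in the gene set,
# 3 called DE, and all 3 DE genes fall in the set.
p_all_three = enrichment_p_value(10, 3, 3, 3)
# With an overlap of zero the tail covers every outcome.
p_no_overlap = enrichment_p_value(10, 3, 3, 0)
```

A small p-value here says the gene set shows up among the differentially expressed genes more often than you'd expect by chance. In a real analysis you'd run this over many gene sets and correct for multiple testing.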