In this and the next series of videos, we're going to discuss how to annotate experimental data using Bioconductor. Annotation is the process of giving context to experimental data using external information. This can be done in many ways, but we usually think of it as linking our experimental data to various databases or repositories. This sounds easy, but in practice it's hard, tedious, and also ambiguous. There are many databases that host the same kind of information but differ in the small details. Even seemingly simple questions, such as what the genes in the human genome are and where they are located, are going to have different answers depending on what resource you use to answer them.

Here we're going to give a couple of examples of annotation. First we're going to look at annotating an Affymetrix microarray. On an Affymetrix microarray you measure DNA using probesets. Each probeset has an identifier, which is a code somewhat arbitrarily picked by Affymetrix, and it tells you which particular piece of DNA was measured using this probeset. When we annotate the probeset, we link it to an Entrez identifier using a database. Entrez is a metadatabase hosted by NCBI, which is kind of a union, or a superset, of a lot of different databases that are hosted by the federal government. Once we have the identifier, in this case this probeset measures gene number 25 in the database, we can look up gene number 25 and learn that the gene symbol is ABL1. And we can learn a lot about where in the genome the gene is and what kind of function it has.

Another type of annotation is annotating a genomic interval. Here we come with a genomic interval, in this case a 1 kb interval on the human genome, and we want to figure out: are there any genes nearby? Are there any regulatory elements?
Is the sequence in the interval conserved across different species? One way of answering these questions is to take the interval, go to UCSC, and look it up. Here we get a lot of different tracks, and I've selected a few of them that give us different types of information. Some of these tracks are really just other types of experimental data coming from other labs. In this case there's a ChIP-seq track from ENCODE that measures a specific type of histone modification. When you are annotating using other types of experimental data, you should always keep in mind that this experimental data, while hopefully well processed and well prepared, has, as always, certain biases and certain problems, so you should always be a little careful about interpreting it.

So the challenge in this series of videos is: how do we do this quickly and easily for hundreds or tens of thousands of items simultaneously? In order to do that, we need to do it in a programmatic way, and preferably we want this to be really easy, so that the rate-limiting factor becomes our imagination and not the fact that it is really quite tedious to do this for different databases.

There are two main approaches to annotating your experimental data in Bioconductor. One is using annotation packages. Annotation packages are R packages like any other R packages you're using, and they contain preprocessed and packaged information from various resources. This has been very popular for annotating microarrays, where we take information from the vendor, in this case Affymetrix, package it up, and serve it as a nice, easy-to-use R package. Another way of annotating data is using online resources. Online resources could be databases such as UCSC or Ensembl. The advantage of using online resources is that we get a lot of data, and we get it right now: we get the latest data.
The disadvantage is that you really have to keep track of when you queried the database, what version of the database you used, and what kind of information you got back.
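To make that bookkeeping concrete, here is a minimal sketch of annotating many identifiers at once while recording where the answers came from. It is written in Python rather than R and does not use any Bioconductor function; it only illustrates the idea. Everything in it is an assumption made for illustration: the names `annotate`, `Annotation`, and `Provenance`, the toy lookup tables, the probeset ID `probe_0001`, and the resource label `toy-table` with version `v0`. The only fact taken from the lecture is that Entrez gene number 25 has the symbol ABL1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Toy lookup tables standing in for a real annotation resource.
# Only the mapping 25 -> ABL1 comes from the lecture; the probeset ID is made up.
PROBESET_TO_ENTREZ = {"probe_0001": "25"}
ENTREZ_TO_SYMBOL = {"25": "ABL1"}


@dataclass(frozen=True)
class Provenance:
    """Which resource answered, which version, and when it was queried."""
    resource: str
    version: str
    retrieved_at: str  # ISO 8601 timestamp in UTC


@dataclass(frozen=True)
class Annotation:
    """One annotated probeset; unmapped fields stay None."""
    probeset_id: str
    entrez_id: Optional[str]
    symbol: Optional[str]
    provenance: Provenance


def annotate(probeset_ids, resource, version, now=None):
    """Annotate a batch of probeset IDs and attach one shared provenance record."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    provenance = Provenance(resource, version, stamp)
    results = []
    for probeset_id in probeset_ids:
        entrez_id = PROBESET_TO_ENTREZ.get(probeset_id)
        symbol = ENTREZ_TO_SYMBOL.get(entrez_id) if entrez_id is not None else None
        results.append(Annotation(probeset_id, entrez_id, symbol, provenance))
    return results


if __name__ == "__main__":
    for a in annotate(["probe_0001", "probe_9999"], resource="toy-table", version="v0"):
        print(a.probeset_id, a.entrez_id, a.symbol, a.provenance.version)
```

The point of the design is that every result carries the resource name, the version, and the query time, so an analysis can later be tied back to exactly what the database said at that moment. Identifiers that are not found come back with empty fields instead of being silently dropped, which makes it easier to see what kind of information you actually got back.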