Hi and welcome to the class Introduction to Bioconductor. My name is Kasper Daniel Hansen and I'm going to be your instructor. First, let's cover what Bioconductor is and why you should care. Bioconductor is an open source and open development software project for computational biology. Open source means that anybody can read and modify the underlying code. Open development means that anybody can contribute to and participate in the development of the code. Bioconductor is built on the R language, which is a widely used language for data science. The R language was chosen for a wide variety of reasons, including that it's a flexible language specifically designed for data analysis, it has high quality graphics, and it's easy to embed tools written in other languages. There are tons of great tools out there for computational biology, written to accomplish very specific tasks. Doing an analysis in computational biology almost always means you use many, many tools and put them together into a file, package or pipeline to try to answer your specific question. One of the ideas behind Bioconductor was the realization that what we really needed was a flexible data analysis platform where it's easy to put these pieces together. The mental image behind the project is that all these great tools out there in the world are the different musicians in the orchestra, and the project really works as the conductor, which makes them all play well together. From a more pragmatic point of view, Bioconductor is a collection of software packages written, or partly written, in the statistical language R. These packages are collected into a single archive, something called a software repository. Over the years we have seen incredible growth in the number of packages and contributions to Bioconductor. At the latest release, we had almost 1,000 different software packages. Some of these packages have been around for ten years or more, since the project started, and some of these packages are brand new. 
Some of the packages in Bioconductor have very few users, and other packages are considered the gold standard analysis tools for the specific domain they address. These packages are often used by thousands of researchers around the world. Since anybody can contribute packages to Bioconductor, as long as the software meets the minimum requirements, we have a very varied selection of tools available. Some of the packages are extremely high quality. Others are less polished. In general, all packages in Bioconductor have pretty good documentation, at least compared to other academic software, and some packages have outstanding documentation. However, one of the challenges in the project is sometimes figuring out where to find the documentation. Because there are so many packages, and because anybody can contribute, you often have a wide variety of packages to choose from when you want to analyze a specific type of data. There are probably hundreds of packages dealing with gene expression microarray data. Sometimes these different packages within Bioconductor are in direct competition for users. They are perhaps doing more or less the same thing, or using two different analysis approaches to try to answer the same questions. That sometimes makes it a little bit complicated to figure out which package to use and how to use it. But in the end, all of this competition is great for the end user, because it means we get a better product. In 2014, the journal Nature Genetics had an editorial on the sharing and use of software for genetics research. At the time the editorial was written, Bioconductor and CRAN, the Comprehensive R Archive Network, which is another software repository of R packages, were specifically highlighted in the editorial as high quality software repositories. 
In fact, these two repositories were the only software repositories across all languages and platforms that were endorsed by this journal. That tells you something about the impact and mindshare that Bioconductor has in the genetics research community. Since its beginning, Bioconductor has emphasized reproducible research: the idea that computational research ought to be reproducible so that researchers can understand and build upon the efforts of others. Bioconductor has been an early adopter and a driver of tools to do this. We've been doing reproducible research for more than ten years, long before it got popular. But why? Why should you care about Bioconductor? In the end, I'm using Bioconductor every day in my research for two main reasons. The first one is productivity. Once you have a certain familiarity with Bioconductor and R, you get incredibly productive. I get much more done per hour on this platform than I would using almost any other platform. The other major factor driving adoption of Bioconductor is flexibility. No two computational biology analyses are the same. There are always small tweaks you have to do, small modifications having to do with your specific question and your specific data. For that, it becomes incredibly important to be able to flexibly modify the tools you're using, and Bioconductor allows you to do this. Now, for the academically inclined, Bioconductor is described in two main academic papers. The first of the two papers mentioned here was the original manifesto, written back when the project was initiated a little more than ten years ago. The second paper is a recent update describing where the project is now, what we can do, and what will go on in the future. Both articles should be accessible to a wide audience. I would encourage people to take the time to read them if they are so inclined. So welcome to the class. 
I hope you're going to have fun in these next mini lectures. But more importantly, I hope that this class will give you the foundation to get more productive and effective in doing your research in computational biology.