So these are exciting times in the field of biology, because we are moving from small-scale biology to big data biology. In this lecture, I'm going to describe some of the trends behind this shift. I'll start with a business leadership book about big data. It has nothing to do with cell biology or biomedicine, but it defines the big data revolution very well. Here are some of its main ideas. The first is that you let the data speak: you enable discovery by identifying patterns without a preconceived notion of what you are looking for. Second, biologists are obsessed with causality, and the book recommends letting go of some of this obsession, because worrying about causality can sometimes lead us down the wrong path. Third, the book suggests feeling comfortable with missing data, which biologists are already used to, since we can't measure everything precisely all the way down to the atomic level. We should be comfortable in that area of uncertainty and not be so concerned with exactitude. Finally, even though we give up on looking at specific one-to-one functional interactions, we gain a global perspective on the biology of the cell. In the next few lectures, Andrew, a postdoctoral fellow in our lab, will describe many of the emerging resources that can be integrated into such a global framework. We processed many of those resources for the Enrichr tool, and they can also be integrated into networks. Luckily, many of them come as well-structured datasets whose IDs can be matched and converted to standard gene IDs, drug IDs, and phenotype IDs. This way, you can start bringing those datasets together to perform various types of analyses. Let's go over the seven categories. The first category is drug and gene perturbations, followed by genome-wide gene expression analysis.
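To make the idea of matching identifiers concrete, here is a minimal sketch in Python. It assumes a small hypothetical alias table (`ALIAS_TO_SYMBOL`); a real pipeline would build that table from an authoritative nomenclature source such as HGNC. The function names are illustrative and not part of any specific tool.

```python
# Hypothetical alias table (illustrative only). A real mapping would be built
# from an authoritative source such as the HGNC gene nomenclature.
ALIAS_TO_SYMBOL = {
    "p53": "TP53",
    "tp53": "TP53",
    "erbb1": "EGFR",
    "her1": "EGFR",
    "egfr": "EGFR",
}


def normalize(gene_id):
    """Map a raw gene identifier to a standard symbol, or None if unknown."""
    return ALIAS_TO_SYMBOL.get(gene_id.strip().lower())


def harmonize(records):
    """Convert (entity, raw_gene) pairs to (entity, standard_gene) pairs.

    Identifiers that cannot be mapped are returned separately
    instead of being silently dropped.
    """
    mapped, unmapped = [], []
    for entity, raw in records:
        symbol = normalize(raw)
        if symbol is None:
            unmapped.append(raw)
        else:
            mapped.append((entity, symbol))
    return mapped, unmapped


def shared_genes(records_a, records_b):
    """Standard gene symbols that appear in both harmonized datasets."""
    genes_a = {gene for _, gene in records_a}
    genes_b = {gene for _, gene in records_b}
    return genes_a & genes_b


# Two toy datasets that name the same genes differently (hypothetical data).
chip_targets = [("TF_A", "p53"), ("TF_A", "HER1")]
knockdown_hits = [("cell_line_X", "TP53"), ("cell_line_X", "Gene42")]

chip_mapped, _ = harmonize(chip_targets)
knockdown_mapped, knockdown_unmapped = harmonize(knockdown_hits)
print(shared_genes(chip_mapped, knockdown_mapped))
print(knockdown_unmapped)
```

Without normalization, "p53" and "TP53" would never match. Once both datasets use the same symbols, they can be joined directly. Keeping the unmapped IDs visible makes it easier to see how much of a resource is lost in conversion.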
In many parts of the course, we discuss this type of data, and it is also a central piece of the crowdsourcing challenges. We also talked about ChIP-seq data. ChIP-seq is a method to profile the binding locations of proteins on the genome. There is also data about cell viability after knocking down single genes or treating cells with different drugs, and those studies apply these methods to many different cell types. So that data is a large matrix that has drug or gene perturbations on one side and different cell types on the other. We are also accumulating knowledge about the effect of single mutations, or the deletion of a single gene, on either human or mouse phenotypes, and those phenotypes are relatively well organized in ontologies. A single-gene mutation or a knockout mouse can be mapped to the resulting phenotype, and that's an important link between a gene and the phenotype of the whole organism. There is also accumulating knowledge about gene expression from patients: specifically, genome-wide profiling of tumors from cancer patients, as well as gene expression profiling of post-mortem human tissues. There are many databases that collect protein-protein interactions from the literature, as well as curated cell signaling pathways. There is also a growing body of work that profiles protein-protein interactions using high-throughput methods, such as immunoprecipitation followed by mass spectrometry, as well as yeast two-hybrid screens. And finally, there are resources that describe the effects of drugs and other toxins on the human phenotype, and those human phenotypes are typically described as adverse events. All those resources can easily be converted to gene set libraries, which we talked a lot about in this course, as well as to networks that connect genes or other entities like drugs, cells, or phenotypes.
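Converting these resources into gene set libraries and networks can be sketched in a few lines of Python. This minimal example assumes that a resource has already been reduced to hypothetical (term, gene) association pairs, for example a phenotype and a gene whose knockout produces it. It groups the genes into one set per term and writes them as tab-separated GMT-style lines (term, description, genes), the format commonly used for gene set libraries. It then projects the bipartite term-gene network onto a gene-gene network, where each edge weight counts the terms two genes share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (term, gene) associations; a term could be a phenotype,
# a ChIP-seq factor, a drug, or a cell line, depending on the resource.
associations = [
    ("phenotype_A", "GENE1"),
    ("phenotype_A", "GENE2"),
    ("phenotype_A", "GENE3"),
    ("phenotype_B", "GENE2"),
    ("phenotype_B", "GENE3"),
    ("phenotype_C", "GENE4"),
]


def to_gene_set_library(pairs):
    """Group genes by term: one gene set per term."""
    library = defaultdict(set)
    for term, gene in pairs:
        library[term].add(gene)
    return dict(library)


def to_gmt(library):
    """Serialize as GMT-style lines: term, empty description, then sorted genes."""
    return [
        "\t".join([term, ""] + sorted(genes))
        for term, genes in sorted(library.items())
    ]


def to_gene_network(library):
    """Project the term-gene bipartite network onto genes.

    Each pair of genes that co-occurs in a gene set gets an edge whose
    weight is the number of sets (terms) the two genes share.
    """
    weights = defaultdict(int)
    for genes in library.values():
        for a, b in combinations(sorted(genes), 2):
            weights[(a, b)] += 1
    return dict(weights)


library = to_gene_set_library(associations)
for line in to_gmt(library):
    print(line)
print(to_gene_network(library))
```

The same pairs can be read in either direction. Grouping by gene instead of by term would give, for each gene, the set of phenotypes or drugs associated with it. Projecting onto terms instead of genes would connect phenotypes that share genes.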
They can also be converted to bipartite networks that connect, for example, cells and drugs, genes and cells, genes and phenotypes, and so on. Finally, all of those resources can be connected through these common identifiers. We can then form tripartite and multipartite relationships and build classifiers that fill in the gaps, for example, by identifying side effects for drugs that haven't been marketed yet. This is one of the crowdsourcing projects that this course offers.
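The idea of filling in gaps can be illustrated with a deliberately simple guilt-by-association scheme in place of a trained classifier. This Python sketch assumes hypothetical drug-target and drug-side-effect tables, forming a small drug-gene-side effect tripartite structure. It scores each candidate side effect for an unmarketed drug by how similar the drug's targets are to those of known drugs with that side effect, using the Jaccard index. It is an illustration only, not the method used in the course's crowdsourcing project.

```python
# Hypothetical tripartite data: drug -> target genes, drug -> known side effects.
drug_targets = {
    "drug_A": {"GENE1", "GENE2"},
    "drug_B": {"GENE1", "GENE2", "GENE3"},
    "drug_C": {"GENE4"},
    "new_drug": {"GENE1", "GENE2"},  # not yet marketed: no side-effect data
}
drug_side_effects = {
    "drug_A": {"nausea"},
    "drug_B": {"nausea", "rash"},
    "drug_C": {"headache"},
}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_side_effects(query, targets, side_effects):
    """Score side effects for `query` by similarity-weighted voting.

    Each known drug votes for its side effects with a weight equal to its
    target-set similarity to the query; scores are divided by the total
    similarity so they fall between 0 and 1. Results are sorted by score.
    """
    scores = {}
    total = 0.0
    for drug, effects in side_effects.items():
        if drug == query:
            continue
        sim = jaccard(targets[query], targets[drug])
        total += sim
        for effect in effects:
            scores[effect] = scores.get(effect, 0.0) + sim
    if total == 0.0:
        return {}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {effect: score / total for effect, score in ranked}


print(predict_side_effects("new_drug", drug_targets, drug_side_effects))
```

Here `new_drug` shares both of its targets with `drug_A` and two of three with `drug_B`, so nausea ranks first, rash second, and headache scores zero. A realistic model would likely combine many more data layers, such as gene expression signatures, and would be evaluated on held-out drug-side effect pairs.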