Hello, and welcome to the session about "Considerations and controls for metagenomics projects". My name is Sünje Johanna Pamp, and I'm a microbiologist and Associate Professor at the Technical University of Denmark. A surveillance study, like many other microbiome studies, consists of a number of different steps. In this session we will go over each of these steps. A project, of course, often starts with a particular goal, hypothesis, or question in mind. That is why it is very helpful to think through the entire process before starting the project, because otherwise some important aspects might be missed.
In the beginning, for some projects you might have to contact authorities and get permissions from veterinarians, medical doctors, the IRB (Institutional Review Board), or the data protection agency, and you might also have to consider the Nagoya Protocol. Depending on the question, if you are interested in diseased individuals, you might also think about including healthy controls. If you are interested in specific sites, you might want to choose some sites for investigation that could serve as controls.
When collecting the samples, you would want to make sure to collect them under aseptic conditions to prevent contaminants from entering the sample. It might also be helpful to include replicates, as we usually do in experimental research. You can also include some swabs, unused tubes, or other supplies that you use for sample collection as controls, because they might contain some background DNA.
In terms of sample handling, we sometimes end up in a situation where we are far away from the laboratory. In that case it is advised to store the sample on ice immediately and, if possible, to freeze it right away so that the microbial composition of the sample does not change. We also recommend avoiding freeze/thaw cycles.
If you would like to use the sample for different examinations, consider making aliquots before freezing it down.
In terms of the laboratory steps, it is suggested to conduct these in different areas of your lab, ideally in different rooms, to avoid cross-contamination. The DNA or RNA extraction should be carried out in a very clean environment to avoid contamination of the sample. To measure the background level, we include blank control tubes. These are empty tubes without a sample that are processed in exactly the same way as all the other samples. You might also include some samples that contain spiked-in organisms in a known amount and ideally with a known genome sequence. You could also include some tubes that contain a mock community with five, ten, or even more different bacteria, or microorganisms in general, representing different phyla.
In terms of the library preparation, we prefer PCR-free approaches, and that of course requires a certain amount of DNA. You might also consider splitting one sample and sequencing it using two different barcodes. In that case, you should ideally see the same microbial community composition for both aliquots.
In terms of sequencing, which technology you choose might depend on your question, because some technologies sequence very short fragments whereas others provide you with relatively long or very long DNA sequences. Depending on your number of samples and the sequencing depth you would like to achieve, you may have to conduct a different number of sequencing runs. These are considerations that have to be made beforehand, and it can be useful to run a pilot study in which a few samples are sequenced fairly deeply to get an idea of the complexity of the sample.
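As a small illustration of the two-barcode check mentioned above, the following is a minimal sketch of how one might compare the community composition of two aliquots of the same sample. It assumes you already have read counts per taxon for each aliquot. The taxon names and counts are invented for illustration, and Bray-Curtis dissimilarity is just one common choice of measure, not one prescribed in this session.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity on relative abundances.

    0.0 means identical composition, 1.0 means no shared taxa.
    Because both profiles sum to 1, the usual formula reduces to
    1 - sum(min(a_i, b_i)).
    """
    a = relative_abundance(counts_a)
    b = relative_abundance(counts_b)
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    return 1.0 - shared


# Hypothetical counts for one sample sequenced with two different barcodes.
aliquot_barcode_1 = {"Bacteroides": 500, "Escherichia": 300, "Lactobacillus": 200}
aliquot_barcode_2 = {"Bacteroides": 520, "Escherichia": 280, "Lactobacillus": 200}

dissimilarity = bray_curtis(aliquot_barcode_1, aliquot_barcode_2)
print(f"Bray-Curtis dissimilarity between aliquots: {dissimilarity:.3f}")
```

A value close to zero is what you would hope to see for two aliquots of the same sample. What counts as "close enough" depends on sequencing depth and sample complexity, so it is best judged against the other replicates in your own study.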
When you reach the step of DNA sequence analysis, there are different approaches to analyzing the DNA sequence reads. This could be just a classification, for example, by mapping the reads against a number of reference databases containing whole genome sequences, specific virulence factors, antimicrobial resistance genes, and so on. But you could also attempt an assembly approach.
The statistical analysis really depends on your question. You might be interested in diversity, or you might want to compare all the samples that you have analyzed using multivariate analysis, or use regression analysis if you want to set your samples into context with other types of metadata that you have about the sample or the environment where the sample was obtained. You will learn more about these approaches in detail in other modules.
In general, you would want to process all samples in the same way, because the processing can introduce changes that make it difficult to compare the samples. We could see that, for example, when we conducted a ring trial with 11 research groups. They were all provided with the same sample and the same protocol, and they conducted the DNA extraction procedure as well as they could with the tools they had available in the lab. When we sequenced all these DNA extracts, we found that there were some differences: even though the same groups of bacteria were found in the samples, they were present in different amounts. We saw the same for antimicrobial resistance genes. We found the same antimicrobial resistance genes, but in different abundances. This can of course be a challenge if you want to compare your samples with samples from other labs. It is therefore really important to note down the differences in the DNA extraction or sample handling used in each study.
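To make the mention of diversity a little more concrete, here is a minimal sketch of one widely used measure, the Shannon diversity index, computed from read counts per taxon. The session does not name a specific index, so this choice is an assumption for illustration, and the counts below are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical samples: one evenly distributed, one dominated by a single taxon.
even_sample = {"taxon_A": 50, "taxon_B": 50}
skewed_sample = {"taxon_A": 99, "taxon_B": 1}

print(f"even:   {shannon_diversity(even_sample):.3f}")
print(f"skewed: {shannon_diversity(skewed_sample):.3f}")
```

The evenly distributed sample gets the higher value. Because the index depends on how many reads were classified, samples are usually only compared after they have been processed and sequenced in the same way, which is the point made above.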
In the sequencing step, it is also important to randomize the samples across the different sequencing runs, so that you do not end up with, for example, all the samples from patients with a certain disease on one sequencing run and all the healthy control samples on another.
We also have to consider contamination at basically every level, because it could be introduced already when the sample is acquired, and it could be introduced during DNA extraction. We have to look, for example, for potential host DNA if we analyze samples from humans or animals. In some samples host DNA can represent most of the DNA sequences, over 90 percent, in particular if we look at body fluids from humans and animals. When we compare our DNA sequences to reference genomes, there could be contamination in these reference genomes, which we have to consider and test for. Then, as already indicated, we can introduce contamination during sample handling and DNA extraction, which we refer to as environmental background DNA. But we also have to keep in mind that this DNA can come not only from the outer environment; it can also be introduced by the people handling the samples, so potentially skin bacteria could enter the sample.
Whenever you have sequencing data, we really recommend depositing them in public repositories, for example at EBI or NCBI, because then we can compare all these samples together, which is really important in surveillance projects. We also suggest that you make your study available through bioRxiv and an open-access journal, because information really means everything if you want to fight infectious diseases and antimicrobial resistance. It is even better if, together with your study, you can include the code and the analysis scripts that you used, to facilitate reproducibility.
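The randomization of samples across sequencing runs described above can be sketched in a few lines. This is only an illustration under simple assumptions: each sample has one group label (for example "patient" or "healthy"), and the goal is to spread each group as evenly as possible over the runs. The function name `assign_runs` and the sample identifiers are hypothetical.

```python
import random
from collections import Counter, defaultdict


def assign_runs(samples, n_runs, seed=0):
    """Randomly assign samples to sequencing runs, balanced within each group.

    samples: list of (sample_id, group) tuples.
    Returns a dict mapping sample_id to a run index (0 .. n_runs - 1).
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0  # carried over between groups so total run sizes stay balanced
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (offset + i) % n_runs
        offset += len(ids)
    return assignment


# Hypothetical study: 6 patient samples and 6 healthy controls on 2 runs.
samples = [(f"P{i}", "patient") for i in range(6)] + [
    (f"H{i}", "healthy") for i in range(6)
]
runs = assign_runs(samples, n_runs=2, seed=42)
per_run = Counter((runs[sample_id], group) for sample_id, group in samples)
for (run, group), n in sorted(per_run.items()):
    print(f"run {run}: {n} {group} samples")
```

Keeping the seed and the resulting assignment table together with the rest of the analysis scripts makes it possible to document later which sample went on which run.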
You can download a document with guidelines for metagenomic projects from figshare. It is affiliated with this publication, which is available through bioRxiv. Thank you for your attention.