In this capstone project course, we'll compare genome sequences of COVID-19 mutations to identify potential areas a drug therapy can look to target. The first step in drug discovery involves identifying subsequences of the virus's genome to target. We'll start by comparing the genomes of virus mutations to look for similarities. Then, we'll perform PCA to cut down our number of dimensions and identify the most common features. Next, we'll use K-means clustering in Python to find the optimal number of groups and trace the lineage of the virus. Finally, we'll predict similarity between the sequences and use this to pick a target subsequence. Throughout the course, each section will consist of a programming assignment coupled with a guide video and helpful hints. By the end, you'll be well on your way to discovering ways to combat disease with genome sequencing.
About this Course
We recommend that you take the other two courses in the specialization (or are familiar with the content) before attempting this capstone project.
What you will learn
Analyzing genome sequences to find similarities and identify target subsequences using predictive models.
Skills you will gain
- Whole Genome Sequencing
- Machine Learning
- Drug Discovery
- Dimensionality Reduction
- K-Means Clustering
Offered by

LearnQuest
LearnQuest is the preferred training partner to the world’s leading companies, organizations, and government agencies. Our team boasts 20+ years of experience designing, developing and delivering a full suite of industry-leading technology education classes and training solutions across the globe. Our trainers, equipped with expert industry experience and an unparalleled commitment to quality, facilitate classes that are offered in various delivery formats so our clients can obtain the training they need when and where they need it.
Syllabus - What you will learn from this course
Comparing Genome Sequences
In this module, we'll start to get familiar with our dataset by performing some basic EDA and comparing genome sequences. By analyzing the mutations of the COVID-19 virus, we'll be able to identify some common properties of the genome that our drug should look to target.
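As a rough illustration of what comparing genome sequences can look like, the sketch below counts k-mers, computes pairwise identity, and lists positions shared by every sequence. The sequences are made-up toy strings that are assumed to be already aligned, and the helper names `kmer_counts` and `percent_identity` are hypothetical, not taken from the course materials.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy, made-up sequences standing in for aligned variant genomes (assumption).
variants = {
    "variant_a": "ATGGCGTACGTTAGC",
    "variant_b": "ATGGCGTACGTAAGC",
    "variant_c": "ATGACGTTCGTTAGC",
}

# Pairwise similarity between every pair of variants.
for (name_1, seq_1), (name_2, seq_2) in combinations(variants.items(), 2):
    print(name_1, name_2, round(percent_identity(seq_1, seq_2), 3))

# Positions where every variant carries the same base (candidate conserved sites).
conserved = [
    i for i, bases in enumerate(zip(*variants.values())) if len(set(bases)) == 1
]
print("conserved positions:", conserved)

# The most frequent 3-mers in one variant, as a simple EDA summary.
print(kmer_counts(variants["variant_a"]).most_common(3))
```

Real genomes differ in length and need an alignment step first, which this sketch skips.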
Principal Component Analysis on Genome Sequences
In this module, we'll continue to work with our genome sequence data, using PCA to identify groups and delineate the most important features. After reducing the number of dimensions in the dataset, we'll be able to use K-means to form clusters and visualize the different areas in 2-D space.
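A minimal sketch of the dimensionality reduction step, assuming each sequence has already been turned into a row of numeric features. It implements PCA directly with NumPy's SVD; the feature matrix is random placeholder data and the function name `pca` is hypothetical, so the course notebooks may well use a library implementation instead.

```python
import numpy as np


def pca(features: np.ndarray, n_components: int = 2):
    """Project the rows of `features` onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    projected = centered @ vt[:n_components].T
    return projected, explained[:n_components], vt[:n_components]


# Placeholder feature matrix (assumption): 6 sequences x 4 numeric features.
rng = np.random.default_rng(0)
features = rng.random((6, 4))
# Scale one feature up so that it contributes far more variance than the others.
features[:, 0] *= 10

projected, explained_ratio, components = pca(features, n_components=2)
print(projected.shape)   # one 2-D point per sequence
print(explained_ratio)   # share of variance captured by each kept component
# The largest absolute loading indicates which original feature drives component 1.
print(int(np.argmax(np.abs(components[0]))))
```

The 2-D `projected` array is what would be plotted or passed on to a clustering step.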
Feature Analysis using K-Means Clustering
In this module, we'll cluster the genome sequences using the K-means algorithm. We'll optimize the number of clusters by comparing silhouette scores across a wide variety of inputs to identify the greatest drop-off. Finally, we'll set ourselves up to use prediction pipelines to predict bit scores and drug therapies in the last module.
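The silhouette comparison can be sketched with scikit-learn's `KMeans` and `silhouette_score`. The data here is synthetic (three well-separated blobs standing in for PCA-reduced sequences), and the sketch simply picks the cluster count with the highest score, which is one common criterion and may differ from the drop-off rule used in the course.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for PCA-reduced genome features (assumption): three blobs.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit K-means for a range of cluster counts and record each silhouette score.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(points)
    scores[k] = silhouette_score(points, labels)

for k, score in scores.items():
    print(k, round(score, 3))

# Pick the cluster count with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

Printing the full table of scores also makes it possible to look for the point where the score falls off most sharply.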
Predicting Bit Score to Find Sequence Matches
In this module, we'll test a variety of regressors to see which one performs best in predicting bit scores for each genome sequence. Then, we'll use our chosen model to find the genome sequences that are most closely related and trace out a possible subsequence to target with a combative drug.
About the AI for Scientific Research Specialization
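One way to compare regressors is cross-validated R² with scikit-learn, sketched below. The features, the synthetic "bit score" target, and the three candidate models are all assumptions chosen for illustration; the course's actual dataset and model list are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (assumption): 3 numeric features per sequence and a noisy,
# roughly linear "bit score" target.
rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = 50 + 200 * X[:, 0] - 30 * X[:, 1] + rng.normal(scale=5, size=120)

# Candidate regressors to compare (an illustrative selection).
candidates = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, r2 in results.items():
    print(name, round(r2, 3))

# Refit the best-scoring model on all data and rank sequences by predicted score.
best_name = max(results, key=results.get)
best_model = candidates[best_name].fit(X, y)
predicted = best_model.predict(X)
top_matches = np.argsort(predicted)[::-1][:5]
print("best model:", best_name)
print("indices of the 5 highest predicted bit scores:", top_matches)
```

Sequences with the highest predicted scores would be the natural starting point when looking for closely related genomes.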
In the AI for Scientific Research specialization, we'll learn how to use AI in scientific situations to discover trends and patterns within datasets. Course 1 teaches a little bit about the Python language as it relates to data science. We'll share some existing libraries to help analyze your datasets. By the end of the course, you'll apply a classification model to predict the presence or absence of heart disease from a patient's health data. Course 2 covers the complete machine learning pipeline, from reading in, cleaning, and transforming data to running basic and advanced machine learning algorithms. In the final project, we'll apply our skills to compare different machine learning models in Python. In Course 3, we will build on our knowledge of basic models and explore more advanced AI techniques. We’ll describe the differences between the two techniques and explore how they differ. Then, we’ll complete a project predicting similarity between patients using random forests. In Course 4, a capstone project course, we'll compare genome sequences of COVID-19 mutations to identify potential areas a drug therapy can look to target. By the end, you'll be well on your way to discovering ways to combat disease with genome sequencing.

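Frequently Asked Questions
When will I have access to the lectures and assignments?
What will I get if I subscribe to this Specialization?
Is financial aid available?
More questions? Visit the Learner Help Center.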