This course is part of the Data Analysis and Interpretation Specialization

Offered by

Data Analysis and Interpretation Specialization

Wesleyan University

About this Course

4.2

202 ratings

45 reviews

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with Course 3 of this Specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course provides an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions...

Start right away and learn on your own schedule.

Reset deadlines according to your schedule.

Approx. 12 hours to complete

Subtitles: English

Data Analysis, Python Programming, Machine Learning, Exploratory Data Analysis

Section

In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target (response) variable. Decision trees create segmentations, or subgroups, in the data by repeatedly applying a series of simple rules or criteria that choose the variable constellations that best predict the target variable...
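As a rough illustration of how a tree picks out important variables and their interactions, here is a minimal sketch using scikit-learn's `DecisionTreeClassifier`. The data are synthetic and the predictor names `x0`, `x1`, `x2` are hypothetical; this is not the course's own code or its tree_addhealth.csv data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 600

# Hypothetical predictors: x0 and x1 interact, x2 is pure noise.
X = rng.uniform(0, 1, size=(n, 3))
# The target is 1 only when BOTH x0 and x1 are high (an interaction).
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

# Hold out 40% of the observations to check how the tree does on new data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# A shallow tree keeps the set of rules small and readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

test_accuracy = tree.score(X_test, y_test)
importances = tree.feature_importances_

# Print the learned rules: a sequence of simple threshold splits.
print(export_text(tree, feature_names=["x0", "x1", "x2"]))
print("test accuracy:", round(test_accuracy, 3))
print("feature importances:", importances.round(3))
```

The printed rules show the subgroups the tree creates, and the importances indicate which variables drove the splits. On real data the rules are rarely this clean.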

7 videos (40 min total), 15 readings, 1 quiz

Machine Learning and the Bias Variance Trade-Off (6 min)

What Is a Decision Tree? (5 min)

What is the Process of Growing a Decision Tree? (4 min)

Building a Decision Tree with SAS (9 min)

Strengths and Weaknesses of Decision Trees in SAS (4 min)

Building a Decision Tree with Python (9 min)

Some Guidance for Learners New to the Specialization (10 min)

SAS or Python - Which to Choose? (10 min)

Getting Started with SAS (10 min)

Getting Started with Python (10 min)

Course Codebooks (10 min)

Course Data Sets (10 min)

Uploading Your Own Data to SAS (10 min)

Data Set for Decision Tree Videos (tree_addhealth.csv) (10 min)

SAS Code: Decision Trees (10 min)

CART Paper - Prevention Science (10 min)

Python Code: Decision Trees (10 min)

Installing Graphviz and pydotplus (10 min)

Getting Set up for Assignments (10 min)

Tumblr Instructions (10 min)

Assignment Example (10 min)

Section

In this session, you will learn about random forests, a type of data mining algorithm that can select, from among a large number of variables, those that are most important in determining the target (response) variable. Unlike a single decision tree, which can overfit the data it was grown on, a random forest generally produces results that generalize well to new data...
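The idea of ranking variables by importance and checking generalization on held-out data can be sketched with scikit-learn's `RandomForestClassifier`. The data below are synthetic, and the feature layout (two informative columns, four noise columns) is an assumption made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical data: columns 0 and 1 determine the target, columns 2-5 are noise.
X = rng.uniform(0, 1, size=(800, 6))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# A forest averages many trees, each grown on a bootstrap sample
# with a random subset of features considered at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# A single fully grown tree, for comparison on the same held-out data.
single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)
tree_accuracy = single_tree.score(X_test, y_test)

# Rank the variables from most to least important.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1].tolist()

print("forest test accuracy:", round(forest_accuracy, 3))
print("single tree test accuracy:", round(tree_accuracy, 3))
print("variables ranked by importance:", ranking)
```

The importance scores play the variable selection role described above. How much the forest outperforms a single tree depends on the data set, so the comparison here is only indicative.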

4 videos (25 min total), 4 readings, 1 quiz

Building a Random Forest with SAS (7 min)

Building a Random Forest with Python (6 min)

Validation and Cross-Validation (7 min)

SAS Code: Random Forests (10 min)

The HPForest Procedure in SAS (10 min)

Python Code: Random Forests (10 min)

Assignment Example (10 min)

Section

Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes the regression coefficients for some variables to shrink toward zero. Variables whose regression coefficient equals zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are those most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both. In this session, you will apply and interpret a lasso regression analysis. You will also gain experience using k-fold cross-validation to select the best-fitting model and to obtain a more accurate estimate of your model’s test error rate.
To test a lasso regression model, you will need to identify a quantitative response variable from your data set, if you haven’t already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important. Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets. The cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations. ...
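The shrinkage and k-fold cross-validation steps described above can be sketched as follows. This example assumes scikit-learn's `LassoCV` (the course's own Python code may use a different estimator), and the data and predictor names `x0` to `x7` are synthetic and hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical pool of 8 quantitative predictors; only x0 and x1 matter.
n, p = 300, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = [f"x{i}" for i in range(p)]

# Standardize predictors so the penalty treats them on a common scale.
X_std = StandardScaler().fit_transform(X)

# 10-fold cross-validation chooses the penalty strength (alpha)
# that minimizes the average held-out mean squared error.
model = LassoCV(cv=10).fit(X_std, y)

# Predictors whose coefficients were shrunk exactly to zero are dropped.
selected = [name for name, c in zip(names, model.coef_) if c != 0]

# Cross-validated MSE at the chosen alpha (mean over the 10 folds).
best_index = int(np.argmin(model.mse_path_.mean(axis=1)))
cv_mse = float(model.mse_path_.mean(axis=1)[best_index])

print("chosen alpha:", round(float(model.alpha_), 4))
print("coefficients:", dict(zip(names, model.coef_.round(3))))
print("selected predictors:", selected)
print("cross-validated MSE:", round(cv_mse, 3))
```

Because cross-validation reuses every observation for both fitting and evaluation, no separate test split is made here, which matches the small-data advice above. The penalty chosen by cross-validation may still leave a few very small non-zero coefficients on noise predictors.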

5 videos (32 min total), 3 readings, 1 quiz

Testing a Lasso Regression with SAS (10 min)

Data Management for Lasso Regression in Python (3 min)

Testing a Lasso Regression Model in Python (10 min)

Lasso Regression Limitations (2 min)

SAS Code: Lasso Regression (10 min)

Python Code: Lasso Regression (10 min)

Assignment Example (10 min)

Section

Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included. In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience in interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.
You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether your clusters differ significantly on this response variable using statistical methods, such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools). Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
...
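The workflow described above (choose the number of clusters from a plot-style criterion, inspect cluster means, then validate on a variable left out of the clustering) can be sketched like this. The example assumes scikit-learn's `KMeans` and SciPy's `f_oneway` for the analysis of variance. The data, the three underlying groups, and the `response` variable are all synthetic and hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 3 underlying groups of 100 observations, 2 clustering variables.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
group = np.repeat([0, 1, 2], 100)
X = centers[group] + rng.normal(scale=0.5, size=(300, 2))

# A quantitative response that is NOT used for clustering, kept for validation.
response = 2.0 * group + rng.normal(scale=1.0, size=300)

# Standardize so each clustering variable contributes equally to distances.
X_std = StandardScaler().fit_transform(X)

# Elbow-style check: within-cluster sum of squares for k = 1..6.
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std)
    inertias.append(km.inertia_)

# Fit the chosen solution and profile the clusters by their variable means.
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)
labels = final.labels_
cluster_sizes = np.bincount(labels).tolist()
cluster_means = [X[labels == c].mean(axis=0) for c in range(3)]

# Validate: do the clusters differ on the held-out response? (one-way ANOVA)
f_stat, p_value = f_oneway(*(response[labels == c] for c in range(3)))

print("inertia by k:", [round(v, 1) for v in inertias])
print("cluster sizes:", cluster_sizes)
print("cluster means:", [m.round(2) for m in cluster_means])
print("ANOVA F:", round(float(f_stat), 1), "p-value:", float(p_value))
```

In practice the inertia values would be plotted against k to look for a bend. For a binary validation variable, a chi-square test would take the place of the ANOVA.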

6 videos (42 min total), 3 readings, 1 quiz

Running a k-Means Cluster Analysis in SAS, pt. 1 (8 min)

Running a k-Means Cluster Analysis in SAS, pt. 2 (6 min)

Running a k-Means Cluster Analysis in Python, pt. 1 (8 min)

Running a k-Means Cluster Analysis in Python, pt. 2 (10 min)

k-Means Cluster Analysis Limitations (2 min)

SAS Code: k-Means Cluster Analysis (10 min)

Python Code: k-Means Cluster Analysis (10 min)

Assignment Example (10 min)

By BC • Oct 5th 2016

Very good course. I recommend to anyone who's interested in data analysis and machine learning.

By DB • Jan 25th 2018

There is some problems because of changes both in SAS and Python after creating the course

At Wesleyan, distinguished scholar-teachers work closely with students, taking advantage of fluidity among disciplines to explore the world with a variety of tools. The university seeks to build a diverse, energetic community of students, faculty, and staff who think critically and creatively and who value independence of mind and generosity of spirit.
...

Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions.
The Data Analysis and Interpretation Specialization takes you from data novice to data expert in just four project-based courses. You will apply basic data science tools, including data management and visualization, modeling, and machine learning using your choice of either SAS or Python, including pandas and Scikit-learn. Throughout the Specialization, you will analyze a research question of your choice and summarize your insights. In the Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. You will have the opportunity to work with our industry partners, DRIVENDATA and The Connection. Help DRIVENDATA solve some of the world's biggest social challenges by joining one of their competitions, or help The Connection better understand recidivism risk for people on parole in substance use treatment. Regular feedback from peers will provide you a chance to reshape your question. This Specialization is designed to help you whether you are considering a career in data, work in a context where supervisors are looking to you for data insights, or you just have some burning questions you want to explore. No prior experience is required. By the end you will have mastered statistical methods to conduct original research to inform complex decisions....

When will I have access to the lectures and assignments?

Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

Is financial aid available?

More questions? Visit the Learner Help Center.