This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

## About this Course

### Learner Career Outcomes

## 38%

## 38%

## 15%

### What you will learn

Understand analytic graphics and the base plotting system in R

Use advanced graphing systems such as the Lattice system

Make graphical displays of very high dimensional data

Apply cluster analysis techniques to locate patterns in data

#### 100% online

#### Flexible deadlines

#### Approx. 15 hours to complete

#### English

### Offered by

#### Johns Hopkins University

The mission of The Johns Hopkins University is to educate its students and cultivate their capacity for life-long learning, to foster independent and original research, and to bring the benefits of discovery to the world.

## Syllabus - What you will learn from this course

## Week 1

This week covers the basics of analytic graphics and the base plotting system in R. We've also included some background material to help you install R if you haven't done so already.
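As a rough illustration of the base plotting workflow, the sketch below starts a plot with one call and then annotates it with a second. The `airquality` data set (bundled with R), the fitted line, and the labels are assumptions chosen for illustration, not material taken from the course.

```r
# Base graphics: create the plot with one call, then add to it step by step.
# `airquality` ships with R; it is used here only as a convenient example.
data(airquality)

# Scatterplot of ozone against temperature (points with missing Ozone are not drawn).
with(airquality, plot(Temp, Ozone,
                      main = "Ozone and temperature in New York",
                      xlab = "Temperature (F)", ylab = "Ozone (ppb)"))

# Fit a simple linear regression and draw it on top of the existing plot.
fit <- lm(Ozone ~ Temp, data = airquality)
abline(fit, lwd = 2, col = "blue")
```

Run interactively, this opens a graphics window; run with `Rscript`, the plot is written to the default PDF device instead.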

**20 hours to complete**

**15 videos**

**6 readings**

**1 practice exercise**

## Week 2

Welcome to Week 2 of Exploratory Data Analysis. This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high-dimensional data. The Lattice and ggplot2 systems also simplify the layout of plots, making it a much less tedious process.
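To make the layout point concrete, here is a minimal sketch of a multi-panel plot in each system: one panel per month, produced by a single call. The `airquality` data set is an assumption chosen for illustration, and the ggplot2 part runs only if that package is installed.

```r
library(lattice)  # recommended package, bundled with standard R installations

# Lattice: the `| factor(Month)` term in the formula requests one panel per month.
p_lattice <- xyplot(Ozone ~ Temp | factor(Month), data = airquality,
                    layout = c(5, 1))
print(p_lattice)

# ggplot2: the same idea expressed as layers plus a faceting specification.
# Guarded so the sketch still runs where ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gg <- ggplot(airquality, aes(x = Temp, y = Ozone)) +
    geom_point(na.rm = TRUE) +
    facet_wrap(~ Month, nrow = 1)
  print(p_gg)
}
```

In both cases the panel arrangement, shared axes, and strip labels are handled by the plotting system rather than set up by hand.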

**17 hours to complete**

**7 videos**

**1 reading**

**1 practice exercise**

## Week 3

Welcome to Week 3 of Exploratory Data Analysis. This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high-dimensional data (many, many variables). We also cover novel ways to specify colors in R so that you can use color as an important and useful dimension when making data graphics. All of this material is covered in chapters 9-12 of my book Exploratory Data Analysis with R.
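A minimal sketch of how these pieces can fit together, using only functions from base R: hierarchical clustering, k-means, principal components, and a color palette that encodes cluster membership. The simulated data, the choice of three clusters, and the palette colors are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Simulated data (hypothetical): 30 observations of 5 variables in 3 groups of 10.
centers <- rep(c(0, 3, 6), each = 10)
x <- matrix(rnorm(30 * 5, mean = centers, sd = 0.5), nrow = 30, ncol = 5)

# Hierarchical clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(x))
groups_hc <- cutree(hc, k = 3)
plot(hc, main = "Cluster dendrogram")

# K-means with 3 centers; nstart = 10 tries several random starting points.
km <- kmeans(x, centers = 3, nstart = 10)

# Dimension reduction: principal components of the 5 variables.
pc <- prcomp(x, scale. = TRUE)

# Use color as an extra dimension: one palette color per k-means cluster.
pal <- colorRampPalette(c("blue", "orange", "red"))(3)
plot(pc$x[, 1:2], col = pal[km$cluster], pch = 19,
     xlab = "PC1", ylab = "PC2", main = "First two principal components")
```

The final plot shows five-dimensional data in two dimensions, with color carrying the cluster assignment.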

**13 hours to complete**

**12 videos**

**1 reading**

## Week 4

This week, we'll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I'm providing these videos to give you a sense of how you might proceed with a specific type of dataset.

**6 hours to complete**

**2 readings**

### Reviews

#### 4.7

##### Top reviews from Exploratory Data Analysis

This is the second course I have taken from Roger Peng and both were outstanding. I have a strong math background, but not much of a background in stats, but this course was very approachable for me.

Very good course! It provided me the foundation in learning how to plot and interpret data. This will definitely strengthen my "R programming" to generate publication-type figures for my genomics data!

The course on Exploratory Data Analysis was highly enjoyable. I used to do a lot of this sort of thing in my job, but now spend more of my time managing people. It is fun to get "hands-on" again.

Very nice course, plotting data to explore and understand various features and their relationship is the key in any research domain, and this course teaches the skill required to achieve this.

Nice course, but too much focus on "R" as a tool.... Industries don't use R as much... The course must be made more generic and independent of R - understand it is not easy to do but ....

Excellent explanation, adding very good skills along the way in the data science specialization. Some slides should be updated to have working URLs; some seem old and obsolete now.

Loved it! It took me longer than expected due to work and family issues, but I went so many times to the materials and even use some ggplot2 for work that ended being quite fulfilling.

Great intro to plotting and related tools in R. Will say that the coverage of heatmaps and PCA felt a little out of left field, with very little intuition. However, overall quite good.

This was incredibly useful because it gives you a feel for the datasets and tools with which to explore them. I really wasn't aware of the base and lattice plotting systems until now.

Good introduction. The swirl exercises kind of reproduce the lectures though- felt like it might not have been the most efficient use of time to go over the exact same example again.

I did learn more about putting together a set of graphs that help to explore the data. I did see how subsetting and aggregating data helps to give a better understanding of the data.

Amazing! Learning so much about how to explore the data for the first time. This is a must-do for anyone who wants to be a data scientist. Now I can use ggplot without any trouble. Thanks!

When it comes to hierarchical and K-means clustering, the theory wasn't explained clearly. When do we use U and V for what purpose? How does D come in? I'm left confused after this.

The course is interesting and the content is relevant. I do think that there are some issues with project 2 though. I did provide feedback on that to the course administrators.

This is a great introductory course on the topic and on R language. You will get acquainted with basic R functions which are most useful for initial statistical analysis.

I learned a lot on this course, it helped me to understand and identify some of the situations I experience at work. Totally recommended if you want to apply it right away.

Seems this type of course in an online learning MOOC would be better if it was more direct, hands-on "how to" and less focused on explanatory fluff (academic style).

One of the most fulfilling courses I've taken. Already used what I've learned to analyse the COVID 19 data and get more information from it, learning at the same time.

Week 3 - clustering concepts appear hard to comprehend initially. This week should first start with a practical example/use of clustering and then move on to technical

It's one of the most important steps in learning data science. Before even jumping into the real thing, it is worthwhile to explore the data set at hand a little bit.

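## Frequently Asked Questions

When will I have access to the lectures and assignments?

Once you enroll for a Certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

Is financial aid available?

More questions? Visit the Learner Help Center.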