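In this course, we will learn the basic concepts of data mining and its fundamental methods and applications, then go deeper into a subfield of data mining, pattern discovery, studying its concepts, methods, and applications in depth. We will also introduce pattern-based classification methods and some interesting applications of pattern discovery. This course gives you the opportunity to learn skills and practice them: applying scalable pattern discovery methods to massive transactional data, discussing pattern evaluation measures, and learning methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.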

A course from University of Illinois at Urbana-Champaign

Data Visualization

From this lesson

Week 2: Visualization of Numerical Data

In this week's module, you will start to think about how to visualize data effectively. This will include assigning data to appropriate chart elements, using glyphs, parallel coordinates, and streamgraphs, as well as implementing principles of design and color to make your visualizations more engaging and effective.

- John C. Hart, Professor of Computer Science

Department of Computer Science

[SOUND] So, data visualization can consist of some very simple charts,

but the success of a data visualization can often depend on how

we map our data variables to the elements of those charts.

So we can start with a Bar Chart.

And the bar chart typically has two axes.

You've got a horizontal axis and a vertical axis.

And you're usually plotting discrete values horizontally, and

either a discrete or a continuous value vertically.

And this benefits from the fact that you're mapping a variable,

a data variable, to both position, the actual height of these bars,

as well as to a length, the size of the bar.

And so you do a really good job of not only seeing

that, for example, the orange bar is larger than the blue bar, but

how much larger the orange bar is than the blue bar, because

position and length are both at the top of perceptual effectiveness for

displaying quantitative values.

And so usually vertically we have some sort of quantitative dependent variable.

And then horizontally these can be categories.

And so we have some nominal variable or at least some discrete variable here

indicating the individual bars that we're plotting.

And this is an independent variable, a dimension.

And then this is some kind of measure of that dimension.

It's a dependent variable, depending on the value of this independent variable.
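
As a rough illustration of the bar chart mapping described above, here is a minimal text-only sketch. The function name `text_bars`, the sample categories, and the `max_width` parameter are assumptions made for this example, not something from the lecture. Each category (the independent variable) gets one row, and the measure (the dependent variable) is mapped to the length of the bar, so the end of each bar also encodes position.

```python
def text_bars(values, max_width=20):
    """Render a dict of {category: non-negative number} as text bars.

    Bar length is proportional to the value, so both the length of the bar
    and the position of its end encode the measure.
    """
    if not values:
        return []
    if any(v < 0 for v in values.values()):
        raise ValueError("this sketch only handles non-negative values")
    largest = max(values.values())
    lines = []
    for category, value in values.items():
        # Scale so that the largest value fills max_width characters.
        length = round(value / largest * max_width) if largest else 0
        lines.append(f"{category:<8}|{'#' * length} {value}")
    return lines


if __name__ == "__main__":
    # Hypothetical data: the orange bar is twice the blue bar.
    for line in text_bars({"blue": 10, "orange": 20}):
        print(line)
```

Because the bars share a common baseline, a viewer can compare either the bar ends (position) or the bar extents (length), which is the double encoding the lecture credits for the bar chart's effectiveness.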

Similarly, you have a line chart.

A line chart has data points that are connected by a line.

And so this is very very similar to a bar chart.

These data points are at the same altitude as the tops of the bars.

So they benefit from position, but they don't have the length

that you visually see with the bars in a bar chart.

So you still do a pretty good job of

being able to discern quantitative values and the relationships between quantitative

values in the altitudes of these data points in a line chart.

And so again we have a quantitative dependent variable vertically that's

changing based on some quantitative independent variable horizontally.

But now the horizontal value is some quantitative continuous variable and

the vertical value also needs to be a quantitative continuous variable

because we're drawing lines between these data points and these lines imply

that there's a continuity of values between these data points and

these data points have a horizontal and a vertical component.

These lines have a horizontal and vertical component.

So you don't want to use a line chart to display data across categories because

that's implying that there are in-between values between these categories, and

if they're nominal categories, if they're discrete,

then there should not be in-between values.

Your visualization shouldn't imply that there are in-between values.

If we remove the lines, we get a scatter plot.

And a scatter plot gives us some other flexibility.

When we display a line plot, we're displaying a function.

We're displaying some dependent variable

that's changing according to an independent variable.

So that there's one dependent value for every independent value.

So there's basically one measure for each change in dimension.

When we do a scatter plot, we have two independent variables so

that I can have the same horizontal value here and

I can have two vertical values associated with it, and so that can be powerful.

You usually don't connect these with a line unless there is some order in

which the data is coming in that you want to associate with a line and

that would be an additional dimension you could indicate on a scatter plot.

But the line doesn't imply that you're plotting a function, because

a scatter plot doesn't plot a function unless the data's organized that way.
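
The distinction drawn here, one dependent value per independent value for a line chart versus possibly several vertical values per horizontal value for a scatter plot, can be checked mechanically. The sketch below is only an illustration: the names `is_function` and `suggest_plot` are hypothetical, and it assumes both coordinates are already quantitative and continuous.

```python
from collections import defaultdict


def is_function(points):
    """Return True if every x value is paired with exactly one y value."""
    ys_by_x = defaultdict(set)
    for x, y in points:
        ys_by_x[x].add(y)
    return all(len(ys) == 1 for ys in ys_by_x.values())


def suggest_plot(points):
    """Suggest a line chart only when the data behaves like a function of x.

    Assumption: both axes are quantitative and continuous; otherwise a line
    chart would not be appropriate in the first place.
    """
    return "line chart" if is_function(points) else "scatter plot"


if __name__ == "__main__":
    print(suggest_plot([(1, 2.0), (2, 3.5), (3, 3.0)]))  # one y per x
    print(suggest_plot([(1, 2.0), (1, 4.0), (2, 3.5)]))  # two y values at x = 1
```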

And so you have two independent variables, a horizontal independent variable and

a vertical independent variable.

And you're getting an indication of position, both, horizontally and

vertically, for the quantitative values on each of the two axes.

You also get some cues based on density if these points tend to cluster in certain

areas.

You can also create a Gantt Chart,

which kind of looks like a sideways bar chart, except the bases of objects don't

line up with one of the axes like they would in a bar chart.

And so in this case, we have two independent variables,

things that are no longer related as a function, but

you still get the benefits of position and length.

Gantt charts are usually process diagrams that tell you

the various stages of a project.

And so horizontally a Gantt chart would usually be some display of time.

This may be a quarter, or date, or some other time axis.

And then vertically, there is some categorical, often discrete or nominal,

independent variable, and this is typically the tasks.

So you'll have the first task and then the second task, and

the second task may start before the first task finishes.

And tasks may stop and then start up again, and so you get this overlap.
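
The Gantt layout just described, time on the horizontal axis, one row per task, and bars that may overlap or resume, can be sketched in plain text. Everything named here is an assumption for illustration: `render_gantt`, `tasks_overlap`, the task names, and the use of integer time units with half-open `(start, end)` intervals.

```python
def render_gantt(tasks, width):
    """Render tasks as text rows; '#' marks time units in which a task is active.

    tasks: list of (name, [(start, end), ...]) using integer time units,
    where each interval covers start <= t < end.
    """
    label_width = max((len(name) for name, _ in tasks), default=0)
    rows = []
    for name, segments in tasks:
        cells = ["."] * width
        for start, end in segments:
            # Clamp to the visible time range so out-of-range segments are cut off.
            for t in range(max(start, 0), min(end, width)):
                cells[t] = "#"
        rows.append(name.ljust(label_width) + " |" + "".join(cells))
    return rows


def tasks_overlap(segments_a, segments_b):
    """Return True if any interval of one task intersects any interval of the other."""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in segments_a
               for s2, e2 in segments_b)


if __name__ == "__main__":
    # Hypothetical project: "build" starts before "design" finishes,
    # then stops and starts up again later.
    tasks = [("design", [(0, 4)]), ("build", [(2, 5), (7, 9)])]
    for row in render_gantt(tasks, width=10):
        print(row)
```

Unlike the bar chart, the bars here do not share a baseline: both the start position and the length of each bar carry information.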

Again, it benefits from both position and length,

but it operates on two independent variables.

Again, one could be quantitative and one could be nominal, similar to a bar chart.

But in a bar chart you have one dependent variable

plotted over an independent variable.

In a Gantt chart you have two independent variables. And, finally, you have a table.

In this case, you have two nominal variables, two categories,

for example, they're independent variables.

One doesn't depend on the other necessarily and

you're just looking at two separate dimensions, and

plotting some value that would be the entry in each of these table cells.

So it really benefits from position only, and again, that position is discrete or

nominal.

It's not a continuous position, as it would be in a scatter plot.

It's in discrete, quantized regions.
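
A table of this kind, two nominal independent variables with one value per cell, amounts to a cross-tabulation. The following is a minimal sketch under assumed names (`build_table`, the sample regions and quarters); it only arranges values into discrete row and column positions and is not meant as a full pivot implementation.

```python
def build_table(records, missing=None):
    """Arrange (row_category, column_category, value) records into a grid.

    Row and column labels keep first-seen order; cells with no record
    are filled with `missing`.
    """
    row_labels, col_labels, cells = [], [], {}
    for row, col, value in records:
        if row not in row_labels:
            row_labels.append(row)
        if col not in col_labels:
            col_labels.append(col)
        cells[(row, col)] = value
    body = [[cells.get((r, c), missing) for c in col_labels]
            for r in row_labels]
    return row_labels, col_labels, body


if __name__ == "__main__":
    # Hypothetical data: a value for each (region, quarter) pair.
    records = [("north", "Q1", 10), ("north", "Q2", 12), ("south", "Q1", 7)]
    rows, cols, body = build_table(records)
    print(cols)
    for label, values in zip(rows, body):
        print(label, values)
```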

You might also notice if you look at this long enough,

you can see some flashing happening at the intersections.

And it's, again, important to remember your perceptual psychology,

so that when you're laying these things out you pay attention to

contrast to make sure that you don't get some unwanted perceptual features.

So here's a table that visualizes the decision

you need to make of what chart to use in various situations,

depending on the data that you want to display.

Very often you have at least one independent variable, and then you

may have a dependent variable on it, or you may have a second independent variable.

And your independent variable might be discrete or nominal, some category,

or it might be some quantity that varies continuously, and

your dependent value could similarly be continuous or discrete, or a second independent

variable could be a category, or it could be a continuously changing value,

independent of your horizontal axis. And so, depending on each

of these configurations, you can look up in this table which chart you want to use.

If you have an independent variable and

a dependent variable, then most often you want to use a bar chart.

You can use a line chart, but only when you have a continuous dependent variable

and a continuous independent variable, because the lines indicate

that there are in-between values both horizontally and vertically.

You want to use a Gantt chart if you have an independent variable

that's continuous and a categorical axis vertically, or

a categorical axis horizontally and a continuous value vertically.

Either one of those will form a Gantt chart.

If you have two categorical axes, you want to make a table.

And if you have two continuous axes that are both independent,

you want to make a scatter plot.

So we use the kind of data that we're trying to visualize, nominal,

ordered, quantitative, whether it's continuous,

whether it's discrete, whether variables are dependent or independent,

not only to figure out how they map to chart elements, but

more importantly to decide which chart best displays them.
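
The lookup walked through in this segment can be written down as a small function. This is a sketch of the rules as stated in the lecture, not a general recommendation engine; the function name `choose_charts` and the string labels `"categorical"` and `"continuous"` are assumptions for the example.

```python
CATEGORICAL = "categorical"   # discrete or nominal
CONTINUOUS = "continuous"     # quantitative, continuously varying


def choose_charts(x_kind, y_kind, y_depends_on_x):
    """Return candidate chart types for a horizontal and a vertical variable.

    x_kind, y_kind: CATEGORICAL or CONTINUOUS.
    y_depends_on_x: True if the vertical variable is a measure of the
    horizontal one, False if both variables are independent.
    """
    for kind in (x_kind, y_kind):
        if kind not in (CATEGORICAL, CONTINUOUS):
            raise ValueError(f"unknown variable kind: {kind!r}")

    if y_depends_on_x:
        # A bar chart is the usual choice; a line chart is allowed only when
        # both variables are continuous, since lines imply in-between values.
        if x_kind == CONTINUOUS and y_kind == CONTINUOUS:
            return ["bar chart", "line chart"]
        return ["bar chart"]

    # Both variables are independent.
    if x_kind == CATEGORICAL and y_kind == CATEGORICAL:
        return ["table"]
    if x_kind == CONTINUOUS and y_kind == CONTINUOUS:
        return ["scatter plot"]
    # One continuous and one categorical axis, in either orientation.
    return ["Gantt chart"]


if __name__ == "__main__":
    print(choose_charts(CATEGORICAL, CONTINUOUS, y_depends_on_x=True))
    print(choose_charts(CONTINUOUS, CATEGORICAL, y_depends_on_x=False))
```

Returning a list rather than a single name reflects the lecture's wording that a bar chart is the default for dependent data and a line chart is an additional option only in the fully continuous case.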

[MUSIC]