这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

Loading...

来自 University of Illinois at Urbana-Champaign 的课程

数据可视化

594 个评分

这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

从本节课中

Week 2: Visualization of Numerical Data

In this week's module, you will start to think about how to visualize data effectively. This will include assigning data to appropriate chart elements, using glyphs, parallel coordinates, and streamgraphs, as well as implementing principles of design and color to make your visualizations more engaging and effective.

- John C. HartProfessor of Computer Science

Department of Computer Science

[MUSIC]

So when we map data, we figure out how to convert the data

into some graphics primitive that we can display, and that conversion

is going to depend on how we perceive different graphical objects and

connect their characteristics to the characteristics of the data.

So, as we saw before,

the data visualization framework has this mapping layer.

And this mapping layer takes data in our data collection, and

takes its abstract data values, and

assigns them more concrete geometric values, and those geometric

values are used by the graphics layer to display the data for visualization.

So there's been some studies in the 80s that basically uncovered

how effective different geometric mappings are in

the perception of quantitative values that they're mapped to.

So, for example, we have this list here, and it's sorted by perceptual accuracy.

So when we see data

Indicated by mapping to position, we do a better job

of understanding the quantities than we do if the data is mapped to color or density.

So, if we have data varying along the horizontal axis here,

we can map the data to position.

And we can see that the right bar is farther along.

It's higher than the left bar.

It's got a greater position.

And, in fact, we can see how much greater the position is.

We can also see it from length, and the right bar is longer than the left bar.

And these are very good methods for displaying quantitative data.

And you can see that if you combine the two of these,

you basically get the elements of a bar chart.

Similarly, you can map data to angle or slope,

and you can see that this right angle is greater than this left angle.

That's a little bit less effective, but

these are the elements that you would see in a pie chart, and pie charts are just

a little less effective at displaying quantitative data than are bar charts.

You can also map data to area.

And so the circle on the right is smaller than the circle on the left.

It represents a smaller area.

It's just a little harder to figure out how much smaller

the circle on the right is than the circle on the left.

And you can also map to volume, and so the sphere on the right

represents a smaller volume than the sphere on the left, but

it's even more difficult to figure out how much smaller the volume is here

in the orange sphere on the right than it is for the blue sphere on the left.

And finally, color or density.

In this case, we have two equal size spheres.

This sphere on the right is lighter, and the sphere on the left is darker.

It's just difficult to know how much lighter the sphere on the right is,

compared to the sphere on the left.

So it's interesting to note that, as we move farther down the list and

find less effective ways of mapping data to geometric or spatial values,

that we move from one-dimensional values of position, length,

angle, and slope to two-dimensional values, or

three-dimensional values, of area or volume.

And when we get to color or density,

we're no longer talking about spatial values at all.

These are non-spatial values.

And so further studies have investigated this at a finer level of detail and

expanded that list.

Again, we have position and length being the most effective

representations of data for spatially, or geometrically.

But, for example, angle and slope have been divided so

that angle is slightly better than slope for discerning quantitative values.

And, finally, density has been expanded to density, saturation, and hue, so

we've basically got brightness, the intensity of a color,

how saturated it is, how brilliant the color is, and

then which color you actually pick in the color wheel.

So these are all representing, and these are consistent with the previous study,

representing quantitative values that you may want to display geometrically or

spatially.

We can also display other values.

For example, ordinal values, values, again,

that you'll want to map to a visual display.

In ordinal's case, instead of quantitative,

we don't know how much greater one value is to another.

We just know that one value is bigger than another or greater than another value.

For example, shirt sizes.

We know that medium shirt is larger than a small shirt.

We just don't know how much larger.

So for ordinal data, position still wins out,

so position is still at the top of the list for indicating ordinal data values.

But then we have density, saturation, and hue second, third, and fourth in the list.

So whereas the brightness of a color, or how intense the color is,

or which color you pick, it's difficult to make actual quantitative comparisons.

It is actually quite easy to make the comparison itself.

You can tell when one object is brighter than another object,

you just don't know how much brighter it is.

And then the remainder of these stays in the same order,

it just falls further down on the list.

So length, angle, slope, area, and volume are a little bit less effective at

looking at relative orders of things that don't have quantitative values.

And this gap here is filled in with some geometric mappings that don't

make sense for quantitative values, things like texture, connection, and containment.

And so we wouldn't think of one texture as being different, quantitatively,

than another texture, but you can make value judgments between textures, and

so this is more textured than this is.

You can make connections between objects, and

these two objects are connected and this one isn't, and

that sets up an ordering that's not necessarily quantitative.

And finally, containment, the fact that one object contains another object, and

a third object is not contained by that object,

is another ordinal organization that doesn't make sense for quantitative data.

And finally, there's a third category of data we can look at, nominal.

These are categories of data.

Ordinals are values that we can compare, but

that we don't know how much greater one is to another.

With nominal values, these are categories that aren't comparable,

things like comparing a square to a circle.

A square isn't greater than a circle or less than a circle.

They're just different.

And so, again, position is still

the strongest indicator of difference for nominal data values,

but then what would ordinarily be second?

Density, saturation, hue, texture, connection, and

containment remains as powerful for nominal axes, except that hue and

texture become stronger, and the rest of these become a little bit weaker.

And so hue, the actual color in a color wheel, is a good indicator of category,

even though it's a very poor indicator of actual quantities.

And then, finally, length, angle, slope, volume, and

area end up being similarly poor for

describing categories as they are for describing base sizes.

There's an additional value here that you can map categories to shape, so you can

map nominal data to shape, and I've been using shape as an example of nominal data,

circle, square, and so on, and that's more effective than mapping categories to,

for example, the length or angle of geometry.

So we've learned that there's many different ways of connecting our

data to different graphics attributes when we display data graphically for

data visualization, and we've learned that some of these ways are better perceived

than others depending on whether the data is nominal, ordered, or quantitative.

[MUSIC]