这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

Loading...

来自 University of Illinois at Urbana-Champaign 的课程

数据可视化

555 个评分

这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

从本节课中

Week 2: Visualization of Numerical Data

In this week's module, you will start to think about how to visualize data effectively. This will include assigning data to appropriate chart elements, using glyphs, parallel coordinates, and streamgraphs, as well as implementing principles of design and color to make your visualizations more engaging and effective.

- John C. HartProfessor of Computer Science

Department of Computer Science

[SOUND] So

we're used to seeing two-dimensional images, and

we can use two-dimensional images to perceive our three-dimensional world, and

we perceive time as maybe a fourth dimension, but

then our ability to perceive dimensions is limited to those three or four dimensions.

Parallel coordinates offer us a way of plotting even higher dimensional

spaces than that.

So in looking at the higher dimensional drawings or

higher dimensional spaces, you can think of something like a square.

A square consists of two one-dimensional lines,

these two vertical lines, that are separated horizontally.

I've taken a one dimensional line, copied into a second vertical line and

then connected them up with lines between the vertices.

So I've now got a two-dimensional square.

If I want to make this two-dimensional square into a three-dimensional cube,

I just copy the square, and move it in a direction, and then I just link up

the corresponding vertices with lines, and that gives me a three-dimensional cube.

If I want a four-dimensional cube, I just take the three-dimensional cube and

I make another copy of it and offset it in another direction, and

then just draw lines between the corresponding vertices and it gives me

a tesseract, some two-dimensional projection of a four-dimensional cube.

And so you can do this, but it's not very helpful for

visualization, because you've got all these different directions that are being

projected into a two-dimensional image, and after three dimensions, we're just

not used to perceiving the world in four or higher dimensions, and so we just have

difficulty perceiving four-dimensional or higher data being projected that way.

If I have a scatter plot, for example, I can pretty readily see x and

y encoded in the positional Cartesian coordinates of x and y here.

If I had a z coordinate that's being projected,

z is being foreshortened as it's being projected here, and

I've gotta mentally remember that z is some combination of horizontal and

vertical that's different than the horizontal x or the vertical y axis.

So you can draw these hints that are basically similar to a shadow,

where I'm indicating, with these dashed lines,

what the coordinates of this green dot are in three dimensions,

because it lacks other visual cues of where it is in three dimensions.

If you try to do this in four dimensions,

then it becomes a nightmare to manage all those dimensions.

So there's a better technique for doing this called parallel coordinates that

Al Inselberg demonstrated to me back in the early 90s,

when they were first invented, and I was quite blown away by his demonstration, and

I'll try to reproduce it here.

So on parallel coordinates, we're going to take the Cartesian coordinates.

We have a horizontal x-axis and a vertical y-axis, and

we're going to take these axes and we're going to

make them parallel instead of orthogonal, at right angles, as they usually are.

So I'm going to take the x-axis, I'm going to put it here, and

I'm going to take the y-axis and I'm going to put it here.

And so I'll label this axis x and this axis y.

So now the two axes are not orthogonal.

They're parallel to each other and they don't extend from the same origin.

So the origin here is horizontally at the bottom and

increasing x goes up this axis, increasing y goes up this axis.

So now I've got the data points, and I need to figure out where these data points

occur, so if I take the y-coordinates of each data point,

I can map each point onto its corresponding position on the y-axis.

I'm just basically dragging them across horizontally, because their position in y

in this chart corresponds to their height along this y-axis.

If I want to do the same for

the x-axis, I'm going to basically take the x position of each point, and then I'm

going to drag it to the corresponding x position on this vertical x-axis, so

the horizontal length here corresponds to the vertical length here, and

so blue and red are at the same horizontal x coordinate, and so

they overlap each other on the x-axis.

And in green is a little bit farther to the right, so

it's going to be a little bit higher on the parallel x-axis.

Yellow is a little bit farther to the right so

it's going to be a little bit higher.

And then this blue green color dot is the farthest right so

it'll be the highest on the x-axis.

So now I've got this one set of points that's now appearing as two

sets of points, and I've got a correspondence between color,

and color's a good perceptual indicator of category, but

it's very difficult to perceive what's going on here.

So what we're going to do is, instead of displaying these as points on the axis,

I'm going to connect these points with lines, and

I'm going to delete the original points, and

now we get this nice duality between points in the coordinate system,

here, and lines in the coordinate system, here.

So, this orange point, here, has an x-coordinate and a y-coordinate.

It corresponds to this line connecting its x-coordinate to its y-coordinate.

And so each one of these five points in the Cartesian x y coordinate

system corresponds to a line in the parallel x y coordinate system here.

Some other features are collinearity, so if these three points lie along

the same line, then you get this nice convergence in parallel coordinates.

And so you can see some co-linearity that happens, because lines in parallel

coordinates, corresponding to points in Cartesian coordinates that are collinear,

will basically converge in a point in the parallel coordinate system.

So you can add extra dimensions.

If I add a z-axis here, I'd have to add all sorts of three-dimensional queues to

these data points to figure out where they are in three dimensions.

In parallel coordinates, I just add a z-axis here.

And then I can connect the z-coordinates to the y-coordinates,

and follow this line from its x-coordinate to its y-coordinate to its z-coordinate.

And if I add a fourth w-axis.

It's actually easy to add a fourth axis here.

And in fact, all of these points may be collinear in the z w coordinate system,

and you can see because these lines are kind of converging to the same point.

The point that they converge to doesn't necessarily have to be between the axes.

But there are some ways of helping a person to see these points.

And so these parallel coordinates are really useful for high-dimensional data.

And you get this correspondence of being able to follow these lines through

the various coordinates and have them correspond to points here, and

it's easier to see some relationships.

There could be correspondences between the w-axis and the y-axis, and

we wouldn't see those unless we also had lines from these points on

the w-axis to the points on the y-axis, and so you get a bit of a combinatorial

nightmare when you try to connect every axis to every other axis.

So there's some decision making that needs to happen if you're going to show

correspondences between axes,

to choose to have those axes next to each other in the parallel coordinates system.

So parallel coordinates take the orthogonal axes of the Cartesian

coordinate system and they lay them out in parallel, and you can have any number of

these parallel coordinate axes laid next to each other, and you can start to

perceive higher dimensional data using these parallel coordinates.

Some of the problems of parallel coordinates are the fact that you're only

seeing the pairwise relationship between the axes, but they can be very useful for

finding certain features in your data, for example, collinearity.

[MUSIC]