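In this course, we will learn the basic concepts of data mining and its fundamental methods and applications, and then go deeper into a subfield of data mining, pattern discovery, studying its concepts, methods, and applications in depth. We will also introduce pattern-based classification methods and some interesting applications of pattern discovery. The course gives you the opportunity to learn skills and practice applying scalable pattern discovery methods to large volumes of transactional data, discuss pattern evaluation measures, and learn methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.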

A course from University of Illinois at Urbana-Champaign

Data Visualization

From this lesson

Week 1: The Computer and the Human

In this week's module, you will learn what data visualization is, how it's used, and how computers display information. You'll also explore different types of visualization and how humans perceive information.

- John C. Hart, Professor of Computer Science

Department of Computer Science

So for most data visualization tasks,

we're just going to need two-dimensional computer graphics to plot and display the data.

In order to do that, we'll need to use different coordinate systems.

We use a different coordinate system to plot the data than we use to display the data.

We're going to use two-dimensional graphics a lot when we visualize data.

We're used to using two-dimensional graphics, for example,

for plotting functions and so we'll focus on

two-dimensional graphics now as we look at methods for using graphics for visualization.

What we're going to learn is the difference between vector graphics,

which are used to specify two-dimensional graphics and raster graphics,

which are used to display two-dimensional graphics.

And then we'll look at the coordinate systems that are used for each of them.

Vector graphics are the graphics that are used for drawing.

We're used to drawing; we take a pen and we'll put our pen down at

one point and then we'll move our pen across

the paper and then lift our pen up at another point and you get

a nice straight line between them - a nice continuous line.

Raster graphics are the graphics that are used,

for example, for our televisions and our phones;

they are a rectilinear array of pixels and these pixels are assigned colors.

And by assigning certain pixels certain colors,

you can represent those same shapes.

So you will draw a shape using vector graphics;

you will describe a point where you want to

start a line and a point where you want to stop

a line and you'll get

either a straight line or a smooth curved line between them.

And those will be converted to raster graphics for display,

which will consist of the pixels that get illuminated along

that path in order to display that path that you described with vector graphics.

So this process is called rasterization.

And so we'll specify a primitive in

a vector graphics format: we'll describe vertices,

points on the plane.

And then we will connect those points with strokes - in this case, with straight lines,

but they could be curved paths - and then those strokes can enclose a region.

And so we can fill that region - and we may assign a color for that region.

And for strokes we may assign a color and a width of

the stroke or have the stroke stylized,

say with dashed lines or so on.

The process of rasterization starts with these primitives defined by

these few data points and converts

them into an array of pixels so that they can be displayed.
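As a rough illustration of that conversion, here is a minimal Python sketch that fills a triangle given only its three vertices. It is not the rasterizer used in the lecture's figures: the edge-function inside test, the grid size, and the vertex positions are all assumptions chosen for the example, and pixels are taken to sit at integer grid positions.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: the sign tells which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return a height x width grid; 1 marks pixels inside or on the triangle."""
    grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            w0 = edge(*v1, *v2, x, y)
            w1 = edge(*v2, *v0, x, y)
            w2 = edge(*v0, *v1, x, y)
            # Accept either vertex winding order: all three signs must agree.
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                grid[y][x] = 1
    return grid


# Hypothetical triangle on a 10 x 9 pixel grid.
grid = rasterize_triangle((1, 1), (8, 2), (4, 7), 10, 9)

# Print with row 0 at the bottom so y increases upward.
for row in reversed(grid):
    print("".join("#" if cell else "." for cell in row))
```

Three vertices are enough to decide the value of every pixel in the grid, which is the sense in which a few data points become an array of pixels.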

And so the raster format representation of this triangle is an array of pixels.

And the dashed lines represent the original primitive on the left,

superimposed over the array of pixels that are representing it.

And in this case, we have pixels along the edges that are

colored blue and pixels in the filled-in region that are colored pink.

And you notice when we rasterize a shape,

we can get aliasing.

And that's the fact that this nice straight line

here in our vector graphics representation,

appears as a staircased line.

You get the stairstep artifacts when you rasterize a smooth, straight line.

And because those stairsteps try to look like the original line,

but look a little bit different,

we call that an alias;

and that problem is called aliasing.
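The stairstep effect can be reproduced with a few lines of Python. This is only a sketch under simple assumptions: the line has a slope between 0 and 1, its endpoints are non-negative integers with x1 > x0, and each pixel column lights the single row nearest to the ideal line. The function name `rasterize_line` is made up for this example.

```python
def rasterize_line(x0, y0, x1, y1):
    """Pick one pixel per column by rounding the ideal y to the nearest row."""
    pixels = []
    for x in range(x0, x1 + 1):
        t = (x - x0) / (x1 - x0)          # fraction of the way along the line
        y = y0 + t * (y1 - y0)            # exact (continuous) y on the line
        pixels.append((x, int(y + 0.5)))  # nearest row for non-negative y
    return pixels


pixels = rasterize_line(0, 0, 8, 3)
rows = [y for _, y in pixels]
print(rows)

# Draw the pixels with y increasing upward to make the staircase visible.
lit = set(pixels)
for y in range(3, -1, -1):
    print("".join("#" if (x, y) in lit else "." for x in range(9)))
```

The ideal line rises smoothly, but the chosen rows can only change in whole-pixel jumps, so the rasterized version is a staircase that approximates the line rather than matching it.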

When we draw primitives in two dimensions - when we want

to draw shapes for two-dimensional graphics - for example,

for plotting functions - we're going to need a coordinate system

in which to draw those shapes so we know where to place the vertices,

for example, of a triangle.

And so we need to define a coordinate system.

And these coordinates I will call canvas coordinates;

they're the coordinates that we're going to draw things with.

In this case, the canvas coordinates have been defined to go from minus 1, minus 1

to 1, 1 and so they set up this square region of a plane.

The origin would be here in the middle.

You can define your canvas coordinates to be anything you like.

You want to define them to be something convenient so that you can

draw your shapes without a lot of trouble.

In this case, I've drawn a plot of a parabola - the parabola y equals x squared.

And so I've got a curved path starting at this point here going to this point here

and I've defined my canvas coordinate system to be something convenient for that plot.

In this case, I've started it going from minus one-eighth,

minus one-eighth to one-and-one-eighth, one-and-one-eighth.

And that's so that as I move one-eighth,

one-eighth in, I'm at the point 0, 0

in my coordinate system and I can draw my plot going from 0, 0 to 1, 1 here.

And then I've got an additional eighth of a unit

surrounding the plot in order to do things - to add metadata,

like the title of the visualization and draw the axes and label the axes.
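To make the numbers concrete, the following Python sketch describes the parabola as a short list of vertices in canvas coordinates and checks how much room is left around it. The vertex count and the names `CANVAS_MIN`, `CANVAS_MAX`, and `parabola_vertices` are assumptions made for this illustration.

```python
# Canvas extent from the example: -1/8 to 1 1/8 on both axes.
CANVAS_MIN, CANVAS_MAX = -1 / 8, 1 + 1 / 8


def parabola_vertices(n):
    """Return n + 1 vertices of y = x^2 for x in [0, 1], in canvas coordinates."""
    return [(i / n, (i / n) ** 2) for i in range(n + 1)]


path = parabola_vertices(8)

# Smallest distance from any vertex to the lower-left and upper-right canvas edges.
margin_low = min(min(x, y) - CANVAS_MIN for x, y in path)
margin_high = min(CANVAS_MAX - max(x, y) for x, y in path)

print(path[0], path[-1])
print(margin_low, margin_high)
```

The plot itself uses the natural range 0 to 1, and the one-eighth border on each side is what remains for the title and axes.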

So that's convenient, but as I'm resizing the display,

I may want the fonts to be larger or

smaller and I'll need a bigger margin surrounding the plot.

Or I may want the plot to be larger and the margin to be smaller.

So in two-dimensional computer graphics,

we can set up hierarchical coordinate systems.

And this just means you have a canvas in a canvas.

In this case, we have a yellow canvas that is the coordinate system for

the entire plot - the entire

visualization - and then we have

an inner canvas that is a coordinate system just for the plotted data;

in this case, the parabola.

And so I've set up the outer coordinate system to go from 0, 0 to 1, 1

in this region and then I've set up an inner canvas that spans

from one-tenth, one-tenth to nine-tenths, nine-tenths

here and I've defined its coordinate system to go from 0, 0 to 1, 1.

So now I can plot inside

this coordinate system using coordinates that are convenient for plotting

this parabola and then I can plot in

this outer coordinate system using coordinates convenient for drawing the decorations,

the axes and the title.

And so we can define whatever coordinate system we want,

wherever we want, in order to make it more convenient to draw two-dimensional graphics.
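The canvas-in-a-canvas idea amounts to a scale and an offset. The sketch below assumes the layout just described, with the inner canvas occupying the region from (0.1, 0.1) to (0.9, 0.9) of the outer canvas and both using coordinates from 0 to 1. The function name `inner_to_outer` and its parameters are hypothetical.

```python
def inner_to_outer(u, v, left=0.1, bottom=0.1, right=0.9, top=0.9):
    """Map a point (u, v) in the inner canvas (0..1) to outer canvas coordinates."""
    x = left + u * (right - left)
    y = bottom + v * (top - bottom)
    return x, y


# Corners and one parabola point (0.5, 0.25) expressed in the inner canvas.
corner_low = inner_to_outer(0.0, 0.0)
corner_high = inner_to_outer(1.0, 1.0)
mid_point = inner_to_outer(0.5, 0.25)

print(corner_low, corner_high, mid_point)
```

Changing the margin is then a matter of changing `left`, `bottom`, `right`, and `top`; the data is still plotted with the same inner coordinates.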

There are also screen coordinates.

And these are the coordinates that are used for raster graphics,

for the display of the information.

And in this coordinate system,

we have a grid going from 0, 0 to whatever our screen resolution is.

In this case, since we're going from 0, 0,

we go to our horizontal resolution minus 1, vertical resolution minus 1.

If our screen resolution was 100, 100,

we would be going from 0, 0 to 99, 99.

And the pixels are located on these grid intersections.

And so you have an integer coordinate for each pixel location,

which is useful when you're actually displaying an image using these pixels;

you want to be able to locate each one of these pixels.

And so there is a canvas-to-screen transformation that happens.

So we're going to define our coordinate system going from some left

bottom point to some right top point

and then we're going to plot using those coordinates.

And those coordinates are going to be converted to

the corresponding pixel locations on our display screen.

And those pixel locations are going to be defined

someplace on the display screen starting at x,

y and going to the point x plus the width

in pixels minus 1 and then y plus the height in pixels minus 1.

So this coordinate transformation happens automatically and you can define the canvas coordinates to

be anything and you can place the resulting pixels at any location on the screen.

And so your 2D graphics that you're plotting on

your canvas can be automatically resized and

repositioned anywhere on the screen just by

controlling this canvas-to-screen transformation.
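A minimal Python sketch of such a canvas-to-screen transformation follows. It assumes a canvas given as (left, bottom, right, top), a screen region given as (x, y, width, height) in pixels, and a screen y axis that increases upward, matching the left-bottom to right-top description above. Many real window systems put the origin at the top left instead, which would require flipping y. The function name and argument layout are assumptions for the example.

```python
import math


def canvas_to_screen(cx, cy, canvas, viewport):
    """Map canvas point (cx, cy) to the nearest integer pixel location."""
    left, bottom, right, top = canvas
    x, y, width, height = viewport
    # Normalize to 0..1 across the canvas, then stretch over width-1 and height-1
    # pixel steps so the canvas corners land exactly on the corner pixels.
    sx = x + (cx - left) / (right - left) * (width - 1)
    sy = y + (cy - bottom) / (top - bottom) * (height - 1)
    return math.floor(sx + 0.5), math.floor(sy + 0.5)


canvas = (-1.0, -1.0, 1.0, 1.0)

# Full 100 x 100 screen: corners map to (0, 0) and (99, 99).
full = (0, 0, 100, 100)
print(canvas_to_screen(-1, -1, canvas, full), canvas_to_screen(1, 1, canvas, full))

# Same canvas repositioned and resized: a 50 x 30 pixel region starting at (20, 10).
region = (20, 10, 50, 30)
print(canvas_to_screen(-1, -1, canvas, region), canvas_to_screen(1, 1, canvas, region))
```

Only the `viewport` argument changes between the two calls; the canvas coordinates used for drawing stay the same.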

You can also work directly in screen coordinates by setting up

a canvas-to-screen transformation that uses

canvas coordinates that match up with your screen coordinates.

In this case, you're just setting your left edge and your bottom edge to x and y

and your right edge and your top edge to x and y

plus the width minus 1 and the height minus 1.

And in this case, you can specify the coordinates of your primitives in

vector graphics using the same coordinates

for the pixels that they will be translated to.

I don't recommend doing this because when you're working in screen coordinates,

you're not going to know what your output screen display device might be.

It could be a cell phone,

it could be a television, it could be a watch,

it could be a video wall - and all of those will have different resolutions

and you want to make sure that your two-dimensional graphics are properly displayed;

they're not too small or too large when they're displayed on different devices.

So it's better to work in some canvas coordinates that are convenient for you and let

the canvas-to-screen transformation worry

about converting it to the corresponding pixels.
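The following Python sketch shows why this helps: one point specified in canvas coordinates lands at the proportionally equivalent pixel on several displays. The device names and resolutions are made-up examples rather than measurements of real hardware, and the mapping assumes a canvas from 0, 0 to 1, 1 that covers the whole screen.

```python
import math


def canvas_to_screen(cx, cy, width, height):
    """Map a point in a 0..1 canvas to the nearest pixel on a width x height screen."""
    sx = cx * (width - 1)
    sy = cy * (height - 1)
    return math.floor(sx + 0.5), math.floor(sy + 0.5)


# Hypothetical display resolutions (width, height) in pixels.
devices = {
    "watch": (200, 200),
    "phone": (1080, 1920),
    "video wall": (7680, 4320),
}

# The same canvas point, the center of the canvas, on every device.
centers = {name: canvas_to_screen(0.5, 0.5, w, h) for name, (w, h) in devices.items()}
for name, pixel in centers.items():
    print(name, pixel)
```

Had the point been specified directly as a pixel such as (100, 100), it would be the center of the hypothetical watch but a spot near one corner of the video wall.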

So what have we learned?

We've learned that vector graphics are used to describe shapes and that

raster graphics are what we use to display those shapes using a table of pixels.

And that we can set up coordinates that are convenient for us to plot in a canvas

and those coordinates are different from

the raster coordinates that we use to display the canvas.

And we can set up canvases within a canvas,

which allows us to divide up the screen in ways that make it more convenient

for us to set up a two-dimensional visualization display.

So we learned that we describe shapes using vector graphics,

but we display shapes using raster graphics.

We can describe our shapes using the coordinate system of raster graphics,

the coordinate system of the screen's pixels,

or we can describe them using the canvas coordinates of our vector graphics,

or we can use whatever coordinate system is most convenient for us to plot the data.