这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

Loading...

来自 University of Illinois at Urbana-Champaign 的课程

数据可视化

555 个评分

这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

从本节课中

Week 2: Visualization of Numerical Data

In this week's module, you will start to think about how to visualize data effectively. This will include assigning data to appropriate chart elements, using glyphs, parallel coordinates, and streamgraphs, as well as implementing principles of design and color to make your visualizations more engaging and effective.

- John C. HartProfessor of Computer Science

Department of Computer Science

[MUSIC]

So we understand how a computer works.

We understand how a human works.

Now we need to understand the data.

The different kinds of data and

how data works its way through a data visualization system.

And so there's a Data Visualization Framework that helps us

understand how data is processed for visualization.

First, data's available to us from a variety of sources, so

there's a Data Collection operation.

That basically combines the data from a bunch of different sources and

processes it into one package for visualization.

Then there is a Mapping Layer, this mapper that takes the data, and

the data's abstract and whatever computational representation it may be in,

and it converts it into some geometric representation,

something concrete, that then a Graphics Layer can take and display.

And that prepares it for visualization, for

communication across this visual channel for human perception.

So the Data Layer Is responsible for finding the data,

collecting the data from different sources.

Making sure those sources, which may have the data in different formats,

that the data can be combined into a uniform data store.

And making sure that the data is related between different sources properly.

And then there's some analysis and aggregation to run statistics on the data,

make sure the data is collected at the right frequency.

And then that data, once it's in a single package, is sent to the Mapping Layer,

which assigns the geometry to corresponding data channels, and

can also perform other rather sophisticated data processing operations,

things like contouring which take grid samples and convert them into outlines

of contours that can be useful for understanding what's going on in the data.

And finally, there's a Graphics Layer and

this converts the geometry into a displayable image as we saw last week.

It applies Decorations to the geometry.

It assigns color.

It shades the geometry.

It's also responsible for managing the interactions.

The user input during the visualization.

So that's the framework for data visualization.

As we look at the actual data involved and that mapping operation,

it's important to understand some fundamental features of data,

the types of data that are out there.

And so data can either be Discrete or Continuous.

Discrete data has Discrete values, there's no between values.

Whereas Continuous data has values between neighboring values that are possible.

And then there's comparable data so that the data can be compared to each other or

Unordered data where the values are not comparable.

So we look at all the possibilities of Discrete and Continuous data of Ordered

and Unordered, and you get the different kinds of data that you'll be dealing with.

For example, Ordinal data.

Ordinal data is data that's ordered, but discreet.

Things like shirt sizes.

You can say a medium shirt is going to be larger than a small shirt or

a large shirt is going to be smaller than a medium shirt.

But there may not be values of a shirt size between medium and large.

And these can be quantitative which are discrete and ordered.

For example the counting numbers one, two, three.

There's not one point five.

So there's no value between 1 and 2 in our counting numbers.

So they're discrete, but yet 2 is greater than 1 so they're ordered.

And so those are examples of discrete Ordered values.

You can have Continuous Ordered values, and

these are common real numbers, the real number line and so on.

and values that we'd use, for example, for altitude or temperature or

any continuous field of values.

You can also have Unordered data in discrete and continuous forms.

The discrete form of Unordered data can be Nominal or Categorical.

Things like shapes are examples of nominal data.

They're not ordered.

A square, circle and a triangle.

You wouldn't say that the square is greater than or

less than a circle, because there's no order between them.

You could have categories.

Different nationalities, or other other categories for

data that may not necessarily be comparable and those are discreet.

You can also have continuous values that are Unordered

often Cyclic values are like this so directions or

cues, you wouldn't say that north is any greater or less than east.

And you can having angle going from 0 to 360 in degrees but after you get a 360

degrees, you go back to 0 degrees and so any one is not greater than any other.

And then hues you could have red, green, yellow, blue and

you may not say that red is any greater than green and so these values

are continuous there are colors between red and green, but you wouldn't

think of red as being greater than or less than green, so they're unordered.

And finally, when we're treating data as variables, in science,

you may recall Independent Variables and Dependent Variables.

And when Independent Variables change,

they can have an effect on Dependent Variables.

That same notion happens in the data we get from, for example, Databases.

And so you'll have Key Value pairs, and

Databases will store an index and then a value associated with that index.

And as the Key changes the resulting value will change.

So you have this Independent, Dependent relationship.

In Data Warehouses the Independent Variable is often a Dimension,

something that's changing that represents Dimension.

And then the Dependent Variable is a measure that's happening

as a result of that Dimension.

So a dimension might be time, and

then the measure might be the temperature at that time of the day.

And so as the time of the day changes,

which is an Independent Variable, you would have a Dependent Variable,

the Measure, being the temperature at that time of the day changing.

And you can store that in a Database as a bunch of times of the day and

then the temperature at that time of the day as a bunch of Key Value pairs.

And so the data is basically stored in these Independent, Dependent

relationships in a variety of names depending on the context of the data.

So the important thing to remember, the three fundamental kinds of data.

There's Nominal data, there's Ordered data, and there's Quantitative data.

And we'll be using those distinctions when we determine how that data ends up being

displayed in a visualization system.

[MUSIC]