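In this course, we will learn the basic concepts of data mining along with its fundamental methods and applications, and then go deeper into pattern discovery, a subfield of data mining, to study its concepts, methods, and applications in depth. We will also introduce pattern-based classification methods and some interesting applications of pattern discovery. The course gives you the opportunity to learn skills and practice them: applying scalable pattern discovery methods to large volumes of transactional data, discussing pattern evaluation measures, and learning methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.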

A course from University of Illinois at Urbana-Champaign

Data Visualization

555 ratings

From this lesson

Week 1: The Computer and the Human

In this week's module, you will learn what data visualization is, how it's used, and how computers display information. You'll also explore different types of visualization and how humans perceive information.

- John C. Hart, Professor of Computer Science

Department of Computer Science

[MUSIC]

So at this point I think it's good to basically do

an overview of the different kinds of visualization that there are.

For example, there's mathematical visualization.

This is the visualization of data generated from a mathematical equation

that's readily available just by running a computer program to generate the data.

A good example of this is the Mandelbrot set here, this black object.

This is basically a shape that's in the complex plane,

these are all complex numbers, and

you get this shape basically by starting with zero in the complex plane, zero-zero,

and you take that and you multiply it times itself and add a constant, and

then multiply that result times itself and add the same constant.

Then keep multiplying it times itself and adding the constant.

Each constant you choose corresponds to a point in the complex

plane.

And you start at zero and keep squaring and

adding that constant.

If that goes off to infinity, you're out in this purplish region.

If that sticks around, you're in the black region.
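
The iteration just described can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the lecture: the function names, the iteration limit of 100, the text-mode grid, and the sampled region of the plane are all assumptions chosen for the example. It relies on the standard fact that once the magnitude of the value exceeds 2, the sequence goes off to infinity.

```python
def escape_time(c: complex, max_iter: int = 100, bailout: float = 2.0) -> int:
    """Iterate z -> z*z + c from z = 0.

    Returns the step at which |z| first exceeds `bailout`,
    or `max_iter` if it never does (the point "sticks around").
    """
    z = 0j
    for step in range(max_iter):
        z = z * z + c          # multiply by itself, add the constant
        if abs(z) > bailout:   # past this radius the orbit escapes
            return step
    return max_iter


def render(width: int = 60, height: int = 24, max_iter: int = 100) -> str:
    """Text plot: '#' where the orbit stayed bounded, '.' where it escaped."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)        # imaginary axis, top to bottom
        row = ""
        for i in range(width):
            x = -2.0 + 2.6 * i / (width - 1)    # real axis, left to right
            bounded = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if bounded else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Each character in the output is one choice of constant: `#` plays the role of the black region and `.` the purplish region. Coloring the escaped points by the step count returned from `escape_time`, instead of a single character, is one common way to get the banded look outside the set.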

Now mathematicians knew that these dynamics were interesting

in the early 1900s, but

they didn't know how interesting the dynamics were or what was going on.

And it wasn't until this was plotted,

using data visualization to plot the results of these points,

that we found that the dynamics were very interesting.

You got all this great structure.

You could zoom into a little section here, and see even more, and more detail, and

that was something we just hadn't encountered before in mathematics, and

so visualization kind of revolutionized this area of mathematics and our

understanding of it by basically getting the data

out of this mathematical equation and through our visual

perception into a cognitive process so we could better understand it.

So, if you pick a point inside this and

you start to look at the dynamics, you get what's called a Julia set.

And this is plotted in the complex plane.

And one of the things I worked on with this

was to take this complex Julia set and look at what's going on in the quaternions.

And again, data visualization is really helpful because you get

these whorl patterns

that are basically connecting different parts of this Julia set to other parts of

this Julia set through these additional dimensions, and

you couldn't understand that very well without actually being able to visualize

how those connections are being made.
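
For the complex-plane case mentioned above, a Julia set can be sketched with the same squaring-and-adding step, except that the constant is held fixed and the starting point varies. This is a minimal sketch under stated assumptions: the function name, the iteration limit, and the escape radius of 2 (valid when the constant has magnitude at most 2) are choices made for this example, and it covers only the complex-plane version, not the quaternion version from the lecture.

```python
def in_filled_julia(z0: complex, c: complex,
                    max_iter: int = 200, bailout: float = 2.0) -> bool:
    """Return True if the orbit of z0 under z -> z*z + c stays bounded.

    The constant c is fixed (one point picked from the Mandelbrot picture)
    and the starting point z0 is the pixel being tested.
    Assumes |c| <= 2 so that a bailout radius of 2 is sufficient.
    """
    z = z0
    for _ in range(max_iter):
        if abs(z) > bailout:
            return False       # orbit escapes: z0 is outside the set
        z = z * z + c
    return True                # still bounded after max_iter steps


if __name__ == "__main__":
    c = -1 + 0j                # an example constant, chosen arbitrarily
    for y in (1.0, 0.5, 0.0, -0.5, -1.0):
        print("".join(
            "#" if in_filled_julia(complex(-2 + 4 * i / 59, y), c) else "."
            for i in range(60)
        ))
```

With the constant set to zero the bounded region is simply the unit disk, which makes a convenient sanity check before trying more interesting constants.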

There's also scientific visualization.

And this is the visualization of scientific data.

And that data tends to either be measured from real-world scientific devices, or it

comes as the result of a lengthy, expensive supercomputer simulation.

Here visualization can be really useful because if you have a supercomputer

simulation, you can use the visualization as the simulation is being generated

to determine if there's problems with the simulation or

if you want to steer the simulation in a more productive direction.

This tends to be coordinate data, spatial coordinates, XYZ coordinates, time.

It measures things like temperature and pressure and other physical quantities,

and this is an example from my own work where we're visualizing sort of the fronts

of the blood flow as it's coming through

a lower aorta that happens to have an aneurysm here.

It's bulging.

And the actual calculation of these fronts requires a tremendous amount of

computer time.

Work in this area has recently taken as long as a month to compute these fronts.

But we're able to visualize it very quickly and interactively so

we can see what's going on to better understand the complicated nature

of this flow which is resulting from millions of simulation steps.

And finally there's information visualization.

This is the visualization of more abstract, non-coordinate data.

What we'll look at in our Data Visualization course will be,

for example, relationship data.

And this happens to be a visualization that my students and

I worked on a few years ago, these are basically connections between different

Flickr users on the Flickr photo-sharing website.

And there's 7 million of these connections and so

it's far too many to be able to understand by looking at just an ordinary

graph, but by coloring and clustering and laying things out,

we can start to see certain people are more popular and certain

photo collections are gaining more attention and we can provide

an overview of all of that data that you can then investigate in more detail.

And so information visualization

relies more heavily on the ability to process the data

from its abstract form into something geometric, something concrete that we

can then transmit to our brain through our visual channels.

But that conversion becomes more challenging with information

visualization than perhaps it is with scientific or mathematical visualization.

And there's a variety of different domains that visualization is used in.

It's used in medical imaging.

For example taking CT Scans and

being able to reconstruct three dimensional shapes from that so

you can get a better context of where you are spatially in the patient data.

Being able to take business intelligence information, marketing results, and being

able to present those in a way that makes it easier to make a business decision.

Educational visualization, this is a visualization of how quaternions multiply,

that I made a few years ago

that helps the user understand geometrically what's

going on with an otherwise abstract four dimensional number system.

Or geographic information systems, taking data that's geographic in nature,

and being able to plot that in ways that helps you reason

about your region, or about the world as a whole.
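
The quaternion multiplication mentioned above can also be written out directly. The following is a small sketch of the standard Hamilton product, included only to show what "multiplying" means in this four-dimensional number system. The `Quaternion` type and the `qmul` function are names invented for this example and are not taken from the lecture's visualization.

```python
from typing import NamedTuple


class Quaternion(NamedTuple):
    """A quaternion w + x*i + y*j + z*k (hypothetical helper type)."""
    w: float
    x: float
    y: float
    z: float


def qmul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a*b. Note that a*b and b*a generally differ."""
    return Quaternion(
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    )


if __name__ == "__main__":
    i = Quaternion(0, 1, 0, 0)
    j = Quaternion(0, 0, 1, 0)
    print(qmul(i, j))   # i times j gives k
    print(qmul(j, i))   # j times i gives minus k
    print(qmul(i, i))   # i squared gives minus one
```

The order-dependence shown in the usage lines is part of what makes the system hard to reason about from the formulas alone.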

And finally, there's some modes of visualization, and

these are important to understand for data visualization in general.

There's three that we'll go over. The first is interactive visualization, and

this is the kind of visualization that's used for discovery.

Basically, a single investigator would use this.

Maybe one or two collaborators might join in, but

it's basically a single investigator in front of a computer,

plotting data to try to understand what's going on with the data.

You've got full control of the data and you can change which data sets are shown and

how they're displayed on the fly in order to help understand what you're looking at.

And that's different than, for example, presentation visualization.

Presentation visualization is the kind of visualization that we use for

communication.

It's a kind of visualization that you would see in a video or

in a slide presentation, and it's intended for a large group or

mass audience to basically communicate some aspect of data to that mass audience.

And the difference between presentation visualization and

interactive visualization,

mainly, is that presentation visualization doesn't support user input.

So you're just sitting there and

observing, but you can't really interact with the data.

You're just sort of getting

the data packaged in a way that helps you understand it, without the interaction.

In between these, the Internet has basically enabled a third mode of

visualization that has been called interactive storytelling.

And these are presentation visualizations, but

they are presented via interactive web pages.

And so they allow the viewer to interact with the data in some limited fashions.

The viewer can't change the dataset.

But they can sort of investigate a little bit further and

there's more information available than can be presented all at once,

as it would be with a presentation visualization.

And so another way of describing these differences is to visualize

the modes of visualization, and so I've laid them out in a table.

We can compare them in terms of user interaction.

In interactive visualization,

the user controls everything including what data set you're looking at.

In presentation visualization the user is only observing, there is no interaction.

In interactive storytelling there's a presentation, but

the user can still filter the data, or inspect details of the data, but

can't necessarily change the data set that they're looking at.

There's different graphics rendering methods.

You would use real-time rendering

anytime you're supporting interaction: as you make changes, your

display needs to respond to those changes, so you need real-time graphics to do that.

Whereas in presentation visualization, the rendering is precomputed and stored,

say on video or images in a slide show.

The target audiences are different.

For presentation visualization, you're targeting a mass audience,

your colleagues or everybody that goes to your webpage, whereas the target for

interactive visualization is an individual investigator or maybe a small group of

collaborators that are working on understanding some data.

And finally, the medium is different.

When you're running interactive visualization,

you're running software and the Internet can enable some software to be run, but

it's basically running a software program on a computer

in order to display the data.

When you're doing interactive storytelling, you're working mostly on

the Internet or some other information kiosk that can support a mass audience.

When you're working on presentation visualization,

you're working on slide shows, or video, or some other format that

allows you to prerecord the visuals, and then present them to a mass audience.

[MUSIC]