这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

Loading...

来自 University of Illinois at Urbana-Champaign 的课程

数据可视化

555 个评分

这这一课程中，我们将学习数据挖掘的基本概念及其基础的方法和应用，然后深入到数据挖掘的子领域——模式发现中，深入学习模式发现的概念、方法，及应用。我们也将介绍基于模式进行分类的方法以及一些模式发现有趣的应用。这一课程将给你提供学习技能和实践的机会，将可扩展的模式发现方法应用在在大体量交易数据上，讨论模式评估指标，以及学习用于挖掘各类不同的模式、序列模式，以及子图模式的方法。

从本节课中

Week 3: Visualization of Non-Numerical Data

In this week's module, you will learn how to visualize graphs that depict relationships between data items. You'll also plot data using coordinates that are not specifically provided by the data set.

- John C. HartProfessor of Computer Science

Department of Computer Science

[MUSIC]

So for handling non-coordinate data, non-numerical data,

we're often more interested in the relationship between data

items than in the data items themselves.

And so we'll use a graph or

a network to represent the relationships between data items.

And so we'll talk here about different ways of specifying a graph, and

different attributes of graphs, and how graphs are represented computationally.

So a graph just consists of nodes and edges.

In this case, we have four nodes and we have five edges connecting these nodes.

And so the nodes will represent a data item and

the edge will represent a relationship between two data items.

If we add this particular edge,

then every data item is going to be connected to every other data item.

We would call that a complete graph, or a clique, of these four nodes.

Here on the left, I've got I've got a graph of four nodes connected by

five edges, and on the right I've got four nodes connected by five edges.

In fact, these two graphs are isomorphic.

They're the same graph, they're just laid out differently.

And on the right, I've got a planar graph, which means to nodes and

edges are laid out so that none of the edges cross each other.

And then I can speak of a face as being the region bounded

by a cycle of edges starting and ending at the same node.

And also, we would say that these are two different embeddings

of the same graph or at least these graphs are isomorphic.

If we think of this whole arrangement of eight nodes and

ten edges as a single graph,

then that would be one graph consisting of two connected components, but

the entire graph containing both of these sub graphs would be a disconnected graph.

We can also have directed graphs.

So in an ordinary graph, or an undirected graph,

just has edges, but does not really suggest the direction of these edges.

We can have a directed graph by using arrows to indicate a direction, so

that we would have a connection from this node to this node, but

not necessarily a connection from this node back to this node.

And you can think of a graph as having a cycle, a directed graph having a cycle, so

this undirected graph, this ordinary graph has a cycle because these three nodes.

I can start at this node and I can follow the cycle to get back.

You can have a cyclic directed graph, because I can follow a cycle here.

This graph is acyclic, because there is no cycle.

Once I get to this node I can go to this node.

Once I get to this node, I can't go any place further.

So there's no way to get back to any of these nodes

by following these edges in the directions indicated.

You can also have trees.

Any graph that's connected

that has one less edge than the number of nodes forms a tree.

It's minimally connected.

And you can think of having a parent node, but for an undirected graph,

any of these nodes could be the parent and you can think of multiple siblings.

Here's some graphs with many more edges and

you can see kind of the way they're laid out would imply a parent.

But I could lay these out in isomorphic methods,

equivalent methods, and another potent node might appear to be the parent.

When you have directed graphs, the tree forms a hierarchy and

then you can have a parent.

So in this case, the child nodes, these child nodes point to their parents.

And so I've plotted their parents higher than the child nodes.

And then there is a clear root node.

This one parent that's the parent of all of these nodes.

You can also have a hierarchy that's not a tree.

In this case we have a parent relationship,

but this child node has two parents.

And so you've got a definite hierarchy here, but

it's not a tree because a tree would have each node would have a single parent node.

There's also a relationship between

the number of edges each node has and the kind of graph you're looking at.

We talk about the degree of the node as being the number of edges

extending from a node.

Directed graphs would have a different in degree than an out degree.

The in degree would be the number of edges going into the node and

the out degree would be the number of edges leaving the node.

And then these, the number of nodes you have of each degree,

if it follows this kind of fall off, it's called a social network.

A social network, you can think of this as a friend's network or

many natural

data relationships follow this power law, this social network power law.

In this case, this is the number of interactions of yeast proteins.

Each one of these nodes is a yeast protein, and

the edges represent interaction between these proteins and, in this case,

you have many nodes that have a few interactions that are connected to,

you know, one or two, you have one or two edges.

And then you have many fewer nodes that have high degree, that have a lot of edges

that are connected to a lot of other nodes by a single edge.

And so in this case, the number of nodes that have a certain degree

follows a power-law.

If the y is the number of nodes with that degree,

is equal to the degree to some power with some constant multiplied to it.

And you see this fall off happen quite often.

These graphs tend not to be plainer, they tend to be difficult to embed.

They also tend to be the most popular graphs that we encounter in real life.

Finally, we're going to use an adjacency matrix to represent graphs.

And so in this case we have a graph that has four nodes, four data items.

One, two, three and four,

and then we're going to represent this graph using this matrix.

In this case, the Adjacency Matrix will have a one in row one,

column two, if there's an edge connecting node one to node two.

Likewise, it'll have a one in row two, column one,

because it connects node two to node one and so it will be symmetric.

Because this is a non-directed graph.

If it was a directed graph, then you would connect, if it was directed from

node one to node two by an edge leaving node one going to node two.

Then you would have a one in row one column two but

you would have a zero in row two column one.

So a directed graph would have asymmetric or

could have an asymmetric adjacency matrix but an ordinary graph,

a non-directed graph, would just have a symmetric adjacency matrix.

And these edges could also have weights,

in which case you might not just have zero or one, you might have a value here

to indicate how strong the connection is between nodes.

And the diagonal is usually zero, unless there's some relationship between a node

and itself, and you can represent that along the diagonal.

So it might be good to review graphs and the different kinds of graphs,

the different representations of graphs, and

the different attributes of graphs to familiarize yourself with them,

because we'll be using those for the rest of this module.

[MUSIC]