In this course, we will learn the general concepts, methods, and applications of data mining, and then dive into pattern discovery, a subfield of data mining, to study its concepts, methods, and applications in depth. We will also introduce pattern-based classification methods and some interesting applications of pattern discovery. This course will give you the opportunity to learn skills and gain hands-on practice in applying scalable pattern discovery methods to massive transactional data, discussing pattern evaluation measures, and studying methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.


Course from the University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining

119 ratings


From the lesson

Module 3

Module 3 consists of two lessons: Lessons 5 and 6. In Lesson 5, we discuss mining sequential patterns. We will learn several popular and efficient sequential pattern mining methods, including an Apriori-based sequential pattern mining method, GSP; a vertical data format-based sequential pattern method, SPADE; and a pattern-growth-based sequential pattern mining method, PrefixSpan. We will also learn how to directly mine closed sequential patterns. In Lesson 6, we will study concepts and methods for mining spatiotemporal and trajectory patterns as one kind of pattern mining applications. We will introduce a few popular kinds of patterns and their mining methods, including mining spatial associations, mining spatial colocation patterns, mining and aggregating patterns over multiple trajectories, mining semantics-rich movement patterns, and mining periodic movement patterns.

- Jiawei Han, Abel Bliss Professor

Department of Computer Science

We first study Mining Spatial Associations.

Spatial associations, or spatial frequent patterns, share some commonalities with general associations and frequent patterns.

For example, the association rules are also in the form of A implies B, with certain support and confidence.

In this context, A and B could be sets of spatial or non-spatial predicates.

The spatial predicates may indicate topological relations, spatial orientations, or distance information, like close_to or within a certain distance.

And the measures, support and confidence, are very similar to the general ones.

The rules we may find could be like: if X is a large town and X intersects with a highway, then X is likely to be adjacent to water, like lakes, rivers, and oceans, with certain support and confidence.
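As a small illustration, the support and confidence of such a spatial rule can be computed exactly as for general association rules. The toy dataset and predicate names below are hypothetical, chosen to mirror the large-town example:

```python
# Toy illustration (hypothetical data) of support and confidence for the rule:
#   is_large_town(X) AND intersects_highway(X)  =>  adjacent_to_water(X)
# Each record describes one spatial object X with boolean spatial predicates.
towns = [
    {"large_town": True,  "intersects_highway": True,  "adjacent_to_water": True},
    {"large_town": True,  "intersects_highway": True,  "adjacent_to_water": True},
    {"large_town": True,  "intersects_highway": True,  "adjacent_to_water": False},
    {"large_town": True,  "intersects_highway": False, "adjacent_to_water": True},
    {"large_town": False, "intersects_highway": True,  "adjacent_to_water": False},
]

# Objects satisfying the antecedent A, and those satisfying both A and B.
antecedent = [t for t in towns if t["large_town"] and t["intersects_highway"]]
both = [t for t in antecedent if t["adjacent_to_water"]]

support = len(both) / len(towns)          # fraction of all objects satisfying A and B
confidence = len(both) / len(antecedent)  # fraction of A-objects that also satisfy B
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```

With this toy data the rule holds for 2 of 5 objects (support 0.40) and 2 of the 3 antecedent objects (confidence about 0.67).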

In spatial rule mining, quite often we would like to explore spatial autocorrelation.

That means spatial data tends to be highly self-correlated: nearby things are more related than remote things.

For example, when we study neighborhoods or temperature, we will likely pay more attention to finding interesting relationships among nearby objects.
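A standard way to quantify spatial autocorrelation is Moran's I, a measure the lecture does not name but which captures exactly this "nearby things are more related" idea: values near +1 mean neighboring locations carry similar values, values near 0 mean no spatial pattern. A minimal sketch, with a hypothetical binary neighbor matrix as the spatial weights:

```python
# Minimal sketch of Moran's I for spatial autocorrelation.
# values[i]  : observation at location i (e.g., temperature)
# weights[i][j]: spatial weight, e.g., 1 if locations i and j are neighbors, else 0
def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                      # deviations from the mean
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))        # cross-products of neighbors
    den = sum(d * d for d in dev)                         # total variance term
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)

# Four locations on a line, adjacent ones are neighbors; similar values cluster,
# so Moran's I comes out positive.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 10, 10], w))
```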

In spatial association mining, there's an interesting heuristic called progressive refinement.

The general philosophy is this: among spatial relationships, there are rough ones, like close_to, which is a generalization of more refined ones like near_by, touch, intersect, and contain, all of which are, in some sense, close_to.

To give you an example: near a highway intersection, you may be able to find shopping centers and gas stations. But how close are they? Are they merely nearby, or do they almost touch the highway intersection? In that sense, those are the detailed, refined relationships.

However, we first find the objects that are frequently close_to each other. Then, if we want the more refined relationships, we ask how close they really are to the intersection. So we can say: only if close_to is a frequent pattern do we go on to study the more refined ones. To that extent, this is the philosophy of progressive refinement.

That means we first search for rough relationships, and then refine them to study the more detailed relationships. The general philosophy is: if the rough relationship is not frequent, there is no need to study the finer ones, because they cannot be frequent either.

So, to that extent, we can do two-step mining of spatial associations. In the first step, we use a rough, low-cost algorithm, like minimum bounding rectangles or R-trees, for rough pattern mining. That means we first compute the rough spatial frequent patterns. Then, if a rough pattern is frequent, we get into the refinement process: we may apply a more detailed algorithm using more refined data structures.
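The two-step, filter-and-refine idea can be sketched as follows. The object representation (axis-aligned minimum bounding rectangles), the distance functions, the thresholds, and all names are illustrative assumptions, not the lecture's algorithm; in practice step 1 would go through an R-tree index and step 2 through exact geometry:

```python
# Sketch of two-step (filter-and-refine) spatial association mining.
# Objects are (type, MBR) pairs, where an MBR is (x1, y1, x2, y2).
from itertools import combinations

def mbr_distance(a, b):
    """Lower bound on the true distance between two objects, from their MBRs."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a, b
    dx = max(bx1 - ax2, ax1 - bx2, 0)
    dy = max(by1 - ay2, ay1 - by2, 0)
    return (dx * dx + dy * dy) ** 0.5

def two_step_mining(objects, threshold, min_count, exact_distance):
    # Step 1 (rough, cheap): count object-type pairs whose MBRs fall within the
    # close_to threshold, and keep only the frequent candidates.
    rough = {}
    for (ta, ra), (tb, rb) in combinations(objects, 2):
        if mbr_distance(ra, rb) <= threshold:
            key = tuple(sorted((ta, tb)))
            rough[key] = rough.get(key, 0) + 1
    candidates = {k for k, c in rough.items() if c >= min_count}

    # Step 2 (refined, expensive): re-check only the surviving candidates with
    # the exact distance function (here also MBR-based, standing in for real
    # geometry); infrequent rough pairs were already pruned and never reach here.
    refined = {}
    for (ta, ra), (tb, rb) in combinations(objects, 2):
        key = tuple(sorted((ta, tb)))
        if key in candidates and exact_distance(ra, rb) <= threshold:
            refined[key] = refined.get(key, 0) + 1
    return {k for k, c in refined.items() if c >= min_count}

objs = [
    ("gas_station",     (0,    0,  1,    1)),
    ("shopping_center", (2,    0,  3,    1)),
    ("gas_station",     (10,  10, 11,   11)),
    ("shopping_center", (11.5, 10, 12.5, 11)),
    ("park",            (100, 100, 101, 101)),
]
print(two_step_mining(objs, 2.0, 2, mbr_distance))
```

The rough MBR pass acts purely as a filter: any type pair it prunes could not have been frequent under the exact measure either, which is what makes the two-step split safe.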

So this principle can save a lot of mining cost, because by running the rough pass first, we get a big filter: many unnecessary pairs are pruned because they are not frequent even at the rough level. We don't have to refine them, and we don't have to apply a refined measure to study the more refined patterns.

Close to, you can find