In this course, we will learn the basic concepts, methods, and applications of data mining, and then dive into a subfield of data mining, pattern discovery, to study its concepts, methods, and applications in depth. We will also introduce methods for pattern-based classification and some interesting applications of pattern discovery. This course will give you the opportunity to learn skills and practice applying scalable pattern discovery methods to massive transactional data, discuss pattern evaluation measures, and study methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.

A course from the University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining

163 ratings

From the lesson

Module 3

Module 3 consists of two lessons: Lessons 5 and 6. In Lesson 5, we discuss mining sequential patterns. We will learn several popular and efficient sequential pattern mining methods, including an Apriori-based sequential pattern mining method, GSP; a vertical data format-based sequential pattern method, SPADE; and a pattern-growth-based sequential pattern mining method, PrefixSpan. We will also learn how to directly mine closed sequential patterns. In Lesson 6, we will study concepts and methods for mining spatiotemporal and trajectory patterns as one kind of pattern mining applications. We will introduce a few popular kinds of patterns and their mining methods, including mining spatial associations, mining spatial colocation patterns, mining and aggregating patterns over multiple trajectories, mining semantics-rich movement patterns, and mining periodic movement patterns.

- Jiawei Han, Abel Bliss Professor

Department of Computer Science

Now I am going to introduce another algorithm called SPADE,

which does sequential pattern mining based on the vertical data format.

You probably still remember the vertical-data-format-based frequent pattern mining

algorithm called ECLAT.

The same set of authors actually developed

an interesting algorithm for sequential pattern mining.

Okay.

The idea is pretty simple.

If you take the sequence database and study it in a little detail,

each record gives you a Sequence ID, an Element ID, and a set of items.

So what you can see is, for the first Sequence ID, 1:

at Element ID 1, you find item a;

at Element ID 2, you find a, b, c; and so on.

Then we can transform this into the vertical format.

That means we just look at where a occurs and where b occurs.

So a occurs, as you probably can see, in Sequence 1, Element ID 1;

in Sequence 1, Element 2; and also in Sequence 1, Element 3.

In the same way,

you can get its occurrences in Sequence IDs 2, 3, and 4.

Similarly for b, you can find where b occurs: Sequence 1, Element 2,

so you get the pair (1, 2).
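This transformation into the vertical format can be sketched in a few lines of Python. The toy database below is hypothetical (loosely modeled on the lecture's example, where Sequence 1 begins with element {a}, then {a, b, c}, then {a, c}); the function and variable names are illustrative, not from the SPADE paper.

```python
from collections import defaultdict

# Toy sequence database in horizontal format: each sequence is an ordered
# list of elements (itemsets). Values here are assumed for illustration.
sequences = {
    1: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    2: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
}

def to_vertical(db):
    """Build each item's id-list: the (SequenceID, ElementID) pairs
    where the item occurs. Element IDs are 1-based, as in the lecture."""
    idlists = defaultdict(list)
    for sid, seq in db.items():
        for eid, element in enumerate(seq, start=1):
            for item in element:
                idlists[item].append((sid, eid))
    return dict(idlists)

idlists = to_vertical(sequences)
# Item a occurs in Sequence 1 at Element IDs 1, 2, and 3,
# matching the transcript's reading of the table.
```

Each id-list is exactly the "where does a occur, where does b occur" view the transcript describes; frequent single items are those whose id-lists cover enough distinct Sequence IDs.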

Then how can we combine them into frequent sequences, like a then b, or b then a?

If you say a then b, you require a to be in front of b,

that is, a's Element ID must be smaller than b's Element ID in the same sequence.

That means for the same Sequence ID 1, if a's Element ID 1 is

smaller than b's Element ID 2, then you get one occurrence of the sequence (ab).

Similarly, for b then a, what you require is that

the Element ID of b should be smaller than the Element ID of a.

In the same way you can get all the length-2 sequences.

So for length-3 sequences, what you need is to

take the frequent length-2 sequences.

Then you do a join.

How do you do the join?

You probably can see that the Sequence IDs should be the same,

and the shared item's Element IDs must match: the Element ID of this b and that b are both 2.

So you can join them together, and you get (1, 2, 3) here.

And joining the other pair, you get (1, 3, 4), because at

Element 2 you get a, then you get b, and then you get a again.

So from the length-2 sequences you can actually find all the length-3 sequences.

That's the reason you can use the Apriori principle

to find all the frequent subsequences.
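The temporal join at the heart of this step can be sketched as follows. This is a minimal illustration of the idea in the lecture, not the SPADE paper's optimized join; the function name and sample id-lists are assumptions chosen to mirror the transcript's (SequenceID, ElementID) pairs.

```python
def temporal_join(idlist_x, idlist_y):
    """Join the id-lists of x and y into an id-list for the sequence <x y>:
    an occurrence needs the same Sequence ID, with x's Element ID strictly
    smaller than y's. Each occurrence is recorded by the position of the
    last item, so the result can itself be joined again for longer patterns."""
    result = set()
    for sid_x, eid_x in idlist_x:
        for sid_y, eid_y in idlist_y:
            if sid_x == sid_y and eid_x < eid_y:
                result.add((sid_y, eid_y))
    return sorted(result)

# Hypothetical id-lists in the spirit of the lecture's example:
a_list = [(1, 1), (1, 2), (1, 3), (2, 1)]
b_list = [(1, 2), (2, 3)]

ab = temporal_join(a_list, b_list)  # occurrences of <a b>
ba = temporal_join(b_list, a_list)  # occurrences of <b a>
```

Joining the id-list of (ab) with the id-list of a in the same way yields occurrences of the length-3 sequence (aba), which is exactly the join step the transcript walks through; Apriori-style pruning then discards any candidate whose length-2 subsequences are not all frequent.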

This algorithm, developed by Zaki in 2001, is called SPADE:

Sequential PAttern Discovery using Equivalence classes.
