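In this course, we will learn the basic concepts of data mining and its fundamental methods and applications, then dive into a subfield of data mining, pattern discovery, and study its concepts, methods, and applications in depth. We will also introduce methods for pattern-based classification and some interesting applications of pattern discovery. The course gives you the opportunity to learn skills and practice them: applying scalable pattern discovery methods to massive transactional data, discussing pattern evaluation measures, and studying methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.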

A course from University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining

134 ratings

From the lesson

Module 1

Module 1 consists of two lessons. Lesson 1 covers the general concepts of pattern discovery. This includes the basic concepts of frequent patterns, closed patterns, max-patterns, and association rules. Lesson 2 covers three major approaches for mining frequent patterns. We will learn the downward closure (or Apriori) property of frequent patterns and three major categories of methods for mining frequent patterns: the Apriori algorithm, the method that explores vertical data format, and the pattern-growth approach. We will also discuss how to directly mine the set of closed patterns.

- Jiawei Han, Abel Bliss Professor

Department of Computer Science

Hi! In this final session of the lecture, we're going to discuss mining closed patterns.

As we already know,

closed patterns are a compact yet lossless compression of frequent patterns.
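To make "lossless" concrete, here is a minimal Python sketch that is not taken from the lecture. The toy `transactions`, the `min_support` value, and the helper names `support` and `recovered_support` are all assumptions made for illustration. The sketch enumerates frequent itemsets by brute force, keeps only the closed ones, and checks that the support of every frequent itemset can be recovered from the closed set alone.

```python
from itertools import combinations

# Hypothetical toy database (an assumption, not the lecture's example).
transactions = [set("abc"), set("abc"), set("ab"), set("a")]
min_support = 2


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Brute-force enumeration of all frequent itemsets (fine for a toy example).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s

# An itemset is closed if no proper superset has the same support.
# Any such superset would itself be frequent, so checking `frequent` suffices.
closed = {
    x: s
    for x, s in frequent.items()
    if not any(x < y and frequent[y] == s for y in frequent)
}


def recovered_support(itemset):
    """Recover a frequent itemset's support using only the closed itemsets."""
    return max(s for y, s in closed.items() if itemset <= y)


# Lossless: every frequent itemset's support is recoverable from `closed`.
assert all(recovered_support(x) == s for x, s in frequent.items())
print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy database, seven frequent itemsets compress to three closed ones, and no support information is lost.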

So mining closed itemsets

is very interesting and useful.

With the pattern-growth approach, there's

one interesting method developed called 'CLOSET+'.

Let's look at 'CLOSET+',

and at how to develop efficient, direct mining of closed itemsets.

So let's look at this transaction database.

It contains only four transactions, and these are the items in those transactions.

Suppose the minimum support is two.

Then we get these as the frequent itemsets.

Based on this, we can work out the F-list as follows.
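The transcript does not reproduce the slide, so the sketch below uses an assumed four-transaction database, chosen only to be consistent with the projected database for D (ACEF and ACF) discussed in this lecture. It counts item supports, drops items below the minimum support of two, and orders the rest by descending support, which is how an F-list is commonly built in pattern-growth methods. Breaking ties alphabetically is my own assumption.

```python
from collections import Counter

# Assumed transactions (not given in the transcript); each string is one
# transaction and each character is one item.
transactions = ["acdef", "abe", "cefg", "acdf"]
min_support = 2

# Count in how many transactions each item occurs.
counts = Counter(item for t in transactions for item in set(t))

# Keep only the frequent items.
frequent = {item: c for item, c in counts.items() if c >= min_support}

# F-list: frequent items in descending support order, ties broken
# alphabetically (the tie-breaking rule is an assumption).
f_list = sorted(frequent, key=lambda item: (-frequent[item], item))

print(frequent)
print("F-list:", "-".join(f_list))
```

With these assumed transactions, B and G each occur only once and are dropped, and D, with support two, ends up last in the F-list.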

Now, let's look at an interesting method called 'itemset merging'.

The idea can be illustrated using this example.

Let's look at the projected database for D.

For this projected database,

we will have ACEF

and ACF.

As you can see, this projected database contains ACEF and ACF.

But the interesting thing is that ACF occurs in every transaction of this projected database.

So ACF has the same support as D. In that case,

we can pull ACF out and form

the ACFD-projected database, which contains only one item, E, and E is not frequent.

Therefore, we get ACFD with support two as the final result.

This method, called 'itemset merging', simply says: if Y appears in every occurrence of X,

then the items of Y are merged with X.

Here, X is D and Y is ACF.

ACF occurs in every occurrence of X, which is D,

so we merge ACF and D together to form ACFD, a more compressed form.

That means you can mine all of these at once.

So this is more efficient.
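The merging step just described can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the CLOSET+ implementation: the function name `merge_itemset` and its return shape are hypothetical, and the projected database is assumed to hold one entry per transaction containing the prefix, so its length equals the prefix's support.

```python
from collections import Counter


def merge_itemset(prefix, projected_db, min_support):
    """One itemset-merging step (hypothetical helper, for illustration only).

    Items that occur in every transaction of the prefix's projected database
    have the same support as the prefix, so they are merged into it.
    Returns the merged itemset, its support, and the items that are still
    frequent in the remaining projected database.
    """
    if not projected_db:
        return set(prefix), 0, {}

    support = len(projected_db)
    # Items present in every projected transaction (Y in the lecture).
    common = set.intersection(*(set(t) for t in projected_db))
    merged = set(prefix) | common

    # Remove the merged items and recount what is left.
    remaining = [set(t) - common for t in projected_db]
    counts = Counter(item for t in remaining for item in t)
    still_frequent = {i: c for i, c in counts.items() if c >= min_support}
    return merged, support, still_frequent


# The lecture's example: the projected database for D contains ACEF and ACF.
merged, support, still_frequent = merge_itemset("d", ["acef", "acf"], 2)
print(sorted(merged), support, still_frequent)
```

Here ACF is merged with D in a single step, the only leftover item E has support one, and nothing remains to be mined, so ACFD with support two is reported directly.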

Actually, there are many tricks developed in CLOSET+.

For example, hybrid tree projection:

we use bottom-up physical tree projection

and top-down pseudo tree projection.

There are also other techniques: sub-itemset pruning,

itemset skipping, and efficient subset testing.

But I'm not going to get into the details.

For details, you can read this paper.

So finally, I'll summarize the recommended readings.

These are all classical papers.

They cover Apriori mining and further improvements of Apriori mining.

Then we have vertical methods,

FP-growth methods, and the CLOSET+ method.

Finally, there is

an interesting survey article called 'Frequent Pattern Mining Algorithms',

which contains many more algorithms than those covered in this lecture.

If you're interested,

go ahead and read this chapter.