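In this course, we will learn the basic concepts of data mining along with its fundamental methods and applications, and then dive into one of its subfields, pattern discovery, to study its concepts, methods, and applications in depth. We will also introduce pattern-based classification and some interesting applications of pattern discovery. The course gives you the opportunity to learn skills and gain practice applying scalable pattern discovery methods to massive transactional data, to discuss pattern evaluation measures, and to study methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.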


A course from the University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining



From the lesson

Module 1

Module 1 consists of two lessons. Lesson 1 covers the general concepts of pattern discovery. This includes the basic concepts of frequent patterns, closed patterns, max-patterns, and association rules. Lesson 2 covers three major approaches for mining frequent patterns. We will learn the downward closure (or Apriori) property of frequent patterns and three major categories of methods for mining frequent patterns: the Apriori algorithm, the method that explores vertical data format, and the pattern-growth approach. We will also discuss how to directly mine the set of closed patterns.

- Jiawei Han, Abel Bliss Professor

Department of Computer Science

[SOUND]

We are going to introduce a very important property of frequent patterns, which is called the Downward Closure Property of Frequent Patterns.

Let's look at this simple transaction database, TDB1. It contains only two transactions, T1 and T2. Suppose we find that the itemset {a1, ..., a50} is frequent. Then we can clearly see that all of its subsets, such as {a1}, {a2}, or {a1, a2}, are frequent as well.

You may then suspect that there are some interesting hidden relationships among different frequent itemsets. In fact, there is one, called the downward closure property of frequent patterns, which is also known as the Apriori property.

Okay, now let's look at an example. Suppose we know that the itemset {beer, diaper, nuts} is frequent. Obviously, {beer, diaper} should be frequent as well, because any transaction that contains beer, diaper, and nuts must also contain {beer, diaper} as an itemset. That's why {beer, diaper} must be at least as frequent as {beer, diaper, nuts}.
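The containment argument can be checked by counting support directly. The sketch below is illustrative only: the toy transaction list is hypothetical, and `support` here is absolute support (a count of transactions that contain every item of the itemset).

```python
# Hypothetical toy transaction database, used only to illustrate support counting.
TRANSACTIONS = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "nuts", "eggs"},
    {"beer", "diaper", "nuts", "milk"},
    {"nuts", "coffee", "milk"},
]


def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


if __name__ == "__main__":
    big = {"beer", "diaper", "nuts"}
    small = {"beer", "diaper"}
    # Every transaction containing `big` also contains `small`,
    # so support(small) can never be lower than support(big).
    print(support(big, TRANSACTIONS), support(small, TRANSACTIONS))  # prints "2 3"
```

The second transaction contains {beer, diaper} but not nuts, which is exactly why the subset's count can be strictly larger.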

So we can easily derive this property: any subset of a frequent itemset must also be frequent, as long as the minimum support ratio stays the same.
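One way to see the property hold on concrete data is to enumerate every itemset by brute force and then check that each frequent itemset's subsets are also frequent. This is a minimal sketch assuming the same hypothetical toy database and a relative minimum support ratio; the brute-force enumeration is exponential and is meant only for verification, not for mining.

```python
from itertools import combinations

# Hypothetical toy transaction database.
TRANSACTIONS = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "nuts", "eggs"},
    {"beer", "diaper", "nuts", "milk"},
    {"nuts", "coffee", "milk"},
]


def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets_brute_force(transactions, min_sup_ratio):
    """Enumerate every itemset over the observed items; keep those meeting the ratio."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if support(combo, transactions) / n >= min_sup_ratio:
                frequent.add(frozenset(combo))
    return frequent


def check_downward_closure(frequent):
    """True if every non-empty proper subset of each frequent itemset is also frequent."""
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for sub in combinations(itemset, k):
                if frozenset(sub) not in frequent:
                    return False
    return True


if __name__ == "__main__":
    freq = frequent_itemsets_brute_force(TRANSACTIONS, 0.4)
    print(check_downward_closure(freq))  # prints "True"
```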

In that context, we can derive an efficient mining methodology. The general philosophy is this: if any subset of an itemset S is infrequent, then, by the Apriori property, S has no chance of being frequent, so we do not even have to consider mining S.
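The pruning test itself is small. The sketch below assumes a level-wise setting in which `frequent_prev` (a hypothetical name) holds all frequent (k-1)-itemsets; in that setting it is enough to check only the (k-1)-subsets of a k-itemset candidate, because smaller subsets were already verified at earlier levels.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """Apriori pruning test for a k-itemset candidate.

    frequent_prev: set of frozensets, all frequent (k-1)-itemsets.
    If any (k-1)-subset of the candidate is missing, the candidate
    cannot be frequent and does not need to be counted.
    """
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


if __name__ == "__main__":
    frequent_2 = {frozenset({"beer", "diaper"}), frozenset({"beer", "nuts"})}
    # {diaper, nuts} is not in frequent_2, so {beer, diaper, nuts} is pruned.
    print(has_infrequent_subset({"beer", "diaper", "nuts"}, frequent_2))  # prints "True"
```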

This turns out to be a sharp knife for pruning. Apriori pruning has given rise to quite a lot of scalable pattern mining methods.

The Apriori pruning principle was first proposed by Rakesh Agrawal and Srikant at VLDB 1994. [INAUDIBLE] Mannila, at the KDD'94 workshop, also proposed a similar methodology.

In general, the methodology says that if any itemset is infrequent, its supersets should be neither generated nor considered.

Based on this principle, three major approaches were developed in subsequent studies.

The first is Apriori, a level-wise, join-based approach; its first representative work was published at VLDB 1994.
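A compact level-wise sketch may help make "join-based" concrete. This is a simplified illustration, not the original paper's implementation: the join step here takes pairwise unions of frequent (k-1)-itemsets rather than the sorted-prefix join, candidate counting is a plain scan, `min_sup` is an absolute count, and the toy database is hypothetical.

```python
from itertools import combinations

# Hypothetical toy transaction database.
TRANSACTIONS = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "nuts", "eggs"},
    {"beer", "diaper", "nuts", "milk"},
    {"nuts", "coffee", "milk"},
]


def apriori(transactions, min_sup):
    """Level-wise Apriori sketch. Returns {frequent itemset (frozenset): support}."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan over the database per level, keeping candidates with enough support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step (simplified): union pairs of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = count(candidates)
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    for itemset, sup in sorted(apriori(TRANSACTIONS, 2).items(), key=lambda p: len(p[0])):
        print(sorted(itemset), sup)
```

On this toy data, the candidates {beer, nuts, milk} and {diaper, nuts, milk} are pruned without ever being counted, because {beer, milk} and {diaper, milk} are infrequent.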

The second, called Eclat, was developed by Zaki and [INAUDIBLE] and is based on the vertical data format.
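In the vertical data format, each item maps to the set of transaction IDs (TID-set) that contain it, and the support of an itemset is the size of its TID-set. The depth-first sketch below illustrates that idea under stated assumptions: the toy database is hypothetical, TIDs are list positions, and no diffset or other optimizations are included.

```python
# Hypothetical toy transaction database.
TRANSACTIONS = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "nuts", "eggs"},
    {"beer", "diaper", "nuts", "milk"},
    {"nuts", "coffee", "milk"},
]


def to_vertical(transactions):
    """Convert horizontal data (list of itemsets) to vertical: item -> set of TIDs."""
    vertical = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, candidates, min_sup, out):
    """Depth-first Eclat sketch.

    candidates: list of (item, tidset) pairs that can extend `prefix`.
    Extending an itemset by one item intersects the two TID-sets.
    """
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_sup:
            continue
        itemset = prefix | {item}
        out[itemset] = len(tids)
        # Conditional candidates for the new prefix, built by TID-set intersection.
        suffix = []
        for other, other_tids in candidates[i + 1:]:
            joined = tids & other_tids
            if len(joined) >= min_sup:
                suffix.append((other, joined))
        eclat(itemset, suffix, min_sup, out)
    return out


if __name__ == "__main__":
    vertical = to_vertical(TRANSACTIONS)
    result = eclat(frozenset(), sorted(vertical.items()), 2, {})
    print(len(result))  # prints "10"
```

Once the database is in vertical form, no further scans are needed: every support count comes from intersecting TID-sets.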

The third is the pattern-growth approach, which grows frequent patterns by projection. It is called FP-growth (frequent pattern growth) and was developed by us in 2000.
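FP-growth itself compresses the database into an FP-tree; the sketch below keeps only the underlying projection-and-grow idea, using plain lists as projected databases, so it is a simplified stand-in rather than FP-growth. Assumptions: a hypothetical toy database, an absolute `min_sup`, and alphabetical item order to make sure each pattern is grown exactly once.

```python
from collections import Counter

# Hypothetical toy transaction database.
TRANSACTIONS = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "nuts", "eggs"},
    {"beer", "diaper", "nuts", "milk"},
    {"nuts", "coffee", "milk"},
]


def pattern_growth(transactions, min_sup):
    """Pattern-growth sketch via projected databases (no FP-tree)."""
    result = {}

    def grow(prefix, db):
        # Every transaction in db contains `prefix`, so these counts are
        # the supports of prefix + {item}.
        counts = Counter(item for t in db for item in t)
        frequent = sorted(item for item, n in counts.items() if n >= min_sup)
        for i, item in enumerate(frequent):
            pattern = prefix | {item}
            result[pattern] = counts[item]
            # Project: transactions containing item, keeping only later frequent items.
            later = set(frequent[i + 1:])
            projected = [[x for x in t if x in later] for t in db if item in t]
            grow(pattern, projected)

    grow(frozenset(), [list(t) for t in transactions])
    return result


if __name__ == "__main__":
    print(len(pattern_growth(TRANSACTIONS, 2)))  # prints "10"
```

Each recursive call works on a smaller projected database, and infrequent items are dropped before projection, so no candidate set is ever generated.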

[MUSIC]