This slide shows the settings under which you should maybe use

anomaly detection, versus when supervised learning might be more fruitful.

If you have a problem with a very small number of positive examples, and

remember the examples of y equals one are the anomaly examples.

Then you might consider using an anomaly detection algorithm instead.

So having 0 to 20, or maybe up to 50, positive examples

might be pretty typical.

And usually, when we have such a small set of positive examples, we're going to

save the positive examples just for the cross-validation set and the test set.

And in contrast, in a typical anomaly detection setting,

we will often have a relatively large number of negative examples,

that is, the normal examples, such as normal aircraft engines.

And we can then use this very large number of negative examples with

which to fit the model p(x).

And so there's this idea that in many anomaly detection applications,

you have very few positive examples and lots of negative examples.

And when we're doing the process of estimating p(x),

fitting all those Gaussian parameters, we need only negative examples to do that.

So if you have a lot of negative data, we can still fit p(x) pretty well.
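To make that step concrete, here's a minimal sketch (not the course's own code), assuming the usual per-feature independent Gaussian model: fitting p(x) uses only the negative, normal examples, estimating one mean and one variance per feature. The engine-measurement numbers below are made up for illustration:

```python
import numpy as np

def fit_gaussian_params(X_neg):
    """Estimate per-feature mean and variance from the negative (normal) examples only."""
    mu = X_neg.mean(axis=0)
    var = X_neg.var(axis=0)
    return mu, var

def p(x, mu, var):
    """p(x) as a product of per-feature Gaussian densities."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

# Hypothetical normal-engine data: two features, e.g. heat and vibration.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=[50.0, 300.0], scale=[2.0, 10.0], size=(5000, 2))

mu, var = fit_gaussian_params(X_neg)

p_normal = p(np.array([50.0, 300.0]), mu, var)   # near the normal cluster: high density
p_anomaly = p(np.array([70.0, 250.0]), mu, var)  # far from the cluster: low density
```

The few positive examples then come in only afterward, to choose the threshold epsilon on p(x) using the cross-validation set.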

In contrast, for supervised learning, more typically we would have a reasonably

large number of both positive and negative examples.

And so this is one way to look at your problem and

decide if you should use an anomaly detection algorithm or supervised learning.

Here's another way that people often think about anomaly detection.

So for anomaly detection applications,

often there are very different types of anomalies.

Think about how many different ways an aircraft engine could go wrong;

there are so many different things that could break an aircraft engine.

And so if that's the case, and if you have a pretty small set of positive examples,

then it can be difficult for an algorithm

to learn from your small set of positive examples what the anomalies look like.

And in particular,

you know future anomalies may look nothing like the ones you've seen so far.

So maybe in your set of positive examples, maybe you've seen 5 or 10 or

20 different ways that an aircraft engine could go wrong.

But maybe tomorrow, you need to detect a totally new type of anomaly,

a totally new way for

an aircraft engine to be broken, that you've just never seen before.

And if that's the case,

it might be more promising to just model the negative examples with this sort of

Gaussian model p(x), instead of trying too hard to model the positive examples.

Because tomorrow's anomaly may be nothing like the ones you've seen so far.
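Here's a small hypothetical sketch of that point: because p(x) is fit purely on the normal examples, any point far from the normal cluster gets low density and is flagged, including a type of failure the model has never seen. The data, points, and threshold below are invented for illustration:

```python
import numpy as np

# Fit p(x) on normal data only (standard-normal-ish features, made up).
rng = np.random.default_rng(1)
X_normal = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(10000, 2))
mu, var = X_normal.mean(axis=0), X_normal.var(axis=0)

def p(x):
    """Product of per-feature Gaussian densities."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

epsilon = 1e-6  # in practice, tuned on a cross-validation set

seen_anomaly = np.array([6.0, 0.0])    # resembles a previously observed failure mode
novel_anomaly = np.array([0.0, -6.0])  # a failure mode never observed before

# Both land in low-density regions of p(x), so both get flagged, even though
# neither type of anomaly was used when fitting the model.
flags = [p(seen_anomaly) < epsilon, p(novel_anomaly) < epsilon]
```

This is the key contrast with a supervised classifier, which can only learn the specific anomaly types present in its positive training examples.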