One of the problems of object detection, as you've learned about it so far,

is that your algorithm may find multiple detections of the same object.

Rather than detecting an object just once,

it might detect it multiple times.

Non-max suppression is a way for you to make

sure that your algorithm detects each object only once.

Let's go through an example.

Let's say you want to detect pedestrians,

cars, and motorcycles in this image.

You might place a grid over this,

and this is a 19 by 19 grid.

Now, technically this car has just one midpoint,

so it should be assigned just one grid cell.

And the car on the left also has just one midpoint,

so technically only one of those grid cells should predict that there is a car.

In practice, you're running

an object classification and localization algorithm for every one of these grid cells.

So it's quite possible that

this grid cell might think that the center of a car is in it,

and so might this,

and so might this, and for the car on the left as well.

Maybe not only this box,

if this is a test image you've seen before,

not only might that box decide that there's a car in it,

maybe this box, and this box and maybe others as

well will also think that they've found the car.

Let's step through an example of how non-max suppression will work.

So, because you're running

the image classification and localization algorithm on every grid cell,

on 361 grid cells,

it's possible that many of them will raise their hand and say,

"My Pc, my chance of thinking I have an object in it is large."

Rather than just having two of the grid cells out of the

19 squared or 361 think they have detected an object.

So, when you run your algorithm,

you might end up with multiple detections of each object.

So, what non-max suppression does,

is it cleans up these detections.

So you end up with just one detection per car,

rather than multiple detections per car.

So concretely, what it does,

is it first looks at the probabilities associated with each of these detections.

Now, the Pc, although there are

some details you'll learn about in this week's programming exercises,

is actually Pc times C1,

or C2, or C3.

But for now, let's just say Pc is the probability of a detection.
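
As a quick illustration of the "Pc times C1, or C2, or C3" remark, here is a minimal sketch of how a per-class score could be computed for one box. The function name `class_scores` and the example numbers are made up for illustration; they are not from the lecture.

```python
def class_scores(pc, class_probs):
    """Return the per-class scores Pc * C_i for a single predicted box."""
    return [pc * c for c in class_probs]


# Hypothetical box: Pc = 0.9, with class probabilities C1, C2, C3
# (for example pedestrian, car, motorcycle).
scores = class_scores(0.9, [0.1, 0.8, 0.1])

# Index of the class with the largest score (0-based, so 1 means C2).
best_class = max(range(len(scores)), key=scores.__getitem__)
```

For the rest of the walkthrough, with a single class, the score is simply Pc.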

And it first takes the largest one,

which in this case is 0.9 and says,

"That's my most confident detection,

so let's highlight that and just say I found the car there."

Having done that, the non-max suppression part then looks at all of

the remaining rectangles and all the ones with a high overlap,

with a high IOU,

with this one that you've just output will get suppressed.

So those two rectangles with the 0.6 and the 0.7.

Both of those overlap a lot with the light blue rectangle.

So those, you are going to suppress

and darken them to show that they are being suppressed.

Next, you then go through the remaining rectangles

and find the one with the highest probability,

the highest Pc, which in this case is this one with 0.8.

So let's commit to that and just say,

"Oh, I've detected a car there."

And then, the non-max suppression part is to

then get rid of any other ones with a high IOU.

So now, every rectangle has been either highlighted or darkened.

And if you just get rid of the darkened rectangles,

you are left with just the highlighted ones,

and these are your two final predictions.

So, this is non-max suppression.

And non-max means that you're going to output

your maximal-probability classifications

but suppress the close-by ones that are non-maximal.

Hence the name, non-max suppression.

Let's go through the details of the algorithm.

First, on this 19 by 19 grid,

you're going to get a 19 by 19 by eight output volume.

Although, for this example,

I'm going to simplify it to say that you're only doing car detection.

So, let me get rid of the C1, C2,

C3, and pretend for this line,

that each output for each of the 19 by 19,

so for each of the 361,

which is 19 squared,

for each of the 361 positions,

you get an output prediction of the following.

Which is the chance there's an object,

and then the bounding box.

And if you have only one object,

there's no C1, C2, C3 prediction.

The details of what happens

when you have multiple objects,

I'll leave to the programming exercise,

which you'll work on towards the end of this week.

Now, to implement non-max suppression,

the first thing you can do is discard all the boxes,

discard all the predictions of the bounding boxes with

Pc less than or equal to some threshold, let's say 0.6.

So we're going to say that unless you think there's at least a

0.6 chance it is an object there, let's just get rid of it.

This has culled all of the low-probability output boxes.

The way to think about this is for each of the 361 positions,

you output a bounding box together

with a probability of that bounding box being a good one.

So we're just going to discard

all the bounding boxes that were assigned a low probability.
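
This first filtering step can be sketched in a few lines. The representation below, a list of `(pc, box)` tuples, and the name `filter_low_confidence` are assumptions made for illustration; the 0.6 threshold is the one used in this example.

```python
def filter_low_confidence(detections, threshold=0.6):
    """Drop every detection whose Pc is less than or equal to the threshold.

    Each detection is assumed to be a (pc, box) tuple.
    """
    return [d for d in detections if d[0] > threshold]


# Made-up detections; the box coordinates do not matter for this step.
detections = [
    (0.9, (100, 100, 200, 180)),
    (0.6, (110, 105, 210, 185)),  # exactly at the threshold, so discarded
    (0.7, (95, 95, 195, 175)),
    (0.3, (300, 300, 340, 330)),
]
confident = filter_low_confidence(detections)
```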

Next, while there are

any remaining bounding boxes that you've not yet discarded or processed,

you're going to repeatedly pick the box with the highest probability,

with the highest Pc,

and then output that as a prediction.

So this is the process from the previous slide of taking one of the bounding boxes,

and making it lighter in color.

So you commit to outputting that as a prediction that there is a car there.

Next, you then discard any remaining box.

Any box that you have not output as a prediction,

and that was not previously discarded.

So discard any remaining box with a high overlap,

with a high IOU,

with the box that you just output in the previous step.

This second step in the while loop was when on the previous slide you would

darken any remaining bounding box that had

a high overlap with the bounding box that we just made lighter,

that we just highlighted.

And so, you keep doing this while there are

still any remaining boxes that you've not yet processed,

until you've taken each of the boxes and either output it as a prediction,

or discarded it as having too high an overlap,

or too high an IOU,

with one of the boxes that you have just output as

your predicted position for one of the detected objects.
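
Putting the steps together, here is a minimal single-class sketch of the procedure just described. Several things are assumptions for illustration rather than part of the lecture: boxes are given as corner coordinates `(x1, y1, x2, y2)`, "high overlap" is taken to mean an IOU of at least 0.5, and the function names and example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, pc_threshold=0.6, iou_threshold=0.5):
    """Single-class non-max suppression over (pc, box) tuples."""
    # Step 1: discard boxes with Pc less than or equal to the threshold.
    remaining = [d for d in detections if d[0] > pc_threshold]
    # Sort so the most confident remaining box is always first.
    remaining.sort(key=lambda d: d[0], reverse=True)

    kept = []
    while remaining:
        # Step 2: pick the box with the highest Pc and output it.
        best = remaining.pop(0)
        kept.append(best)
        # Step 3: discard remaining boxes that overlap it too much.
        remaining = [d for d in remaining
                     if iou(best[1], d[1]) < iou_threshold]
    return kept


# Made-up detections: three boxes around one car, two around another,
# and one low-confidence box elsewhere.
detections = [
    (0.9, (100, 100, 200, 180)),
    (0.65, (110, 105, 210, 185)),
    (0.7, (95, 95, 195, 175)),
    (0.8, (10, 120, 90, 180)),
    (0.75, (15, 125, 95, 185)),
    (0.3, (300, 300, 340, 330)),
]
final = non_max_suppression(detections)
```

With these example boxes, the loop keeps the 0.9 box and the 0.8 box, one per car, and suppresses the rest. For multiple classes, one common approach is to run this once per class, but the details are left to the programming exercise.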