In this case, we're going to include a lot of other solid filled boxes in red or

pink, right?

So in this case now, we're going to notice that among the four neighbors,

there are three neighbors in a different category.

So if we take a vote,

then we'll conclude the object is actually of a different category.

So this illustrates both how the k-nearest neighbor classifier works and

also some potential problems of this classifier.
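The nearest-neighbor vote just described can be sketched in a few lines. This is a minimal illustration, not the course's actual code; the Euclidean distance, the point coordinates, and the function name `knn_predict` are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    # Rank training points by Euclidean distance to the query point.
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], query))[:k]
    # Take a majority vote among the k nearest neighbors' labels.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data: a red cluster near the origin, a blue cluster near (5, 5).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))
```

If three of a query's four nearest neighbors carried the other label, this vote would assign the query to that other category, exactly as in the example above.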

Basically, the result might depend on k, and indeed,

k is an important parameter to optimize.

Now, you can intuitively imagine if we have a lot of neighbors

around this object, and then we'd be okay because we have

a lot of neighbors who will help us decide the categories.

But if we have only a few, then the decision may not be reliable.

So on the one hand, we want to find more neighbors, right?

And then we have more votes.

But on the other hand, as we try to find more neighbors, we actually risk

getting neighbors that are not really similar to this instance.

They might actually be far away as you try to get more neighbors.

So although you get more neighbors, those neighbors aren't necessarily so

helpful because they are not very similar to the object.

So the parameter still has to be set empirically.

And typically, you can optimize such a parameter by using cross validation.

Basically, you're going to separate your training data into two parts, and

then you're going to use one part to actually help you choose

the parameter k here, or some other parameters in other classifiers.

And then you're going to assume that the value that works well on your

training data will actually be the best for your future data.
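The hold-out procedure just described can be sketched as follows. This is an illustrative assumption of the setup (a single train/validation split with a self-contained nearest-neighbor vote), not the course's actual code; every name and the 70/30 split ratio are made up for the example.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k):
    # Majority vote among the k training points closest to the query.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def choose_k(points, labels, candidate_ks, split=0.7):
    # Hold out the tail of the training data as a validation set.
    n = int(len(points) * split)
    fit_x, fit_y = points[:n], labels[:n]
    val_x, val_y = points[n:], labels[n:]
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        # Score each candidate k by its accuracy on the held-out part.
        correct = sum(knn_predict(fit_x, fit_y, x, k) == y
                      for x, y in zip(val_x, val_y))
        acc = correct / len(val_x)
        if acc > best_acc:  # keep the first k reaching the best accuracy
            best_k, best_acc = k, acc
    return best_k
```

The k that scores best on the held-out part is then used for future data, on the assumption stated above that it will generalize.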