So step number 1 is sorting the data. I've already done this for us on the next slide.

So you'll notice that we have the same at-bat values as before; it's just that I've taken them and ordered them from lowest to highest.
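As a rough sketch of this sorting step in Python: the at-bat counts below are invented for illustration (only the value 514 at position 4 comes from the lecture), and `at_bats` and `sorted_at_bats` are hypothetical names.

```python
# Hypothetical at-bat counts for 20 players (made up for illustration;
# only the value 514 comes from the lecture).
at_bats = [561, 480, 633, 514, 590, 503, 620, 545, 655, 528,
           601, 495, 578, 641, 516, 570, 612, 537, 584, 555]

# Step 1: sort from lowest to highest. sorted() returns a new list,
# so the original order in at_bats is left untouched.
sorted_at_bats = sorted(at_bats)

print(sorted_at_bats)
# Position 4 (counting from 1) is index 3 in Python (counting from 0).
print(sorted_at_bats[3])  # 514
```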

What we're trying to do with quantiles is, again, figure out where the cutoff is such that X percent of our data lies below it.

So for example, I could say, "Okay, I want to create five different quantiles." So the user decides the number of quantiles. Let's say we want to have five quantiles.

What this means is that I'm going to calculate the cutoff values at cumulative proportions of 20 percent, 40 percent, 60 percent, 80 percent, and 100 percent.
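A minimal sketch of how the number of quantiles k maps to these cumulative percentages, assuming quantile number q (from 1 to k) sits at q/k of the data; `quantile_levels` is a hypothetical helper name.

```python
def quantile_levels(num_quantiles):
    """Return the cumulative percentage for each of the num_quantiles cutoffs."""
    # Quantile q (1..k) sits at q/k of the sorted data, shown as a percentage.
    return [100 * q / num_quantiles for q in range(1, num_quantiles + 1)]


print(quantile_levels(5))  # [20.0, 40.0, 60.0, 80.0, 100.0]
print(quantile_levels(4))  # [25.0, 50.0, 75.0, 100.0]
```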

So a quantile really means a cutoff point: the first one here, for example, is the value below which 20 percent of the samples fall.

So how many samples is 20 percent? Well, first I need to figure out how many samples I have in my dataset. That's really just a counting problem: I just need to count how many baseball players I put in my dataset here.

So I have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. We have 20 baseball players in our dataset. All right.

So if I have 20 baseball players and I want five quantiles, then 20 divided by 5 is 4.
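The counting and dividing can be sketched like this, again with made-up sorted at-bat counts (only 514 at position 4 is from the lecture); the variable names are illustrative.

```python
# Hypothetical sorted at-bat counts for 20 players (values invented).
sorted_at_bats = [480, 495, 503, 514, 516, 528, 537, 545, 555, 561,
                  570, 578, 584, 590, 601, 612, 620, 633, 641, 655]

n = len(sorted_at_bats)  # count the players: 20
k = 5                    # number of quantiles, chosen by the user
step = n / k             # samples per quantile: 20 / 5 = 4.0

print(n, k, step)  # 20 5 4.0
```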

So since I've sorted the data, and this is position number 4, this is where the cutoff value for my first quantile is: 514.

So I want to guarantee that four samples from my dataset have an at-bat count less than this cutoff.

But since the position, 20 divided by 5, came out to a whole number, I actually want to take the average between sample four and sample five, because sample four is equal to 514, not less than it.

So really, I can think of my 20th percentile as something like 515 or 516. With 515, I can guarantee that four samples are less than 515.
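A sketch of this averaging rule for the first quantile, using invented data: only 514 is from the lecture, and the value 516 for sample 5 is made up. Positions in the lecture are counted from 1 while Python lists are indexed from 0, so sample 4 is `sorted_at_bats[3]`.

```python
# Hypothetical sorted at-bat counts for 20 players (only 514 is from the lecture).
sorted_at_bats = [480, 495, 503, 514, 516, 528, 537, 545, 555, 561,
                  570, 578, 584, 590, 601, 612, 620, 633, 641, 655]

n = len(sorted_at_bats)
k = 5
position = 1 * n // k  # first quantile: 1 * 20 / 5 = 4 (divides evenly here)

# Sample 4 equals 514, which is not strictly less than 514, so average
# sample 4 (index 3) with sample 5 (index 4).
cutoff = (sorted_at_bats[position - 1] + sorted_at_bats[position]) / 2

# Check the guarantee: how many samples fall strictly below the cutoff?
below = sum(1 for x in sorted_at_bats if x < cutoff)
print(cutoff, below)  # 515.0 4
```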

So that was my first quantile. For my second quantile, at 40 percent, I have to take two times my number of samples over the number of quantiles I want: 2 times 20 over 5. So now this is going to be 8. So I count up to my eighth position (five, six, seven, eight), and this is my cutoff there.

So again, I can take the average between sample eight and sample nine, and I could get something like 550. So I can guarantee that eight samples in my dataset are less than 550.
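The second quantile follows the same pattern. This sketch reuses invented at-bat counts (samples 8 and 9 are made up as 545 and 555), so the 550.0 below is illustrative rather than the value on the slide.

```python
# Hypothetical sorted at-bat counts for 20 players (values invented).
sorted_at_bats = [480, 495, 503, 514, 516, 528, 537, 545, 555, 561,
                  570, 578, 584, 590, 601, 612, 620, 633, 641, 655]

n, k, q = len(sorted_at_bats), 5, 2
position = q * n // k  # second quantile: 2 * 20 / 5 = 8

# Whole-number position: average sample 8 (index 7) and sample 9 (index 8).
cutoff = (sorted_at_bats[position - 1] + sorted_at_bats[position]) / 2

# Eight samples should fall strictly below the cutoff.
below = sum(1 for x in sorted_at_bats if x < cutoff)
print(position, cutoff, below)  # 8 550.0 8
```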

So to continue for 60 percent, the general rule is: take the quantile number times the number of samples, divided by the number of quantiles you have.
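Putting the rule into one function gives something like the sketch below. `quantile_cutoff` is a hypothetical name, and two cases go beyond what the lecture covers, so treat them as assumptions: when the position is not a whole number, this takes the next sample up, and for the top quantile (where there is no sample after the last one) it returns the maximum.

```python
def quantile_cutoff(sorted_values, q, k):
    """Cutoff for quantile q (1..k) of k quantiles, from values sorted low to high."""
    n = len(sorted_values)
    # Position = quantile number * samples / quantiles, split into a whole
    # part and a remainder so we can tell whether it lands exactly on a sample.
    pos, remainder = divmod(q * n, k)
    if remainder == 0:
        if pos >= n:
            # Assumption: the top quantile has no "next" sample, so use the maximum.
            return sorted_values[-1]
        # Whole-number position: average sample pos and sample pos + 1
        # (1-based), which are indices pos - 1 and pos (0-based).
        return (sorted_values[pos - 1] + sorted_values[pos]) / 2
    # Assumption: between samples pos and pos + 1, take sample pos + 1 (index pos).
    return sorted_values[pos]


# Hypothetical sorted at-bat counts for 20 players (values invented).
at_bats = [480, 495, 503, 514, 516, 528, 537, 545, 555, 561,
           570, 578, 584, 590, 601, 612, 620, 633, 641, 655]
print([quantile_cutoff(at_bats, q, 5) for q in range(1, 6)])
# [515.0, 550.0, 581.0, 616.0, 655]
```

As far as I can tell, averaging at whole-number positions and otherwise taking the next sample up is the same rule NumPy exposes as the `averaged_inverted_cdf` method of `numpy.quantile`, which may be a convenient cross-check.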

So for the 60th percentile, which is our third quantile out of five, we take 3 times 20 samples divided by our five quantiles, and so we wind up with 12.
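The same arithmetic for every quantile, as a quick check of the positions: 4, 8, and 12 match the lecture, and 16 and 20 follow from the same rule. This assumes, as in the lecture, 20 samples and five quantiles.

```python
n = 20  # players in the dataset
k = 5   # number of quantiles

# Position of each cutoff: quantile number * samples / quantiles.
# Integer division is exact here because 20 divides evenly by 5.
positions = [q * n // k for q in range(1, k + 1)]

print(positions)     # [4, 8, 12, 16, 20]
print(positions[2])  # third quantile (60th percentile): 12
```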