Hi. In this lecture we're going to talk about

a really simple model of aggregation.

So, here's the thing I want to model

I want to model a situation where I've got a group of people

-- it could be 100, it could be 1000 --

and each one is independently

going to make a decision to do something.

It could be to, y'know, go to the gym.

It could be to go to the beach.

It could be to go to the grocery store.

What I want to try and understand is that

we've got a whole bunch of people

each one that is making these independent decisions

What's the number of people that shows up?

Now, to characterize that I'm going to use an idea

called the probability distribution.

So, to make this simple, let's suppose that there is

a small group of people, like my family,

which has four people in it.

And I want to know "What's the distribution of the number

of us -- out of the four -- who go for a walk on a given Saturday?"

Well, if I think about the numbers could be --

there could be 0 people that go, there could be 1,

there could be 2, there could be 3, or it could be

that all 4 of us decide to go for the walk, right?

The dog would prefer if all four of us went,

but, y'know, there's going to be some number that goes.

So, I could take -- I could keep track of data.

I could, y'know, chart this on, like, my wall somewhere.

And then you can ask,

"What's the likelihood that nobody went for a walk?"

Maybe that's 10%.

Now, what's the likelihood that 1 person went for a walk?

Well that might be 15%.

What about 2 people? That might be 40%.

And what about 3 people? That might also be 15%.

And then, what's the likelihood that 4 of us went for a walk?

That might be, let's say, 20%.

Now the thing to know about a probability distribution is that

each one of these probabilities is less than one, right?

And if we sum them up we get 10 plus 15 is 25; plus 40 is 65;

plus 15 is 80; plus 20 is 100.

So we get a total of 100%.

So what a probability distribution tells us is

what are the different things that could happen

-- 0,1,2,3 and 4 --

and then it tells us the likelihood of each of those things.
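To make the walk example concrete, here's a minimal Python sketch -- the percentages are the ones from the lecture -- checking the two defining properties of a probability distribution:

```python
# The family-walk distribution from the lecture: outcome -> probability.
walk_dist = {0: 0.10, 1: 0.15, 2: 0.40, 3: 0.15, 4: 0.20}

# Each probability is between 0 and 1...
assert all(0.0 <= p <= 1.0 for p in walk_dist.values())

# ...and together they cover all the possibilities (sum to 100%).
assert abs(sum(walk_dist.values()) - 1.0) < 1e-9

# The most likely outcome: 2 people go for the walk.
print(max(walk_dist, key=walk_dist.get))  # 2
```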

OK, so here's sorta the huge result that we're going to

leverage to understand how things add up.

There's a theorem called the Central Limit Theorem.

And what the Central Limit Theorem tells us is that

if I add up a whole bunch of individual, independent events...

So what does 'independent' mean?

It means my decision to go to the beach

is independent of your decision to go to the beach,

which is independent of your cousin Mary's

decision to go to the beach.

So, by independent, I mean not influenced.

So, I don't care whether you're going to the beach or not.

I'm going to make my decision on my own,

completely independent of what you decide to do

... or your cousin Mary.

So, what the Central Limit Theorem tells us is that

if a whole bunch of people make a whole bunch

of independent decisions, the distribution that we get

has this nice bell-shaped curve.

And this bell-shaped curve means that like

the most likely outcome is the one right in the middle.

So, there's a lot of structure to what happens.

And that means that we can predict a lot of things.

We can tell a lot about what's going on in the world

And that's what we're going to learn about in this lecture.

It's going to be a lot of fun.

To get an understanding of where these distributions

come from, let's start really simple.

Suppose I flip a coin twice.

And I want to know "What are the odds of getting a head?"

What's the probability distribution over the number of heads?

Well, what could I get?

I could get tails-tails, and that would be 0 heads.

I could get tails-heads, or heads-tails

both of these would be 1 head.

Or I could get heads-heads.

And that would be 2 heads.

So, what's the probability of each of these?

The probability of getting tails-tails is just 1/4.

The probability of getting 1 head is 1/2.

And the probability of getting 2 heads is 1/4.

So, I'm going to get a probability distribution,

if I do it out like this 0, 1, 2

There's a 1/4 chance of that and a 1/2 chance of that

and a 1/4 chance of that

You notice, it sorta looks like a little bell curve.

OK. Let's suppose I flip it 4 times.

Well, it gets harder.

I could think, OK, what are the odds of getting no heads?

I could get tails-tails-tails-tails

well, how do I figure out the probability of that?

Well, 1/2 time 1/2 times 1/2 times 1/2 -- 4 one halves --

that's 2 times 2 times 2 times 2 ... so that's 1/16.

What are the odds of getting one head?

Well, I could get the head first, and then 3 tails...

I could get it second,

I could get it third,

and it could come last.

So, there's four places that it could show up.

So that means there's a 4/16 chance.

Well, I could do all sorts of math again for

what are the odds of getting two heads?

And I'd actually get 6/16.

And 3 heads, well, that's the same as getting 1 head

really, right, because tails and heads are interchangeable.

So what I'd get, I'd get this again --

If I drew this distribution out, I'd get a peak at 2 heads, right?

I'm going to get a nice bell curve, right?

So, I'm going to get this thing where

there's very little chance of getting no heads

not that much chance of getting 4 heads

but the most likely thing is getting 2 heads.
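Rather than counting by hand, a short Python sketch can enumerate all 16 equally likely sequences of four flips (changing `repeat=4` to `repeat=2` recovers the 1/4, 1/2, 1/4 case from before):

```python
from itertools import product

# Enumerate every equally likely sequence of four coin flips
# and tally how many heads each sequence contains.
counts = {k: 0 for k in range(5)}
for flips in product("HT", repeat=4):
    counts[flips.count("H")] += 1

print(counts)  # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1} -- out of 16: 1/16, 4/16, 6/16, ...
```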

So I can count all this stuff and it's fun...

But here is the problem:

Remember, we talked about big data -- lots of data --

and we want to try and understand it.

Often we have more than 2 or 4 -- we have N,

and that is a huge number.

So if we're talking about New York City that can be 10 million people.

If we're talking about Ann Arbor, where I live, that's still like a hundred thousand people.

So I don't want to be sitting there writing tails, tails, tails, tails, tails a hundred thousand times.

I want to have a model that will help me explain it.

So what you can do is, if you have N things, the mean, the expected number, should be

N over 2, right -- should be half of N.

But what we'd like to do is understand sort of what that distribution looks like.

Well, what we know from statistics is that distribution is actually gonna be a nice bell-curve

and the mean, right in the middle of this thing, is gonna be N/2, and it's just gonna

sort of flow out nice and symmetrically from each side

Now there's a fancy equation, a formula that tells you what this line looks like.

We're not going to get into that but if you take the Statistics class

which I'd encourage you do - it's a lot of fun - you could learn exactly

what this formula is and how it works, OK?

We just wanna use it as a model for understanding how things aggregate.

So, we're gonna take some leaps ahead in statistics.

Here's the trick though, we gotta be a little bit careful.

Flipping a coin is always equally likely, it's either a head or a tail, each one is 50/50

But if I'm worried about people going to the beach

right, or people going to the supermarket, or people showing up for their flight

that's not a 50/50 proposition, right.

So maybe 90% of people may show up for their flight

and maybe only 10% or 15% of people go out to the beach.

So I'd like to change that 1/2 into something else

Well, I can introduce something called the binomial distribution

where instead of having 1/2, there's some probability p of doing the thing.

So let's suppose going to the beach happens 15% of the time

Well then, if I had 1000 people, and p = 15%, then p times N is 150,

so I expect to have 150 show up.

So that makes sense but then I can ask well, what's the distribution now

I mean 150 is the average but I could have 200, I could have 74

Well again, what the central limit theorem tells us is that we're

gonna get a nice bell-curve -- right here you've got this nice shape --

with the mean here, which is p times N.

Well, this is provided N is big enough, right;

if N is pretty large, you're gonna get this nice bell-curve and the mean is gonna be right at p times N.

Okay, there's more to it. Here's where it gets a little bit complicated, but also interesting.

There's something called the standard deviation -- this thing called sigma.

Now, when I draw the normal curve,

there's gonna be a mean, that's this point right here at the center

And then there's gonna be a standard deviation which basically tells us how far spread out

that curve is

And what I mean by that is how far spread out the different outcomes are

So it turns out there's this nice structure to any normal distribution. If you tell me the mean,

and then you tell me the standard deviation,

it's always gonna be the case that 68%

of all outcomes will be between -1 and +1 standard deviations of the mean.

So, if it's got a big standard deviation

That means that that range could be really wide

If it's got a small standard deviation that means that range should be really tight but

if you tell me the mean and tell me the standard deviation

it's always gonna be the case that 68% of the time I'm between -1 and +1 standard deviation

Now, in fact, since that's true for 1 standard deviation, there are corresponding numbers for 2, 3 and 4 standard deviations, right.

So there's gonna be a 95% chance I'm within 2 standard deviations
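These coverage numbers aren't magic; for any normal distribution they fall out of the standard normal CDF. A small sketch using only Python's math module:

```python
import math

# Fraction of a normal distribution lying within k standard
# deviations of the mean (standard normal CDF via math.erf).
def within_k_sigma(k):
    return math.erf(k / math.sqrt(2))

print(round(within_k_sigma(1), 3))  # 0.683 -- the "68%" rule
print(round(within_k_sigma(2), 3))  # 0.954 -- the "95%" rule
print(round(within_k_sigma(3), 3))  # 0.997
```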

So wait, why do we care about this, why do we care about this stuff

Here's why. Now I got this model that says if I add up a bunch of independent events,

here's what the mean is, right.

Now, in a second, I'm going to show you the formula for the standard deviation

so then it'll tell you what sigma is here

Well, if you know the mean and you know sigma,

then I can give you a range, and I can tell you, you know, that 95% of the time

I'm gonna be between -2 sigma and +2 sigma

So if I said the mean number of people that showed up is a hundred

and that's the mean, right, and the standard deviation is only 2

Well then you'd know 95% of the time, you're gonna be between 96 and 104

So you'd know, okay, I should prepare for pretty much exactly 100 people

If I told you the standard deviation was 15,

then you'd know it could be anywhere between 70 and 130.

So that's what we want to try and use this model to explain:

how wide a range of outcomes we're likely to see in any particular setting

So let's go back to our simple binomial distribution where the probability was 1/2

The mean, remember, is just N over 2

the standard deviation is the square root of N, divided by 2.

Well, you can do a little bit of math and show that

So, let's suppose I have N = 100

So if N = 100, that tells me the mean is gonna be 50 so if I flip a coin a hundred times,

guess what, the average is 50, no surprise

But the standard deviation is the square root of N, divided by 2.

What's the square root of 100, that's 10

So this is 10 over 2 so that gives 5

So what that tells me is if I think in binomial distribution

right, if I draw this thing out

I've got a mean of 50, and then I've got a standard deviation of 5

so that means between 45 and 55 you get 68% of all outcomes.

So, if you want, you can do this at home -- it'll take a while -- flip a coin a hundred times

Count how many heads you get. Flip it again, count how many heads again

Do that a whole bunch of times, you'll find that 68% of the time, you get between 45 heads and 55 heads
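If flipping a coin a hundred times by hand sounds slow, here's a sketch of the same experiment in Python. (Because head counts are whole numbers, the simulated fraction inside 45-55 inclusive comes out a bit above 68%; the bell-curve number is an approximation.)

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Repeat the at-home experiment: flip a fair coin 100 times, count heads.
trials = 10_000
in_one_sigma = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(100))
    if 45 <= heads <= 55:  # within one standard deviation of the mean, 50
        in_one_sigma += 1

print(in_one_sigma / trials)  # roughly 0.7, in the ballpark of the 68% rule
```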

So what this model gives us is it gives a sense of how strange of outcomes we'll get

So, we know that most of the time, 68% of the time, we'll be between 45 and 55, right

So, our mean is 50; 1 standard deviation out is 45 and 55, so 2 standard deviations is 40 and 60.

What that tells us is 95% of the time, you're gonna be between 40 and 60 heads

And, 99.7% of the time, you're gonna be between 35 and 65.

So basically, you're almost never gonna throw fewer than 35 heads, and almost never throw more than 65 heads.

And so this is the sort of power the Central Limit Theorem has, right.

It gives us a sense of not only the average, but also what the spread will be

Okay, remember this is a simple case. This is the p = 1/2 case.

And what we'd like is we want it for the more general case where the probability of something happening can be anything

Right, this is this p times N thing.

It turns out we're okay here, because the standard deviation is just p, times 1 minus p, times N, and then you square root the whole thing.

So in the case where p = 1/2, right, then we have the square root of 1/2 times 1/2 times N.

But notice I've got a 1/2 squared inside there, so we can just pull that outside, so it's just 1/2 times the square root of N --

so that's where that square root of N, divided by 2, came from.

So now, for the binomial distribution, I've got this clean formula as well

And we can use that to model and understand stuff that's a little bit more interesting than just flipping a coin
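The two binomial formulas can be wrapped up in a couple of lines. A sketch (the function names `binomial_mean` and `binomial_std` are just labels chosen here):

```python
import math

# Mean and standard deviation of a binomial distribution:
# N independent yes/no events, each happening with probability p.
def binomial_mean(n, p):
    return n * p

def binomial_std(n, p):
    return math.sqrt(p * (1 - p) * n)

# The fair-coin case recovers the sqrt(N)/2 formula: for N = 100 flips,
print(binomial_mean(100, 0.5))  # 50.0
print(binomial_std(100, 0.5))   # 5.0
```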

Let's do a real example, let's have some fun.

So, most of us have probably been bumped off a plane before.

You show up at the airport and, like, too many people showed up for the plane.

And you think, why did they do this? But the reason they sometimes have to bump people is they oversell.

And the reason they oversell tickets is because not everybody shows up

So if you're running an airline and you've got 400 seats, and you know people show up, you know, 90% of the time

You want to sell more than those 400 seats, right, so that your plane is pretty much full

So let's do an example. Let's suppose, to make it simple, that our plane has got 380 seats.

So let's suppose we got a Boeing 747 with 380 seats

Let's suppose that 90% of the time, people show up

So we've gathered, we run an airline, we've gathered lots of data

We pretty much know 90% of the time, people show up and that it's independent

So one person's decision to show up doesn't have anything to do with anybody else's.

Now, that might not be true, right? Because if it's snowy, I'm likely to be late and you're likely to be late.

But let's just suppose that these things are independent. And let's suppose that we sell 400 tickets

Now we're trying to get some understanding of what that means.

What's the likelihood that, if we sell 400 tickets, more than 380 people show up?

Here's where the model can help us. It'll be able to tell us what the mean is, it will also tell us what the standard deviation is

So the mean: if I sell 400 tickets, and on average 90% of people show up, that means on average 360 people show up.

That's less than our 380 seats, so that should be fine, but what I care about is whether more than 380 people show up,

'cause they're gonna be like, I paid for this to go to Florida, I want to go to Florida, I don't want to be bumped

So more than 380 show up, guess what, they're gonna be mad, right

So the 360 doesn't tell us enough, we want to know something about the distribution

Okay, well look, we've got a formula, right, remember

So N was 400, and p was .9 so p times N is 360, that's our mean

Now, the standard deviation we can solve for pretty easily. That's just the square root

of p, which is .9 times 1-p, which is .1, times N which is 400

So if we multiply right out that's .9 times .1 times 400

.1 times 400 is 40, times .9 is 36; that gives the square root of 36, which is 6.

So 6, is our standard deviation. Now I get a bell-curve with a mean of 360 and a standard deviation of 6

Well, that's useful, that can help us 'cause let's go back and let's look

That means our mean's 360, our standard deviation is 6, so 68% of the time we're gonna be between 354 and 366. That's great.

It means that 95% of the time, we'll be between 348 and 372 -- also great.

It means 99.7% of the time, we'll be between 342 and 378.

Well, how many seats do we have? We have 380 seats, so this means that 99.7% of the time -- actually more than that, right,

more than 99.7% of the time -- we won't overbook.
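The whole overbooking calculation fits in a few lines. As a check, this sketch also sums the exact binomial probability of more than 380 arrivals (the numbers 400, 0.9, and 380 are the lecture's):

```python
import math

n, p, seats = 400, 0.9, 380   # tickets sold, show-up rate, seats on the plane

mean = n * p                          # 360 expected passengers
sigma = math.sqrt(p * (1 - p) * n)    # about 6

# Exact probability that more than 380 ticket-holders show up,
# summed term by term from the binomial formula.
p_overbooked = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(seats + 1, n + 1)
)

print(mean, sigma)    # 360.0 and about 6.0
print(p_overbooked)   # a small fraction of a percent
```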

So here's the Central Limit Theorem, let's, let's say it formally.

The Central Limit Theorem says the following. We've got a whole bunch of random variables --

so those could be decisions to show up to a flight or not; in that case the random variables are just 1s and 0s.

Or they could be, you know, the weight of your bag. Each person's bag weight is an independent variable.

As long as those things are independent -- so each person's decision doesn't depend on somebody else's, and how much stuff I jam in my bag doesn't affect how much stuff you jam in your bag --

and those things have finite variance -- what does that mean? Roughly, it means they're bounded.

So we know we can't have super huge values, like so my bag couldn't weigh billions and billions of pounds

So long as, you know, the possible range of values that each one can take is bounded in some way,

or doesn't, with high probability, take huge, huge values -- then when you add those things up,

When you sum them up, you're gonna get a normal distribution which means a bell-curve, which means we can predict stuff

We can use that model to make sense of how the world works.

Now, let's step back for just a second and think about like, why this is so cool

Suppose it weren't true, here's a little thought experiment

Suppose it were the case that when I added up a bunch of independent events

Most of the time, I get something nice, then there were some spiky probability of some huge event

over here. What would this mean? Well, this would mean, like, sometimes

you go to the grocery store, and there would be like, 1000 people there

or sometimes you'd be like I'm just gonna run to the bathroom and there'd be 300 people in line, right

A lot of the predictability of the world, a lot of the predictability of these sort of daily comings and goings

stems from the fact that this can't happen and that we get these nice bell-curves.

Because if individual people, individual firms, individual groups of people

make decisions that don't depend on what other people decide

truly independent decisions -- then what you're gonna get is

you're gonna get sort of nice, regular stuff, according to a bell-curve

Yeah, sure there'll be traffic jams, sure there'll be a lot of people at the mall

There will be days where there's a lot going on, and there'll be days where nothing much is going on.

But most of the time, you're gonna get things in that little region which is gonna be predictable and understandable

Now, is everything normally distributed? No, it's not.

What about stock returns? If you look at stock returns, you'll actually see that

there's far too many days where really nothing happens, and there's far too many days where there's huge gains

and far too many days where there's huge losses

And what's going on there is that the actions are no longer independent.

For example, if prices are going up, a lot of people may buy, and that's going to cause prices to go up even further.

And if prices start to fall, people may sell and that can cause prices to fall even further

So when events fail to be independent -- fail to satisfy the independence assumption --

then we can get more big events than we'd expect and more small events than we'd expect.

So let's wrap this up: what have we got?

We can use the Central Limit Theorem as a model to explain

how, if we add up a bunch of independent events, then what we get is

a nice normal distribution, right.

And we can understand the mean, we can understand the standard deviation

We can use that to predict how likely things are to occur, right

We also learned that it's that independence that gives us that normality, right.

Without independence, we could get really big events, really small events

We can get all sorts of strange stuff happening

So where we're gonna go next: there's a brief lecture on something called

Six Sigma, which pushes this idea of the predictability of the system a little bit

further than we had before. But then after that, we're gonna start

y'know, looking at

systems where there's interdependent actions,

and when we have those interdependent actions, we're no longer going to get these sort of nice bell-curves.

We're going to get all sorts of really interesting, strange stuff

It's going to be a lot of fun. Alright, thank you.