So previously we talked about expected values, and their properties, and a little

bit of how to calculate them. Now let's talk about the expected value

operator itself, and the expected value operator is in fact a linear operator, and

that'll greatly simplify calculating expected values for some relatively

complicated things. So, let's assume that 'A' and 'B' are not random numbers. So, when you think about 'A' and 'B', think of 'A' as something like five; it's just a number you can plug in. And 'X' and 'Y' are two random variables. Then the expected value of 'AX plus B' works out to be 'A' times the expected value of 'X', plus 'B', exactly what you'd hope it works out to be. And the expected value of 'X' plus 'Y' works out to be the expected value of 'X' plus the expected value of 'Y'.
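As a quick sanity check on both rules, here's a small simulation sketch; the particular choices a = 5, b = 2, X a die roll, and Y uniform on (0, 1) are just illustrative, not from the lecture:

```python
import random

random.seed(0)
n = 100_000

# Illustrative constants and random variables (hypothetical choices):
# a, b are fixed numbers; X is a die roll; Y is uniform on (0, 1).
a, b = 5, 2
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.random() for _ in range(n)]

mean_x = sum(xs) / n   # approximates E[X] = 3.5
mean_y = sum(ys) / n   # approximates E[Y] = 0.5

# E[aX + b] should match a*E[X] + b
lhs1 = sum(a * x + b for x in xs) / n
print(lhs1, a * mean_x + b)        # both near 5 * 3.5 + 2 = 19.5

# E[X + Y] should match E[X] + E[Y]
lhs2 = sum(x + y for x, y in zip(xs, ys)) / n
print(lhs2, mean_x + mean_y)       # both near 3.5 + 0.5 = 4.0
```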

And the reason it works out in these cases is because the expected value is a linear operator. It is not always the case that the expected value of g of x is equal to g of

expected value of x, where g is some general function that's not linear.

This can happen for specific random variables and specific choices of g, but in general it's not the case. The most famous example where it's not the case is that the expected value of x squared is not equal to the expected value of x, quantity squared. Now, let's talk about what the difference

of these two entities is. What do we mean by the difference between

these two things? Here x is a random variable.

X-squared is the random variable you obtain by squaring x.

So, for example, if x is a die roll, it can take values one, two, three, four,

five, six. X-squared then can take the values one, four, nine, sixteen, twenty-five, and thirty-six, and it takes those values with probability one-sixth each.

So the expected value of X squared represents the expected value of the

squared random variable. On the other hand, expected value of X

quantity squared, represents what you obtain if you first calculate the expected

value of X, and then square the result. And these two things are not equal.
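For this die example, the two quantities can be computed exactly; here's a sketch using Python's fractions to keep the arithmetic exact:

```python
from fractions import Fraction

# A fair die: values 1..6, each with probability 1/6
values = range(1, 7)
p = Fraction(1, 6)

e_x = sum(p * v for v in values)       # E[X] = 7/2 = 3.5
e_x2 = sum(p * v**2 for v in values)   # E[X^2] = 91/6, about 15.17

print(e_x2, e_x**2)   # 91/6 versus (7/2)^2 = 49/4 = 12.25
```

So for a die roll, E[X^2] is 91/6 while (E[X])^2 is 49/4; they are clearly different.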

This is a well-known example where the expected value of g of x is not equal to g

of expected value of x. And we'll see in a couple slides why it is

a well-known example of that property. But in general, I would like you to remember that if g is not a linear function, then you can't just commute expected values from outside of g to the inside of g. If g is a linear function, you can always do it; if g is not linear, in general you cannot. The expected value rules hold no matter what constitutes X and Y. X could be discrete, continuous, or mixed discrete and continuous. Y could be discrete, continuous, or mixed, and the rules still hold.

So, let me go through an example. Suppose you flip a coin X, and as we normally do, X is zero if it's tails and one if it's heads, and you simulate a uniform random number Y. The random number is between zero and one. What's the expected value of their sum? Well, the sum of a coin flip and a uniform random variable has a weird distribution. It's not obvious, especially if all you've

had is the handful of lectures from this class, how you would calculate that

distribution, and then from that distribution calculate the expected

value. However, we do know how to calculate the

expected value of a uniform random variable and the expected value of a coin

flip, and so the expected value of their sum is the sum of their expected values.
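A quick simulation sketch of this coin-plus-uniform example (the sample size here is an arbitrary choice):

```python
import random

random.seed(1)
n = 200_000

# X: fair coin, 0 for tails, 1 for heads; Y: uniform on (0, 1)
total = 0.0
for _ in range(n):
    x = random.randint(0, 1)
    y = random.random()
    total += x + y

print(total / n)   # near E[X] + E[Y] = 0.5 + 0.5 = 1.0
```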

We know the expected value of the coin flip is 0.5. We know that the expected value of the uniform random variable is 0.5, so the expected value of the sum is one. So you can see how these expected value operator rules make calculating things associated with expected values a lot easier.

Another example: suppose you roll a die twice. What is the expected value of the average of two die rolls? You often roll two dice when you're playing a board game, for example. Okay, let's let x1 be the result of the first die and x2 be the result of the second die.

Now the variable that we're interested in, let's call it y equal to x1 plus x2,

divided by two. Now, one way you could calculate the

expected value of y is to figure out what the distribution is of the average of two die rolls. So, let me give you a sense of this really

quickly. The reason we think the distribution of a single die roll puts probability one sixth on each number is that, if you roll a die a lot of times, about one sixth of the die rolls are one, one sixth are two, one sixth are three, one sixth are four, and so on. So we are modeling the process as if the outcomes are all equally likely, and that's why we're going to model the population of die rolls as having probability one sixth on each number.

Now this implies a distribution on the average of two die rolls, right?

The smallest number it could take is one, right? One plus one divided by two, the average if you were to get two 1s. And the largest it could take is six plus six divided by two, or six, if you were to roll two 6s. But it takes different values in between, and it's not equally likely for all the numbers in between. An average of one has probability 1/36, but some of the middle values have higher probabilities.

So, at any rate, our variable y itself has a distribution, and you could get a pretty good sense of it. You could do this by taking two dice, rolling them, taking the average, rolling them again, taking the average, doing that over and over again, and plotting a bar plot of the frequency of the averages that you get. And that would give you a good sense of

what the population distribution is. Or you could work it out on pen and paper

as to what the distribution actually is.

And then once you get that worked out, you could use your expected value formula to calculate the expected value of y directly, by summing over all the possible values of y times p of y.
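That "work out the distribution first" route can be sketched by enumerating all 36 equally likely outcomes of the two dice; this is just an illustration, using exact fractions:

```python
from collections import Counter
from fractions import Fraction

# Distribution of y = (die1 + die2) / 2, built from the 36 outcomes
dist = Counter()
for d1 in range(1, 7):
    for d2 in range(1, 7):
        dist[Fraction(d1 + d2, 2)] += Fraction(1, 36)

print(dist[Fraction(1)])                    # an average of 1 has probability 1/36
e_y = sum(y * p for y, p in dist.items())   # sum over values of y times p(y)
print(e_y)                                  # 7/2, i.e. 3.5
```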

Another way to do it is to directly use the fact that the expected value is a linear operator. So in this case, the expected value of x1 plus x2, divided by two, is one half times the quantity expected value of x1 plus expected value of x2, because the one half is a non-random quantity that we can just pull out, and the expected value distributes across the sum to give expected value of x1 plus expected value of x2. That then yields 3.5 plus 3.5, divided by two, which is 3.5.

Now after hearing this you might be thinking, "Oh, that's interesting, the expected value of the average of two die rolls is the exact same as the expected value of an individual die roll." And that is exactly the case. But you're probably also wondering, "Does this extend beyond that? Is the expected value of the average of N die rolls equal to 3.5 as well?" And the answer is yes, that's exactly true.

In fact, this is a nice segue into our next slide, where we actually derive the property that we were hinting at in the previous slide: namely, that the expected value of the average of a collection of random variables from the same distribution is the same as the expected value of the individual random variables.

So let's let Xi, for i equal one to n, be a collection of random variables, each from a distribution with mean mu. I also want to point out that we tend to use Greek letters to represent population quantities; in this case the population mean of the distribution is mu. So let's calculate the expected value of

the sample average of the Xi. Well, we want the expected value of the sample average, which is one over n times the summation from i equals one to n of the Xi's. The one over n pulls out because it's not random. The expected value commutes across the sum. And the expected value of each of those Xi's is itself mu. So we get the summation from i equals one to n of mu; mu added up n times is n times mu, and dividing by the n on the outside, we get mu.
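The claim above can also be sketched as a simulation: whatever the distribution of the individual draws, the average of many sample means sits near mu. The sample sizes and the two example distributions here are arbitrary illustrative choices:

```python
import random

random.seed(2)

def mean_of_sample_means(draw, n, reps):
    # Average many sample means, each computed from n draws of draw()
    total = 0.0
    for _ in range(reps):
        total += sum(draw() for _ in range(n)) / n
    return total / reps

# Two quite different distributions, both with population mean mu = 0.5
m1 = mean_of_sample_means(random.random, 10, 20_000)                  # Uniform(0, 1)
m2 = mean_of_sample_means(lambda: random.randint(0, 1), 10, 20_000)   # coin flip
print(m1, m2)   # both near 0.5
```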

So what this says is, it doesn't matter what the distribution of the individual X's is: the distribution of the mean of the X's has the same mean as the individual X's. So let me just summarize one more time.

The expected value of the sample mean is the population mean that it's trying to

estimate. The population mean of the distribution of

the sample mean of N observations is exactly the population mean that it's

trying to estimate. And so when this happens, when the expected value of an estimator is what it's trying to estimate, that's a good thing. We say that the estimator is unbiased. So sample means are unbiased estimators of

population means. And again, there were some assumptions for this to be true, right? All the X's have to be from a distribution that has mean mu, mu being the value you want to estimate. Then the sample mean is an unbiased estimator of the population mean, and we're finally getting to the point where we can

talk about how we're going to connect our probability modeling to the data that we

observed. We're not quite there yet, but we're getting closer and closer, and I want you to remember that we're throwing around the term "mean" a lot. So if you get confused, I want you to qualify which mean we're talking about: whether it's a population quantity, a component of the probability distribution, or a sample quantity, an empirical quantity that you compute from the data. And remember, our goal in probability

modeling is to connect our sample observations to the population using our

probability model.