In this module we're going to discuss a very topical problem. Namely, how should average returns be computed? We'll see why this is an important question and why there can be different answers to it. We'll also see which answer is more appropriate for investors. Here's an example.

Suppose an investment fund delivers the following performance. In year 1, it returned 20%. In year 2, it returned -10%. What is the average annual return of the fund? Well, it's (20% - 10%) / 2, which is equal to 5%. So we can say the average annual return is 5% here.

But suppose I change things just a little bit. Actually, I won't change anything; I'm going to give you a little more information. Consider this example.

It's the exact same fund, with the exact same performance in years 1 and 2: +20% and -10%. But now I also tell you how many dollars were invested in the fund. In year 1 there was $1 million invested in the fund. In year 2 there was $10 million invested in the fund. Now I'm going to ask you the same question again: what is the average annual return of the fund? Well, it's no longer clear, because there are two possibilities. We could say it's 5%, as before, where we just take the average of 20% and -10%. Or we could compute a dollar-weighted average return. If you look at this, you can see there's $1 million that received a return of 20%, and another $10 million that received a return of -10%. So if I look at the average return to each dollar, this is the correct answer: $1 million times 20%, minus $10 million times 10%, divided by the total of $11 million, which gives (0.2 - 1.0) / 11 = -7.27%. So in this case the average annual return is actually much smaller.
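The two calculations above can be written as a short Python sketch. `simple_average` weights each year equally. `dollar_weighted_average` weights each year's return by the dollars invested that year, which is the per-dollar average described here and not an IRR-style money-weighted return. Both function names are illustrative.

```python
def simple_average(returns):
    """Average of the yearly returns, with each year weighted equally."""
    return sum(returns) / len(returns)

def dollar_weighted_average(returns, dollars):
    """Average return per dollar: each year's return weighted by the dollars invested."""
    total = sum(dollars)
    return sum(r * d for r, d in zip(returns, dollars)) / total

returns = [0.20, -0.10]             # year 1, year 2
dollars = [1_000_000, 10_000_000]   # dollars invested in each year

print(f"{simple_average(returns):.2%}")                    # 5.00%
print(f"{dollar_weighted_average(returns, dollars):.2%}")  # -7.27%
```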

So we have two candidates, 5% or -7.27%, and the question is which return, if either, is more compelling. Why is this important?

Well, it is important because investors care about the returns to their dollars. In fact, you could argue that at the aggregate level investors should care much more about a dollar-weighted return. In that case the -7.27% figure is the more significant one. To emphasize this claim, consider the following two situations from the point of view of the aggregate investor, in other words, all investors taken together. Which would they prefer, situation 1 or situation 2? In situation 1, $1 million was invested in year 1 and earned 20%, and $10 million was invested in year 2 and earned -10%. Situation 2 is the reverse of that.

$10 million in year 1, earning 20%, and $1 million in year 2, losing 10%. I think investors in aggregate would far prefer to be investing in situation 2, because of what happens to their dollars. Investors care about the dollars they invest and what happens to those dollars. They don't necessarily care about an average annual return of 5% if they had relatively few dollars invested in the years when returns were very high and lots of dollars invested in the years when returns were very low. What they care about is the return on their dollars.
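Here is a minimal sketch that compares the two situations in dollars rather than percentages, using the returns and amounts from the example. The helper name `dollar_pnl` is illustrative.

```python
def dollar_pnl(returns, dollars):
    """Total dollar gain (or loss), summed over the years."""
    return sum(r * d for r, d in zip(returns, dollars))

returns = [0.20, -0.10]                # identical yearly returns in both situations
situation_1 = [1_000_000, 10_000_000]  # little money in the good year, lots in the bad year
situation_2 = [10_000_000, 1_000_000]  # the reverse

simple = sum(returns) / len(returns)   # 5% in both situations
for name, dollars in [("situation 1", situation_1), ("situation 2", situation_2)]:
    pnl = dollar_pnl(returns, dollars)
    per_dollar = pnl / sum(dollars)
    print(f"{name}: simple average {simple:.1%}, "
          f"dollar P&L {pnl:+,.0f}, return per dollar {per_dollar:.2%}")
```

Both situations report the same 5% simple average. Situation 1 loses $800,000 (-7.27% per dollar), while situation 2 gains $1,900,000 (17.27% per dollar).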

Here's another reason why investors should care about the total number of dollars invested. In financial markets, expected returns often decrease as the dollars invested increase. This is because the liquidity of a market, or the so-called capacity of a trading strategy, is not unbounded. This isn't always obvious to the small investor who only invests in liquid markets and therefore does not move the market. What I'm getting at here is that a small investor might buy some shares in an S&P 500 ETF, or maybe buy some foreign exchange. Those markets are extremely liquid, so a small investor trading in them is not going to move the market.

In other words, the act of their trading is not going to have an impact on the market price of those securities. This is not true in general for large investors. The larger they are, the more they tend to move a market, and the less liquid the market, the more they move it. In this case the price they pay per security increases with the number of securities they buy, and the price they receive per security decreases with the number of securities they sell. This implies that returns decrease, on average, as the dollars invested increase.
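The sketch below illustrates the buy side of this argument. It uses a deliberately simple, hypothetical linear price-impact model: each additional share bought pushes the price up by a fixed `impact` amount. The position is later assumed to be worth `fair_value` per share, with no impact on the way out. Real market impact is more complicated. The parameters are made up purely to show the direction of the effect.

```python
# Hypothetical linear price-impact model (an assumption for illustration only):
# the j-th share bought (j = 0, 1, 2, ...) costs p0 + impact * j.

def average_purchase_price(p0, impact, shares):
    """Average price paid per share under the linear impact assumption."""
    return p0 + impact * (shares - 1) / 2

def expected_return(fair_value, p0, impact, shares):
    """Return if each share is eventually worth fair_value (no exit impact assumed)."""
    return fair_value / average_purchase_price(p0, impact, shares) - 1

if __name__ == "__main__":
    p0 = 100.0          # quoted price before we trade (made up)
    fair_value = 105.0  # what we believe a share is worth: a 5% edge (made up)
    impact = 0.001      # dollars of price impact per share bought (made up)
    for shares in (1, 1_000, 5_000, 10_000):
        r = expected_return(fair_value, p0, impact, shares)
        print(f"{shares:>6} shares: expected return {r:.2%}")
```

With these made-up parameters, the expected return falls from 5% for a single share to roughly 2.4% at 5,000 shares. At 10,000 shares the edge is essentially gone, even though nothing about the security itself has changed.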

Let me give you a simple example, a gambling example. Suppose we've got two teams, team A and team B. Let's suppose that the odds of team A beating team B are 50%, and the odds of team B beating team A are 50%. And let's suppose the market agrees on these odds. Maybe you're going to Vegas, you want to bet on team A versus team B, and you see these odds in the casino.

You, however, think that the probability that team A will win is 75% and the probability that team B will win is 25%. So in this situation you'd like to bet on team A. But you won't be able to bet an unlimited amount. Maybe it's not Vegas; maybe your friend is giving you these odds. Your friend offers you odds of 50% and 50%, but tells you, "Sure, you can bet, but I'm not going to accept a bet of more than $10." In that case the most you can bet is $10, so this is a very illiquid market. There's not much capacity in the market: the capacity is $10, after which there's no ability to trade anymore.
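Here is a small Python sketch of the capacity effect. It rests on three assumptions. First, the friend pays even money: you win $1 or lose $1 per dollar staked, matching the 50/50 odds. Second, your 75% belief about team A is correct. Third, any capital you cannot bet sits idle and earns nothing. The function names are hypothetical.

```python
from fractions import Fraction

def expected_profit_per_dollar(p_win):
    """Even-money bet: +$1 with probability p_win, -$1 otherwise."""
    return p_win - (1 - p_win)

def expected_return_on_capital(capital, p_win, capacity):
    """Only min(capital, capacity) can be staked; the rest is assumed to earn nothing."""
    stake = min(capital, capacity)
    return expected_profit_per_dollar(p_win) * stake / capital

if __name__ == "__main__":
    p_win = Fraction(3, 4)   # your belief that team A wins
    capacity = Fraction(10)  # the friend's $10 limit
    for capital in (5, 10, 100, 1000):
        r = expected_return_on_capital(Fraction(capital), p_win, capacity)
        print(f"capital ${capital:>5}: expected return {float(r):.1%}")
```

Under these assumptions, $5 or $10 of capital earns an expected 50%, $100 earns 5%, and $1,000 earns 0.5%. The expected dollar profit is capped at $5, so the return per dollar falls as more dollars chase the same trade.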

Believe it or not, financial markets behave like that as well. The more you trade in some of these markets, especially as a big investor, the more you move the market against you. As a result, you tend to see decreasing returns to dollars invested.

The question of how to compute average returns is therefore important. Depending on how you answer it, certain types of investing can seem more attractive or much less attractive.

An example of this is the hedge fund industry. In aggregate, the industry would prefer to report returns averaged over time, and in fact it does so. That's not to say hedge funds are being dishonest; they're certainly not. One can simply view it as good marketing. Every industry markets itself, and the hedge fund industry is no different. So if they wish to report their returns as averages over time, that's fair enough. However, we as investors should be aware of this. From our perspective, we care more about average net returns per dollar invested. If we measure returns this way, we might get a very different average return from the one reported by, say, the hedge fund industry.

It's important to be aware of this, because there are very different ways of computing returns, and you get very different answers depending on how you compute them. This has actually caused some controversy and debate. Some financial blogs discuss this topic, and a nice discussion of it can be found at this URL here. I encourage you to take a look at it and read the discussion.

Here's another problem with averages. It's not a financial example, but it's a nice one because it demonstrates how easily people can be confused by the way a question is worded. Sometimes the confusion becomes very apparent once it's explained. In everyday conversation, though, these issues can slip by us, and we don't notice that we're calculating the wrong quantity.

So here's a question. Suppose I wish to estimate the average number of children per family in the US. To compute an estimate, I do the following. I sample n people randomly, where n might be a very large number: maybe 1,000, or 10,000, or 50,000. For the i-th person, I determine x_i, the number of siblings in his or her family. My estimate, ĉ, is then given by

$$\hat{c} = \frac{1}{n}\sum_{i=1}^{n} (x_i + 1).$$

The extra 1 is for the person I sampled. The number of children in that family is the number of siblings plus the sampled person, which is x_i + 1, and then I divide by n. So that's my estimate of the average number of children per family in the US.

Now let's ignore any minor problems you might see with this sampling scheme. There's a bigger question here: does the sampling scheme have a fundamental problem? If so, in what way will ĉ be biased? And how does this problem compare to the average return problem? These are the other questions we're interested in.

To explain why there's a problem with ĉ, consider the following situation. Let's assume there's a universe of five families, and for each family we record the number of children:

Family 1: 4 children
Family 2: 3 children
Family 3: 2 children
Family 4: 3 children
Family 5: 0 children

So this is our universe. The total number of children is 12, so the average number of children per family is 12/5 = 2.4. This is the correct answer: 2.4 is the average number of children per family in this universe. But if I use the sampling scheme from the previous slide, where I sample by child, I'm going to get a different answer. To see this, note the following.

There's a total of 12 children. So if I sample by child, then 4 out of 12 times I'm going to sample one of the 4 children in family 1. Each of these children will report 3 siblings, which, plus themselves, gives 4. There are 2 families with 3 kids, for a total of 6 kids, so 6 out of 12 times I'm going to sample a child from family 2 or family 4. Each of those children will say there are 3 kids in their family, including themselves. 2 out of 12 times I'm going to sample one of the 2 children in family 3, and each of them will report 1 sibling, so 1 plus themselves gives an answer of 2. And 0 out of 12 times will I sample from family 5, since it has no children to sample. So the expected value of the estimate is

$$\mathbb{E}[\hat{c}] = \frac{4 \cdot 4 + 6 \cdot 3 + 2 \cdot 2}{12} = \frac{16 + 18 + 4}{12} = \frac{38}{12} = 3\tfrac{1}{6}.$$
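The arithmetic above can be checked with a short Python sketch using exact fractions. Averaging by family gives every family one vote. Averaging over randomly sampled children gives a family of size s a weight of s, so the expectation becomes Σs² / Σs. The function names are illustrative.

```python
from fractions import Fraction

CHILDREN_PER_FAMILY = [4, 3, 2, 3, 0]   # families 1 to 5 from the example

def average_by_family(sizes):
    """Every family counts once: total children divided by number of families."""
    return Fraction(sum(sizes), len(sizes))

def expected_report_by_child(sizes):
    """Every child counts once: a family of size s is drawn s times out of
    sum(sizes), and each of its children reports s (siblings plus themselves)."""
    total_children = sum(sizes)
    return sum(Fraction(s, total_children) * s for s in sizes)

print(average_by_family(CHILDREN_PER_FAMILY))         # 12/5
print(expected_report_by_child(CHILDREN_PER_FAMILY))  # 19/6
```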

So in this case, by sampling by child, I get an average of 3 1/6, and this is the wrong number. The average I want is 2.4, so I've actually calculated the average incorrectly. I want to know the average number of children per family, so what I should be doing is sampling by family, which is effectively what I did when I computed 12/5. Instead, with the sampling scheme on the previous slide, I'm sampling by person, or by child if you like. By doing that, I get 3 1/6, which is too large.

That's how ĉ is biased: upwards. I'm more likely to sample children from large families, so those families are over-represented. As a result, we see a higher average: 3 1/6 in this case. An easy way to see this is to note that families with 0 children will never be sampled. Since we're ignoring all families with 0 children, it should be clear that the bias is upwards.
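To see the bias emerge from actual sampling rather than from the exact calculation, here is a small Monte Carlo sketch in Python over the same five-family universe. It assumes sampling with replacement and a fixed random seed, and the helper names are hypothetical. Each sampled child reports x_i + 1, so the mean of those reports is the estimator ĉ.

```python
import random
from statistics import mean

CHILDREN_PER_FAMILY = [4, 3, 2, 3, 0]   # the five-family universe from the example

def sample_by_child(n, rng):
    """Draw n children at random; each reports x_i + 1 (siblings plus themselves)."""
    # One entry per child, holding the size of that child's family.
    # Families with 0 children contribute no entries, so they can never be drawn.
    children = [size for size in CHILDREN_PER_FAMILY for _ in range(size)]
    return [rng.choice(children) for _ in range(n)]

def sample_by_family(n, rng):
    """Draw n families at random and record each family's number of children."""
    return [rng.choice(CHILDREN_PER_FAMILY) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(42)
    n = 100_000
    c_hat = mean(sample_by_child(n, rng))       # the estimator c-hat
    by_family = mean(sample_by_family(n, rng))
    print(f"sampling by child:  {c_hat:.3f}  (exact expectation 19/6, about 3.167)")
    print(f"sampling by family: {by_family:.3f}  (true average 2.4)")
```

Running it should give a by-child average close to 19/6 ≈ 3.17 and a by-family average close to 2.4. The by-child sample also never contains a report of 0, because family 5 has no child who could be drawn.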

And here's another problem that has been very topical recently. It concerns the controversy surrounding waiting times to get through immigration at Heathrow Airport in London. This was a big news story last year, when many people entering Heathrow had to wait a very long time to get through immigration. A lot of newspapers were writing about the problem at the time, and it was definitely a source of controversy in Britain. So people were interested in estimating the average waiting time of travelers at immigration at Heathrow.

One way in which this average waiting time was estimated was as follows. Sample one person every hour, compute that person's waiting time, and then take the average over all these people. So if there are, say, 16 hours in a day, we get x_1 up to x_16. We sample one person from each hour, find their waiting time, and take the average. We then report this as the average waiting time to get through immigration.

The question here is: is this a good scheme? I'm not going to answer this question, but you can think about it. I will give you a hint: it's a bad scheme. It has a fundamental problem similar to the one on the previous slide, where we discussed ways to compute the average number of children per family.
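If you want to explore the hint, here is a minimal Python sketch that runs the one-person-per-hour scheme alongside a plain average over every traveler. The hourly arrival counts and typical waits are entirely made up. So is the assumption that each traveler's wait is drawn uniformly between 50% and 150% of that hour's typical wait. The function names are illustrative only. Try changing the numbers and see when the two averages agree.

```python
import random
from statistics import mean

def one_per_hour_estimate(waits_by_hour, rng):
    """The scheme described above: pick one traveler at random in each hour,
    then take the plain average of those picks (every hour gets equal weight)."""
    return mean(rng.choice(hour) for hour in waits_by_hour)

def traveler_average(waits_by_hour):
    """Average over every traveler, so each traveler gets equal weight."""
    return mean(w for hour in waits_by_hour for w in hour)

# Hypothetical day: the numbers below are invented for illustration.
ARRIVALS = [200, 1500, 3000, 800, 300, 100]  # travelers arriving in each hour
TYPICAL_WAIT = [10, 40, 75, 30, 15, 5]       # typical wait in minutes for each hour

def simulate_day(seed=0):
    """Individual waiting times for every traveler, grouped by hour."""
    rng = random.Random(seed)
    return [[rng.uniform(0.5 * m, 1.5 * m) for _ in range(n)]
            for n, m in zip(ARRIVALS, TYPICAL_WAIT)]

if __name__ == "__main__":
    day = simulate_day()
    rng = random.Random(1)
    # Repeat the one-per-hour scheme many times to see what it reports on average.
    scheme = mean(one_per_hour_estimate(day, rng) for _ in range(10_000))
    print(f"one-per-hour scheme, averaged over repeats: {scheme:.1f} min")
    print(f"average over all travelers:                {traveler_average(day):.1f} min")
```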