0:21
Hi, I'm Tucker Balch.
And this is Computational Investing, Part I.
In this video we are going to dig into the data.
In particular, we are going to look at actual versus adjusted price data.
So actual prices are the prices that are actually
posted on the exchanges at the close.
If you look back in history, those are recorded on every exchange for
every equity.
They are the prices the stock actually closed at that day.
Adjusted prices are prices that are adjusted so
that they account for things like dividends and splits.
This is important because if you bought a stock back in history and
just counted its value as being the actual close on the exchange,
you would fail to account for many important aspects of the data,
in particular the return you would actually get.
So for instance, stocks pay dividends.
That's money that goes into your pocket that's not reflected
directly in the price, and
there are also things called splits that we'll get into in just a second.
But anyway, with adjusted data, you can go back in time,
simulate purchasing a stock, and watch its value accrue,
just as if you had really held it.
You can't do that with actual closing prices.
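To make this concrete, here is a minimal sketch with made-up numbers (not real market data) for a hypothetical stock that splits 2-for-1 between the second and third days. Valuing a holding from the actual closes shows a false loss at the split, while valuing it from adjusted closes tracks what you would really hold. The helper `holding_value` is hypothetical, written only for illustration.

```python
# Hypothetical closes for a stock that splits 2-for-1 between day 2 and day 3.
actual_close = [100.0, 104.0, 54.0, 56.0]
# The same history split-adjusted: closes before the split are divided by 2.
adjusted_close = [50.0, 52.0, 54.0, 56.0]


def holding_value(prices, investment):
    """Value over time of `investment` dollars put into the stock on day 0.

    Only meaningful if `prices` already reflects every split and dividend.
    """
    shares = investment / prices[0]
    return [shares * p for p in prices]


print(holding_value(actual_close, 1000.0))    # [1000.0, 1040.0, 540.0, 560.0]
print(holding_value(adjusted_close, 1000.0))  # [1000.0, 1040.0, 1080.0, 1120.0]
```

The actual-close version claims the position lost almost half its value at the split; the adjusted version shows the steady gain a real holder would have had.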
1:56
Here's an example.
This is IBM.
This is data just pulled down from Yahoo.
You can see here the close, which is the actual close
as reported by Yahoo.
For very recent data, the actual close, shown here in red, and
the adjusted close are exactly the same.
2:20
In fact, for the very last close,
the adjusted close and the actual close are always exactly the same.
However, if you go back in time, here all the way back to 1962,
you'll see that the actual close for that day
was $572.
But our adjusted close is only $2.53.
So why this dramatic difference?
2:50
Well, that's because of the adjustments we're going to look at.
One thing that's kind of interesting to note is that if you had purchased IBM in 1962,
you'd see this growth from $2.50 to almost $200.
That's roughly an 80-fold return.
Here's what that chart looks like.
3:31
Look in particular at this big apparent drop over one day.
The stock went from right about $300 down to $75.
What's going on there?
Well, it's not that the value of the stock dropped to one-fourth its value,
it's that IBM split their shares.
In other words, on that day everyone who had one share of IBM
suddenly had four shares of IBM.
And the price went from $300 to $75.
Companies split their stock like
that because very high prices make the stock a little bit less liquid.
It's harder for people to purchase 100 shares
if they have to pay $30,000 for them, whereas if the price were down at $75,
they could purchase 100 shares for $7,500.
It also matters with regard to options.
Options control 100 shares at a time.
4:45
And for some options positions you actually have to hold the shares, so
before that split, entering an options position
would have meant taking on a $30,000 position in the stock.
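The split arithmetic above can be checked in a few lines. This is a minimal sketch; `apply_split` is a hypothetical helper, and the numbers are the $300 share and 4-for-1 split from the IBM example, plus the 100-share lot that one options contract controls.

```python
def apply_split(shares, price, ratio):
    """Return (shares, price) after a `ratio`-for-1 split.

    The share count is multiplied by `ratio` and the price divided by it,
    so the total value of the holding is unchanged.
    """
    return shares * ratio, price / ratio


# One share at $300 before a 4-for-1 split, as in the IBM example.
shares, price = apply_split(1, 300.0, 4)
print(shares, price, shares * price)  # 4 75.0 300.0

# Cost of a 100-share lot (what one options contract controls), before and after.
print(100 * 300.0, 100 * price)       # 30000.0 7500.0
```

Nothing about the holder's wealth changes at the split; only the cost of a round lot of 100 shares drops by a factor of four.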
So anyway, if we adjust for splits,
you see this nice smooth growth curve that reflects
the increasing value of a share of IBM held over that period.
So here's how they adjust for splits.
I know it's kind of a crude drawing, sorry.
First of all, one reason you need to know about this data
is so that you know which price to use in a given situation.
Sometimes you should use adjusted prices, sometimes you should use actual.
5:44
Anyway, comparing actual and adjusted, I'm showing a stock here
that grew from $50 to $100, then split back down to $50.
Then it grew up to $100 again and split down to $25.
So that first split represents a 1:2, or 2-for-1, split.
In other words, you get 2 shares for the 1 you held before.
This latter one is a 1:4, or 4-for-1.
In other words, if you held one $100 share,
suddenly the next day you have four $25 shares.
Now on the bottom here we show what the adjusted price looks like.
One thing to point out is that as you go back in time from today, until you reach a split or
dividend, the adjusted and the actual are the same.
And that's indicated by this sort of blue dotted region.
But when we get to the first split going backward,
you'll notice that on the adjusted there's a smooth transition here, no jump.
The way they accomplish that is, at the time of the split, they go back in time
and divide all the previous prices by four.
And so you get this smooth transition.
Then when they reach the earlier split, they continue back in time and
divide those already-adjusted prices by two,
so prices from before both splits end up divided by eight in total.
As you go back in time, each time you reach a split you make that division.
That's why, by the time you get back to 1962 for IBM,
the adjusted price is only about $2.50.
For some stocks with very, very significant growth, the adjusted price
is in the pennies, while today the price is in the hundreds of dollars.
Okay, so that's adjusting for splits.
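The backward walk described above can be sketched as follows. This is a minimal illustration, not QSTK's or Yahoo's code; `split_adjust` and its `{index: ratio}` input format are hypothetical. The prices reproduce the drawing: $50 up to $100, a 2-for-1 split, up to $100 again, then a 4-for-1 split to $25.

```python
def split_adjust(prices, splits):
    """Split-adjust a daily price series by walking backward in time.

    prices: actual closes, oldest first.
    splits: {index: ratio} meaning a ratio-for-1 split took effect on day
            `index`, so every price before that day is divided by ratio.
    """
    adjusted = list(prices)
    factor = 1.0  # cumulative divisor; 1 for prices after the last split
    for i in range(len(prices) - 1, -1, -1):
        adjusted[i] = prices[i] / factor
        if i in splits:
            # Crossing a split going back in time: all earlier prices
            # get divided by this ratio as well as any later ones.
            factor *= splits[i]
    return adjusted


prices = [50.0, 75.0, 100.0, 50.0, 75.0, 100.0, 25.0]
splits = {3: 2, 6: 4}  # 2-for-1 on day 3, 4-for-1 on day 6
print(split_adjust(prices, splits))
# [6.25, 9.375, 12.5, 12.5, 18.75, 25.0, 25.0]
```

The last price is untouched, prices between the two splits are divided by 4, and prices before both splits are divided by 2 × 4 = 8, which removes both jumps.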
7:42
This is a price history of a mortgage real estate investment trust, or REIT, called AGNC.
And note that its share price at the end here is about $30.
But each quarter they pay a dividend of about $1.40,
so over a year they pay four of those.
So we're looking at about $5.60, call it $6, in dividends, or nearly 20% of the value of
the stock each year,
and that's additional money that goes into the pockets of the shareholders.
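The yield arithmetic above, as a quick sketch using the approximate figures read off the chart ($30 share price, $1.40 per quarter):

```python
price = 30.0               # approximate share price at the end of the chart
quarterly_dividend = 1.40  # approximate dividend paid each quarter

annual_dividend = 4 * quarterly_dividend
annual_yield = annual_dividend / price
print(f"${annual_dividend:.2f} per year, {annual_yield:.1%} yield")
# $5.60 per year, 18.7% yield
```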
8:25
Now the prices you see charted on Google and
Yahoo, like most charted prices, are actual prices.
And so they don't reflect the real gain in value from these dividends.
This chart shows the price starting in
mid-2008 at about $15
and reaching about $30 by mid-2012.
So over those four years, the price has roughly doubled, a gain of about 100%.
All right, now here's a comparison with the actual
price on the left and the adjusted price on the right.
These come from two different data sources, but I've stretched the one on the right so
that it's on about the same scale as the one on the left.
Here's $30 and here's $15.
If you look at the adjusted price, which includes dividends
(I don't think there were any splits in this stock),
you see the stock has gone from below $10 to above $30 over that same period.
So it has really roughly tripled, a gain of about 200%.
So if you account for dividends, you see much more significant growth.
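It's easy to mix up a price multiple with a percentage gain: doubling a price is a 100% gain, and tripling it is a 200% gain. A small sketch with the approximate start and end prices from the two charts; `pct_gain` is a hypothetical helper.

```python
def pct_gain(start, end):
    """Percentage gain from `start` to `end`: (end - start) / start * 100."""
    return (end - start) / start * 100


print(pct_gain(15, 30))  # actual close, about $15 -> $30: 100.0
print(pct_gain(10, 30))  # adjusted close, about $10 -> $30: 200.0
```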
10:00
Look at the actual price of a stock as it approaches a dividend. In
this case, it starts as a $10 stock and it's going to pay a $1 dividend.
A dividend that large represents 10% of the price.
That's huge.
But I'm just using it for illustrative purposes here.
Here's what happens if you look at the actual price.
It climbs until the day before the ex-dividend date, the first day the stock trades without the right to that dividend.
On the ex-dividend date, the price is recorded about $1 lower than the close before.
10:40
The reason for this is that on that day, if you're a shareholder,
you hold the stock plus a dollar.
So your value is really up here: the stock price
plus the dollar,
one dollar per share.
And that money has to come from somewhere.
It comes out of the company's bank account.
That means the value of the stock really does drop by $1 per share,
because the company is taking that much cash out of the bank and giving it to shareholders.
It's not in the bank anymore.
So if you remember how we calculated the value of a company,
the book value goes down by $1 per share.
And accordingly, the price goes down by about $1.
But if you then just look at historical close prices,
they wouldn't reflect the value you gained by collecting these dividends.
So the way they adjust for that is, on the ex-dividend date,
they scale all the earlier prices downward by a
ratio that is keyed to the amount of the dividend.
After the adjustment,
the stock appears to have climbed from about $9 over this period
to about $10, which represents the dollar of growth you actually received.
And then the adjusted price continues forward.
And again, the latest prices, those after the most recent split or dividend, are always the same in both series.
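The exact ratio isn't spelled out here, so the following is a minimal sketch under an assumed convention that is commonly used for dividend adjustment: on an ex-dividend date with dividend D, every earlier close is multiplied by 1 - D / P, where P is the close on the day before the ex-date. Treat this as an illustration, not as the method any particular data vendor uses. `dividend_adjust` is a hypothetical helper, and the prices mimic the $10 stock paying a $1 dividend.

```python
def dividend_adjust(prices, dividends):
    """Dividend-adjust a daily close series by walking backward in time.

    prices:    actual closes, oldest first.
    dividends: {index: amount} where day `index` is the ex-dividend date.
    Assumed convention: every close before the ex-date is multiplied by
    (1 - amount / close on the day before the ex-date).
    """
    adjusted = list(prices)
    factor = 1.0  # cumulative multiplier; 1 after the last dividend
    for i in range(len(prices) - 1, -1, -1):
        adjusted[i] = prices[i] * factor
        if i in dividends and i > 0:
            # Crossing an ex-date going back in time: scale earlier prices down.
            factor *= 1 - dividends[i] / prices[i - 1]
    return adjusted


# A $10 stock that goes ex-dividend for $1 on day 2, then recovers to $10.
prices = [10.0, 10.0, 9.0, 9.5, 10.0]
print(dividend_adjust(prices, {2: 1.0}))  # [9.0, 9.0, 9.0, 9.5, 10.0]
```

In the adjusted series the stock climbs from about $9 to $10, which matches the dollar a shareholder actually gained. Because the adjustment is multiplicative, factors from several dividends (and splits) compound the same way as in the split example.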
12:22
Okay, I'm going to talk about another topic regarding data, and that is missing data.
If you read the data that you get as part of QSTK,
you'll notice that sometimes, for
particular days, there are values called NaN, short for "not a number."
12:57
There are a number of reasons why, for a particular stock on a particular day,
we might not have a value.
The way we represent that in Python is with NaN.
So let's look at this example stock.
It traded for this period and then stopped trading; it went away.
Maybe it just didn't trade,
maybe the SEC said it couldn't trade for some reason.
Anyway, then the stock starts trading again.
Finally it stops, and going forward it doesn't exist anymore.
Now, the values in the data are NaN during these periods, and that's fine,
except when you want to run a calculation on it, like calculating daily returns or
seeing how its value would contribute to a portfolio over time.
The NaN values totally mess things up.
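Here is a small sketch of how NaN values contaminate a calculation, using made-up closes (not real data) with a two-day trading gap. Daily return is computed as today's close divided by yesterday's close, minus 1.

```python
import numpy as np
import pandas as pd

# Hypothetical closes with a two-day gap where the stock did not trade.
close = pd.Series([10.0, 10.5, np.nan, np.nan, 11.0, 11.5])

# Daily return: today's close / yesterday's close - 1.
daily_ret = close / close.shift(1) - 1
print(daily_ret.round(4).tolist())

# Plain NumPy arithmetic: a single NaN turns the whole total into NaN.
print(np.sum(close.to_numpy()))  # nan
```

The move from $10.50 to $11.00 across the gap never shows up in any daily return; it is swallowed by the NaNs. Note that pandas reductions such as `Series.sum()` skip NaN by default (`skipna=True`), which hides the problem rather than fixing it.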
So what can you do about it?
There are two key functions that pandas provides;
with numpy alone you'd have to do it manually.
One is called fill back: you go backward through the data,
and whenever there's a NaN, you fill it with the value you just passed, that is, the next known value in time.
The other is fill forward, which goes forward through the data and fills each NaN with the last known value.
So if we fill back, our price history would look like
this red line along with this blue line, until here.
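In pandas these two operations are `Series.ffill()` / `DataFrame.ffill()` and `bfill()`. Older code often wrote `fillna(method='ffill')`, a form that recent pandas versions deprecate. A minimal sketch on a hypothetical price history shaped like the example stock: not yet listed, trading, halted, trading again, then gone.

```python
import numpy as np
import pandas as pd

# Hypothetical history: not yet listed, trades, halts, trades, delisted.
prices = pd.Series([np.nan, np.nan, 20.0, 21.0, np.nan, np.nan, 24.0, 25.0, np.nan])

# Fill forward: each NaN takes the last known value before it in time.
filled_forward = prices.ffill()
# Fill back: each NaN takes the next known value after it in time.
filled_back = prices.bfill()

print(filled_forward.tolist())
# [nan, nan, 20.0, 21.0, 21.0, 21.0, 24.0, 25.0, 25.0]
print(filled_back.tolist())
# [20.0, 20.0, 20.0, 21.0, 24.0, 24.0, 24.0, 25.0, nan]
```

Neither fill alone removes every NaN: fill forward leaves the days before the first price, and fill back leaves the days after the last one.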
14:36
So in this case, we could use fill forward or
fill back. Which one is the right one to use?
Well, you always want to fill forward first,
because that prevents you from peeking into the future.
For instance, if we're doing a simulation and we get to this
point in time, then step forward one day, suddenly the price is up here.
15:07
You don't want to allow yourself to peek forward, so you fill forward like this,
meaning that you don't see this future price until you actually get there.
However, at the very beginning of the stock's price history,
you can't fill forward because there is no previous value.
So usually what you do is fill forward all of your data first,
so you're avoiding peeking into the future.
15:38
Then you fill back to cover this early period before the stock existed.
This does allow you to peek into the future, to some extent:
you know that this stock is going to exist in the future at this price,
so you have to keep that in mind and
make sure you're doing the right thing when you use data like that.
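The fill-forward-then-fill-back recipe is a one-liner in pandas. This sketch uses two hypothetical symbols (`OLD` and `NEW` are made-up names and prices) and also shows why the order matters: filling back first lets a gap borrow a price that was not yet known.

```python
import numpy as np
import pandas as pd

# Hypothetical closes; "NEW" did not exist for the first two days.
closes = pd.DataFrame(
    {
        "OLD": [50.0, np.nan, 52.0, 53.0],
        "NEW": [np.nan, np.nan, 10.0, np.nan],
    }
)

# 1) Fill forward first: interior gaps only ever use past prices.
# 2) Then fill back: only the leading NaNs remain, and they knowingly
#    borrow the first real close from the future.
clean = closes.ffill().bfill()
print(clean)

# The wrong order leaks the future: day 1 of "OLD" becomes 52.0,
# a price that was not known until day 2.
wrong = closes.bfill().ffill()
print(wrong)
```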
Anyway, this gets rid of the NaNs,
which means you can do all sorts of calculations with the data without the NaNs
polluting it.
That's important because if there's a NaN anywhere in a calculation,
the result generally becomes NaN too, and that sort of ruins what you're doing.
Okay, so summarizing: know what your data is.
Know whether you're using adjusted close or actual close.
The Yahoo and Google prices that you see displayed in charts
are actual closes, except that they're adjusted for splits.
They are not adjusted for dividends.
When you download historical data from Yahoo, you can get
an adjusted close that accounts for both splits and dividends.
I'm not sure what the case is for Google.
Anyway, it's important that you know which one it is
when you're working with it.
When should you use adjusted, and when should you use actual?
In most cases, when you're looking for
patterns in the data, it makes sense to use adjusted close.
However, if the actual literal price matters, for instance,
that $5 event, you need to use the actual close.
Now remember, when you're filling, always fill forward first, then fill back.