0:00

Hello, everyone.

In this lecture, we'll start fitting SARIMA processes to real-world datasets,

and the first dataset we're going to look at is something familiar.

We're going to look at quarterly earnings per Johnson & Johnson share.

Objectives are to fit SARIMA models to quarterly earnings of Johnson & Johnson

shares, and to forecast future values of the examined time series,

which in this case would be the earnings per Johnson & Johnson share.

All right, so when we do the modeling, we're going to look at a few steps here,

and some of those steps we'll have already talked about before.

So the first thing we're going to do as always is to look at the time plot.

We're going to look at the time plot of the data set and

try to see if there are outliers,

if there is a change in the trend, a change in the variance, and so forth.

And if we need, we're going to transform the data set, right?

So transformation will help us, for example, try to stabilize the variance.

And then, if we need to remove the trend or a seasonal trend,

we're going to do differencing: we can do just non-seasonal differencing, or

seasonal differencing, or both at the same time.

2:15

Now as we do all of these, we're going to keep in mind the parsimony principle

which I highlighted in green here.

I'm going to talk about this in the next slide.

We have already talked about this a little bit.

We are trying to find the simplest model that fits the data.

In this lecture I'm going to quantify what I mean with the parsimony principle.

And once we have our model using the parsimony principle and

comparing the AICs and choosing the minimum AIC,

we're going to look at the residuals, right?

We're going to look at the time plot of the residuals, ACF and PACF of

the residuals and we're going to look at the Ljung-Box test of the residuals.

So we will expect to get white noise.

And this residual analysis, these last two steps, will tell us whether

the residuals are white noise or not.

3:07

Now the parsimony principle, we're going to use the following.

If you have a SARIMA, which is (p,d,q)(P,D,Q)_s:

s is the span of the seasonality, p is the order of the autoregressive process,

d is the order of differencing, the non-seasonal differencing,

D is the order of seasonal differencing, q is the order of the moving average process,

Q is the order of the seasonal moving average process, and

P is the order of the seasonal autoregressive process.

And if I add them up, I do not want to have too complicated a model.

We do not want to overfit the time series.

So we're going to basically use this parsimony principle that these

parameters should add up to something less than or equal to six: p + d + q + P + D + Q ≤ 6.
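As an illustration, the parsimony rule can be used to screen candidate orders before fitting anything. This is a sketch in Python rather than the R used in the lecture, and the {0, 1} grid for p, q, P, Q is an assumption based on the values considered later:

```python
from itertools import product

# Parsimony screen for SARIMA(p,d,q)(P,D,Q)_4 candidates:
# keep a model only if p + d + q + P + D + Q <= 6.
d, D = 1, 1  # one non-seasonal and one seasonal difference, as in the lecture
candidates = [
    (p, d, q, P, D, Q)
    for p, q, P, Q in product([0, 1], repeat=4)
    if p + d + q + P + D + Q <= 6
]
print(len(candidates))  # 16: every {0,1} combination sums to at most 6
```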

5:00

So what we do first, we'll have to transform, right?

We talked about this before.

The transformation is going to be, basically, the logarithm.

So we take the logarithm of the data to stabilize the variance.

And to remove the trend, we take the difference.

So difference of the logarithm of the dataset.

This is also called the log return, specifically in financial time series.

So rt is the difference of the logarithms, in other words the logarithm of the ratio: rt = log(xt) − log(xt−1) = log(xt / xt−1).
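A quick sketch in Python (numpy) of the two equivalent ways to compute the log return; the numbers are made up for illustration, not the actual J&J earnings:

```python
import numpy as np

# Toy quarterly earnings series (illustrative values, not the real J&J data).
x = np.array([0.71, 0.63, 0.85, 0.44, 0.61, 0.69, 0.92, 0.55])

# r_t = log(x_t) - log(x_{t-1}) = log(x_t / x_{t-1}): the log return.
r = np.diff(np.log(x))

print(np.allclose(r, np.log(x[1:] / x[:-1])))  # True: both forms agree
```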

This is a side note: we are not modelling rt directly.

We are basically modelling the logarithm of the data, with the differencing handled by the model.

This is the time series of the log returns, and

we hope to see a stationary time series here.

We can see that the variance is different in the middle part of the data than

at the end point, but we're going to ignore that and

say, okay, maybe it is restabilized by taking a lower [INAUDIBLE].

And if I look at the ACF and PACF, as you can see we do have strong autocorrelation at

lag four and lag eight, and that is because of the seasonality.

So what we want to do is take the seasonal differencing; in this

case, capital D is going to be 1.

In R, this is basically the transformed and differenced data;

we take the difference with lag 4.

This becomes seasonal differencing, and if we plot the data set,

now our data set jj is differenced seasonally and non-seasonally.

Actually, it is the logarithm of jj that is differenced seasonally and non-seasonally.

And we have a roughly stationary time series here.
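In Python, the same seasonal differencing at lag 4 (what a lag-4 diff does in R) can be sketched like this; the series is synthetic stand-in data:

```python
import numpy as np

# Synthetic log series standing in for log(jj) (values are illustrative).
y = np.log(np.array([0.71, 0.63, 0.85, 0.44, 0.61,
                     0.69, 0.92, 0.55, 0.72, 0.77]))

d1 = y[1:] - y[:-1]    # non-seasonal differencing, d = 1
d4 = d1[4:] - d1[:-4]  # seasonal differencing at lag 4, D = 1

print(len(d4))  # 5: each lag-k difference shortens the series by k (10 - 1 - 4)
```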

So what we're going to do.

We're going to look at, as we said, the Ljung-Box test.

So the Ljung-Box test is basically Box.test in R.

And we're going to take the lag as the logarithm of the length of the data.

This is a common convention.

And then we'll look at the p-value, and the p-value is very, very small.

Since the p-value is small, we reject the null hypothesis that there is no

autocorrelation at previous lags; so there is some autocorrelation

at previous lags, and we're going to find it using the ACF and PACF.
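A hand-rolled sketch of the Ljung-Box statistic in Python (numpy/scipy) rather than R's Box.test; the white-noise input is simulated, and the function name ljung_box is my own:

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h):
    """Ljung-Box Q statistic and p-value for lags 1..h.
    H0: no autocorrelation up to lag h (the series looks like white noise)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x * x)
    # sample autocorrelations r_k for k = 1..h
    r = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, h + 1)))
    return q, chi2.sf(q, df=h)

rng = np.random.default_rng(0)
noise = rng.standard_normal(84)    # 84 observations, like 21 years of quarters
h = int(np.log(len(noise)))        # common convention: lag ~ log(n)
q, p = ljung_box(noise, h)
print(h)  # 4, since log(84) is about 4.43
```

For white noise we expect a large p-value (fail to reject H0); for the raw differenced earnings the lecture finds a tiny p-value, meaning autocorrelation is still present.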

So let's look at ACF.

This is the ACF of the resulting data, and this is the PACF of the resulting data.

In the ACF, if I look at the closer spikes here, I have a spike at lag 1 and

then it dies off, so this suggests an MA(1) model.

So the order of moving average terms would be one, but

if I look at the seasonal lags, the first of which is four.

In this case, it's period one, but the lag is four.

It is almost significant.

Not quite significant, because it stays below;

it does not cross this dashed line.

But it's almost significant, so we're going to assume that it is.

So we might have some seasonal correlation, and

so this will tell us that maybe we have an order 1

seasonal moving average term.

If I look at the PACF, there's a significant autocorrelation at lag 1,

again, and then it dies off.

This suggests that maybe the order of the autoregressive term is one,

and I see another significant autocorrelation at lag four,

which tells me maybe the order of the seasonal autoregressive term is one.

And then the autocorrelation dies off.

Okay, so the ACF told us that q is either 0 or 1,

we'll look at both of them, and capital Q is 0 or 1.

The partial autocorrelation told us that p is maybe 0 or 1, and capital P is 0 or 1.

So we'll look at these SARIMA(p,1,q)(P,1,Q)_4 models.

4 is my span of the seasonality.

And these are the models for the logarithm of the data,

where we have determined that p and q,

capital P and capital Q, are each going to be either zero or one.

We're going to use ARIMA routine from R, basically, we have the order.

This is the order for non-seasonal part.

And then we have a seasonal part including the period.

9:27

And we carry this out for all these possible values of p, d,

q, P, D, Q, and we basically print them.

So these first six numbers are my orders p, d, q, P, D, Q.

This is s which is 4 for all of them.

Then we look at AIC values.

This is Akaike Information Criterion.

We also look at the sum of squared errors, the SSE.

And we look at the p-value from the Ljung-Box test for the residuals.

So we do not want auto correlations left in the residuals.

So we want a high p-value, and we want the smallest AIC and the smallest SSE.

Our principle is going to be choosing the smallest AIC.

And I highlighted here the AIC, which is -150.9134,

even though the smallest SSE in this output is here,

which corresponds to a different model.

But we're going to go with the smallest AIC; because it is negative,

this is the smallest AIC.

So the model that we'll agree on is going to be, basically, (0,1,1)(1,1,0) with s = 4.

And you can see from the p-value, p-value is high, so we cannot reject the null

hypothesis, there is no auto-correlation in the residuals in this case.

10:57

So our model is SARIMA(0,1,1)(1,1,0)_4.

Remember, Xt is our earnings,

but the model we found is for Yt.

So we transformed Xt: the logarithm of Xt is called Yt, and

we fit the SARIMA model using the sarima routine or the arima routine,

the routines we discussed, and we obtain the following result here.

This is ma1: since there is an MA(1) term here,

this is the coefficient for ma1.

This is the seasonal autoregressive term, corresponding to this one;

this is its coefficient.

These are the standard errors, and the p-values are very small, so

both of these coefficients are highly significant.