Hi, welcome again. I'm Senthil Veeraragavan. Welcome back to the first week of operation Journalistic course. We are about to start session three. In session two, we learned about the mowing averages method. We also learned about discriptive statistics and how to adapt the districtive statistics for predictive purposes. We also went through some example problems. You learn more about predictive statistics as you progress through this course. In this session I'm going to talk about how to forecast when there are trend in your data or when your data appears seasonal. I will illustrate the methods using examples. Let's have a look. >> Welcome back this is week one of session three of Operations Analytics. We're going to discuss trends and seasonality in this session. Recall our Newsvendor demand data. In that chart we can see that the data is stationary, that is, demand is largely steady with some noise and variations. There is no perceptible trend, also. But there is often trend in data. For example, if you look at US retail sales data that I show you in the chart below, you can see there is a trend. Both ecommerce sales and total sales show a generally increasing trend. On the y axis, you have Dollar sales in millions of dollars. And on the x-axis you have the year. And you can see that the trend is generally increasing except for a couple of years, 2008 and 9 when there is a blip in the data due to the economy shrinking. Data may also have seasonality. For example, some months may consistently have more retail sales than other months. On the chart I show you, on x axis the month, on y axis the sales in millions of US dollars. And you can see in 2013, sales are higher during the later half of the year. This is in fact true for other years too. Seasonality effects often come from predictable annual events. They could be cultural or they could be weather related. For example, Thanksgiving sales in the US, Boxing Day sales in Canada, UK and Australia. Diwali sales in India, Chinese New Year sales, all of them happen during certain parts of the year. Similarly, you can think of ski sales in winter. All of the sales pattern show seasonality. Forecasting when there is a trend in such cases, where the data reveals strand. Let us examine two ways of forecasting the future demand. One, using moving averages. Two, using linear regression. In addition, we can also adapt exponential smoothing to adjust for trend. Forecasting using moving averages. Moving averages lag trend. What do I mean by that? If there is increasing or decreasing trend in data, forecasts generated by moving averages lag behind trend. When there is an increasing trend, moving average forecasts are usually below the demand. When there is a decreasing trend, moving average forecasts stay above the demand. Let's look at how moving averages lag trend using an illustration. In this table below, I show you demand during several periods. You can see that the demand displays an increasing trend. 10, 20, 30, 40, 50, 60 and so on. Although it's easy to forecast by just looking at the trend, I'm going to show you, using moving averages, you're going to lag behind the trend. Let's use moving averages of two data points, and let's try to forecast for period 5. To forecast for period 5, I look at last two data points, 30 and 40, add them up, divide by 2. I get 35. That's my moving average forecast for period 5. You can see that lags behind the trend. You can keep on doing this, he get 35, 45, so on, and every time the data that you forecast is lagging behind the trend in the real data. Maybe we can fix this by using more points in our moving average. Let's use moving average of three. To make a forecast for period 5, we need three data points. Last three data points are 20, 30, and 40. We add them up, take the average that gives us (20+30+40)/3 = 30. You see the forecast of 30 lags behind the trend even more than using two data points. Let's use four data points to predict demand for period 5. We use 10, 20, 30, and 40. You add them all up, take the average. (10+20+30+40)/4 gives you 25. That's our forecast for period 5 using moving averages method of 4 datapoints. You see that moving averages again lags behind trend even more. So when we use two data points, we had a forecast of 35. When we used three data points, we had a forecast of 30. When we used four data points, we had a forecast of 25. And all these forecasts lag behind trend. So it seems like more data you use, moving averages lag behind trend even further. How to fix this issue? Now let's look at time series forecasting using regression. The main forecasting idea is to fit a straight line that has slope to capture the trend in data. When trend is present, linear regression methods can be used. And the model is to fit a straight line. In this case, Dt = a + bt. So, if you wanna come up with forecast for a period t, we can calculate it by noting the time period t, multiplying t by the slope b, and then adding the intercept a. So a + bt gives you the demand forecast in period t. We might wonder how to fit a trend line. What's a best fit trend line? Ordinary Least Squares or OLS is a method that fits a trend line through the data by minimizing the squared errors. Mathematically, a straight line, Dt = a + bt is fit through the end data points. And the parameters a and b are chosen to minimize the average square distance of the data from the trend line. The average square distance is nothing but the mean squared error, which we saw when we were measuring errors in the previous session. Instead of presenting the algebra of how to fit a straight line on paper, I'm going to present how to fit a trend line in Excel using an example. Here's our example. Data on visitors to Yellowstone National Park. The Yellowstone National Park is the most visited National Park in the United States. In the file, Yellowstone Template, we have data on annual visitors to the national park. The source of the data is nps.gov. Forecasting visitors to the park. National Park Service cares about the visitor data for several purposes. To plan for camping or backpacking permits, to plan for adequate emergency services, to plan for food and field services for visitors and cars, to budget revenues and costs, to understand the ecological footprint. The data from 1904 to 2014 shows an increasing trend. You can see that in the Yellowstone Template file. We will see how we fit a straight line to use for forecasting. I'll demonstrate how to fit a straight line, using two examples. The last 50 data points, that is 1965 to 2014 and the last 30 data points, 1985 to 2014. Welcome, let's look at how to fit a trend line on a data set that displays trend. On this file, YellowstoneTemplate.xo file, I'm going to show you a data set that is a public data set. We see the visits data, the number of visitors to Yellowstone National Park. Yellowstone National Park is one of the most popular national parks in the United States. And the data you're seeing is public data available at the National Park Service. This data is even more granular, but I'm going to present to you the annual data. This annual data is pretty detailed. It goes all the way up to 1904 and is as recent up to 2014, which is last year. In column B you see the year, and in column D you see the number of visitors each year. They start it at 13,000, and now it's in several millions. On the right, I show you a graph. Just inspecting the chart, you can see that there is a clear increasing trend, but there is also a lot of variability. And for forecasting purposes, we're gonna look at recent data and try to find a trend line. Other than the increasing trend, you might be curious to note, there is a big dip in the data out here, which actually corresponds to the World War II. Anyhow, let's go ahead and try to fit a trendline in our data and see how the trend lines fit. So let's take the data, I select the entire data, or you can select the data you want for the. So let's use 50 data points that goes all the way back to 1965, and to 2014. There are 50 data points here. I think I chose this data and I'm going to insert a chart. Any chart will do, but I'm going to insert a chart with dots here. And that's a scatter plot with indicators. And I'm going to delete the legend here, we only need the visitor's data. And, I'm going to change some colors here, make it monochromatic for better visibility, and I'm going to fill this with black outlines and black dots. So you see a clear charge where there is a [INAUDIBLE] data, and it's showing some trend, 50 year trend line. I'm gonna fit a trend line here. Before we fit a trend line, let's add some axis, just to understand clearly that y-axis is showing us the number of visitors. And the x-axis ,of course, shows us the year. Now with this data, if you right-click it, it shows Add Timeline. As simple as that you can add a timeline. And you can see there are several options. There's an exponential line curve you can fit, a log curve we can fit, but we just going to fit a linear trend, or a straight line. That's right here. An, if you go down the choices, you can also see display equation on the chart, which is very helpful for us. And I'm going to display the situation, it's right here, it's hidden. So I'm going to bring it out here and I'm going to change some things. Some things, the numbers are not very readable here so I'm going to change them by clicking on the text options, going down and looking at the number, changing it to a number, and reducing the number of decimal places because our data goes into thousands. So, this is the line again. Y=28,248-50,525 [INAUDIBLE] That's it. You can place the data trend line anywhere, but let me just place it here. That tells us very clearly what the trend line's slope and the trend line's intercept is. To forecast, all I have to do is put in the year, x, and it will get the y value, which is the number of visitors. That's it, on how to fit a 50 year trend line. Let's do one quick example, one more quick example, of fitting 30 year trend line. And then we will go back to the slides. So let's take 30 data points, that is going from 2014, all the way up to 19. 86, 85. That's 30 data points. And you can see that here. And again, I go and I insert a chart and I get a scatter plot. Let's move lines here. I don't need the empty series, so I delete it. And I have, can fill with it. So, let's make it monochromatic for better visibility again. And we get a grey scale and I'm going to make it. Just factor out so you have the same kind of curve. And this is a curve that shows your data over 30 years. And again, we add a timeline by just clicking on the data, Add Timeline, you have a bunch of options. You can choose many options including moving average. And fitting a straight line, and I'm displaying the equation. So I display the equation here in this case. And I have the same number problem, so I go and change that by choosing Text Options or Label Options on the last bar column. Change the number from general to number and reduce the decimal places in this case, and I get y equals 30,191 x minus 57,401,021. That's my intercept, it's negative. And that's my slope. And that gives trend line and that gives us forecast. So we can go back to our slides right now and try to forecast based on 50 year trend line and 30 year trend line. Let's see how to fit a trend line in our data using an example. In the Yellowstone Template we can see that, you can fit a straight line through the last 50 data points. Fitting a trend line through the 50 data points gets us the following equation. Dt=-53,525,580 + 28248t. You can see the Excel video to see how to fit a trend line. The chart below shows you on the X axis the year, and Y axis the number of visitors. And you can see the actual data is shown by this black dot, they are increasing and decreasing over the years but generally an increasing trend. The dollar straight line gives you the best fit trend line, or the best fit straight line that minimizes the average squared add of So how do I come up with a forecast? If I'm interested in how many people are going to be visiting Yellowstone National Park in the year 2017, all I have to do is plug in T=2017 here. So, T=2017 multiplied by the slope, and adding the 53,525,580 at the intercept, you get 3,450,636 visitors. Again the forecast for 2017 would be 3,450,636 visitors. Let's go through another example to fit a trend line. Using the last 30 data points we get the following straight line Dt = 57,401,021 + 30,191 times t. In the chart below you can see the 30 year trend line. The thick black dots show you the actual number of visitors to Yellowstone National Park. And the dotted straight line shows you the trend line that is fit through the data. And the trend line minimizes the mean squared error. Suppose you wanna forecast how many visitors are going to visit the park in the year 2017. To count for the forecast, all you have to do is the following. Replace t by 2017 and calculate the entire thing. So, 2017 times 30,191 minus 57,401,021 will give you 3,494,226 visitors, roughly. I go through the calculations on an excel video and the generator excel charts are available in Yellowstone solution.xlsx. Having a just trended our data,we now are ready to address Seasonality by calculating the seasonal factors. Here is an example of visitors to a national park, that shows seasonality. I simplified the example just to illustrate the concept. I'll also walk through on Excel sheet, a much realistic example, based on real data. This example you see visitors to National Park in thousands. The data comes three years, 2012, 2013, and 2014. And you see that there are four seasons of the year and the data is compartmentalized into those four seasons. You can very clearly see the number of visitors every year in winter is much lower than the number of visitors in summer. And just looking at the chart below, you can see that the data shows some seasonality. It goes up and down, up and down in a regular fashion. How do we predict data based on seasonality effects? Now that's the next step. Let's look at how to forecast a seasonal series. Seasonality corresponds to some pattern in the data that repeats itself at regular intervals. We can forecast using something called the multiplicative seasonal factors. I use CI to represent the multiplicative seasonal factors. They can be C1, C2, C3 and so on up to CN. Begin where I equals one is the first season, I equal to two is the second season and so on and N is the total number of seasons. If we add all the seasonal factors, Ci, together, they should give you N which is the total number of seasons. How do you think about seasonal factors and the data value present? Ci equals 1.25 implies that that season is 25% higher than the baseline on average. Similarly, Ci equals 0.75 implies that season is 25% lower than the baseline on average. In the retail industry data, December sales are significant. This means that December sales will have a high seasonality factor in the sales data. Let's look at the method for estimating seasonal factors. The method I'm going to show you has four steps. In the first step, Step 1, I'm going to calculate the sample mean. I compute the sample mean for the entire data set. We saw how to do this in the earlier sessions. In step two, I calculate seasonal averages, that is, I average the observations for the N like periods in the data. For instance I average all summers, all winter data and so on and so forth. In step 3, I calculate the Seasonal Factors. That is I divide the averages that I got from Step 2 by the sample mean And I get N numbers. The resulting N numbers will add exactly to N and will correspond to N seasonal factors. And the final step is called De-seasonalization. What do I mean by De-seasonalization? I have removed seasonality from the data set. To remove seasonality from a series, I simply divide each observation in the data by the appropriate seasonal factor. The resulting series will have no seasonality and is called de-seasonalized series. To understand this four steps better, let's walk through our example. Let's look at the first step for forecasting. Let's calculate the sample mean. Here is our data set. Three years of data and four seasons in each year. I calculate the sample mean by taking all the data points and finding their average, and the mean comes out to be 14.333. Let's look at the first step for forecasting data with seasonality. The first step, I calculate sample mean. Let's look at the data again. It's three years of data and four seasons every year. I take all the data points and calculate the average, and that gives me my sample mean. The sample mean is 14.33. In the second step, I'm going to calculate season averages. Let's look at our data, how to calculate season averages can be seen by looking at each season. I have three years of data. To calculate the fall average, I take fall 2012, fall 2013, fall 2014 and average those three data points. So, I take the average of 16, 15 and 14 and the average I get is 15. Similarly, I can calculate the winter averages by adding 7, 6, 6 together and I get an average of 6.33 over three years. So in the rightmost column, I have four numbers which gives me the season averages or the like seasons. So fall average, winter average, spring average and summer average. That concludes our second step. Now I can do the third step, which is to calculate the seasonal factors. To calculate the seasonal factors, I take each season's average and divide it by the sample mean. Let's go through it step by step. For fall, I take the sample average for fall, which is 15, and divided by the sample mean of the entire data set which is 14.33. So fall seasonal factor is achieved by dividing the fall average by the whole sample mean, 15 divided by 14.33 gives us 1.05. Similarly, let's do it for winter. The winter average is 6.33. I take the winter average and divide it by the sample mean across the entire data set which is 14.33. So 6.33 divided by 14.33 gives me 0.44. So, in the similar fashion, I do it for spring and summer, and I get 0.84 and 1.67. These four numbers in bold give us the seasonal factors for the four seasons in our data set. You can add them up and see that they will add up to four, the four seasonal factors add up to four. Let's take a minute and. And interpret what the fall season looks like, the winter seasonal factor looks like and so on. The winter seasonal factor is 0.44. What this means is in winter you are expecting only 44% of the demand compared to the baseline demand. Similarly, in summer, you're expecting 1.67. The demand in summer is going to be 1.67 times the baseline demand. You can see that in the chart, the summer demand is higher and the winter demand is low. Let's go to the next step, step three in which I calculate the seasonal factors. To calculate the seasonal factor for fall, I take the fall average. In this case it's 15. And divide it by the sample mean, which is 14.33. 15 divided by 14.33 gives me 1.05. And 1.05 is the seasonal factor for fall. Similarly, to calculate the winter seasonal factor, I take the winter average which is 6.33, divided by 14.33 which is the sample mean and I get 0.44. In a similar fashion, I can calculate spring seasonal average, and that comes to 0.84, and summer seasonal average, and that comes to 1.67. So I got four seasonal averages for four seasons, and if I sum up all the seasonal factors, I will get four. So 1.05, 0.44, 0.84 and 1.67 should all add up to four. How do I interpret the seasonal factors? For example, how do I think about winter seasonal factor of 0.44? Seasonal factors tell you how strong the season is compared to the baseline. For example, the demand in winter, on average, is 0.44 times, only 44% of the baseline demand. Or the demand in summer, for example, Is 1.67 times the baseline demand. You can see that the winter demand is generally, typically low compared to the baseline and the summer demand is high. Now, let's go and look at the last step of de-seasonalizing. Now we reach the final step, which is to generate the de-seasonalized series. How do we generate the de-seasonalized series? Let's look at the original data. Have the original data here, and I have all the numbers that I generated in step two and step three here. Now I take each data point in my entire data set and divide it by the corresponding seasonal factor. And this will help me generate the de-seasonalized series. I divide each data by the corresponding seasonal factor. Just to make it clear, I divide each fall data by the fall seasonal factor. So, for fall, I divide by 1.05. For winter data, I divide by 0.44. For spring data, I divide by 0.84 and for summer data, I divide by 1.67 and so on. And this should give me my de-seasonalized data. In winter of 2012, I divided by the winter factor and I got 15.91. Similarly, for spring I got 14.29, and so on. Looking at the data we can see the numbers are more or less equal here. So I got these numbers by dividing each data point in my original data by the corresponding seasonal factor. Just to understand how a de-seasonalized data series looks like, in the chart below I plot the original seasonal factors. Original seasonal data in the gray scale, and the deseasonalized data in the black squares. And you can see that the resulting series has no seasonality and is called a de-seasonalized series. In fact, it can be treated as a stationary series for forecasting purposes. Let's look at real data based on airlines to forecast in data seasonality. On the course website, you'll see a file called Week1AirlineTemplate.xlsx. This Excel file is based on data from Bureau of Transportation Services and gives you load factors for the top 100 US domestic market. And load factor means the number of passengers on a plane, divided by the available seats on the plane. And usually expressed as a percentage. A load factor of 80% means 80% of the seat on the aircraft are filled. The monthly load factor data that you see on the file goes from 2003 to 2013 and they show what trend and seasonality. You can see that airlines are getting crowded, more and more crowded over the years. You can also see that some months are more crowded than the others. Using the template I follow the same previous steps that I showed you to generate the de-seasonalized data. You can see the Excel video. All four steps are represented in the solution file, Week1AirlineSolution.xlsx. I hope you find the Excel sheets useful. Let's continue forecasting seasonal data. So far we estimated seasonal factors and we built de-seasonalized series. If you have the de-seasonalized series, forecasts can be made by using the de-seasonalized series and treating it like a stationary series. You can go back to lecture material in session 2 to recap how to forecast in a stationary series. For instance, one could simply use the moving averages method. Using the de-seasonalized data, come up with a forecast using the moving averages method for example. Multiply that forecast by the appropriate seasonal factor to obtain the final forecast. And that's it. That gives you the forecast for the seasonal data. I will continue with an example to show you how to forecast in the seasonal data. Let's continue forecasting with seasonality using our example. On the top of the slide, I present to you the data set that is de-seasonalized. Here is de-seasonalized visit data from 2012, 2013, 2014. On the rightmost column I have the seasonal factors corresponding to each season. Let's forecast for 2015 winter and let's use a moving average method of four periods. So using MA(4). That is the four data points that we need to use are 2014 data points. We have 13.33, 13.64, 14.29, 14.37. Add all these four data points and find the average. And the average of the last four data points from year 2014 is 13.91. And that's your de-seasonal forecast. Your de-seasonal forecast is 13.91. So, if you had a de-seasonalized series for 2015, your forecast would be 13.91. But we are looking specifically for winter 2015. So, for winter 2015, to come up with a forecast, we need to use the seasonal factor for winter. To do that, we take the de-seasonalized forecast, which is 13.91, and multiply it by the winter seasonal factor. So 13.91 times 0.44 that gives you 6.11. That's your final forecast for winter 2015 for data with seasonality. Having worked through the example, let's take a final snapshot at our problem and understand what our number means. On the table above, on the slide, I present to you the original data, which is seasonal. And you can see the observations made in each season, and the sample average. On the right you have the seasonal averages. Now let's come up with the forecast for winter 2015. We came up with it and it was 6.11. And you can notice that roughly matches with the previous winter demands. We can do the same process for coming up with forecasts for summer, fall, and spring, and so on. I did the same analysis, step one, step two, step three, and step four for our airline example. The template file is in Week1AirlineTemplate and the solution is given to you in Week1AirlineSolution. Hello, welcome. Let's try and forecast data when there is seasonality, using real data that we got from airlines. And this is from Bureau of Transportation Services. This data shows you load factors over last ten years in the top 100 US domestic markets. The way to think about this data is These numbers represent load factors. What's a load factor? For example, March 2005 shows us a number of 79.58. A load factor of 79.58 in March 2005 means that 79.58 percentage of the airline seats that are filled by passengers in March 2005. You can see over time airlines are getting fuller and fuller by filling more seats. So, in 2003 generally at 65% roughly, that is full, and towards the end of 2012, you have about 82%. Anyway, let's go and look at this data and try to forecast. The objective of this exercise is to go through four steps to generate a de-seasonalized data on which you can use simple, more elaborate method or exponential smoothing method to make forecast. So let's start those steps, this is step one. The first step is to calculate the sample mean for the entire data set. The data set has several data points and we're gonna take the average of this entire table here. And that gives us 79.44. I'm gonna just reduce, I'm gonna choose that number and reduce. Reduce it to 79.44. That's your sample mean. Now, we can go to the next step and generate seasonal averages. So let me show you the next step by taking this whole data to the next sheet. This is the data on the next sheet. I see the numbers 79.44. Was our sample mean. Season averages is calculated by averaging across like seasons. For example, averaging across all January data, all February data, all March data, and so on. So, average of January data is the following. And you get 79.59. And we can do the averages for all 12 months, and we get different numbers. As you can see, January on average is less busy, it's about 72%. And the month of June is very busy, it's about 85%. So clearly there is some evidence for seasonality. This concludes step two. In step three, we're going to go and calculate our seasonal factors. Remember the numbers January about 72.6 and June, July, about 84, 85. And these are the same numbers we saw before. I just reproduced those numbers just to show you step three. In step three, we calculate seasonal factors by taking the season average and dividing it by the sampled mean. I need to fix sample mean and I can either press F4. Or use dollar, D, $1.20. Or dollar column, dollar row just to get the number of fixed to 79 as you drag it along. So you have 91% or 0.91. Again, we're using the numbers to two decimals. We have various load factors. So load factor for January is 0.91. Load factor for June is 1.06. You can see there is a big difference. Now we have this data, we can go ahead and produce a deseasonalized series, and that's step four. So we have all this data. Now we can go ahead and start the deseasonalized series. How do you get a deseasonalized series? You take every data here and divide it by the seasonal factor. So I have 70.47. And I do this for 70.47. So I want to make sure that I'm fixing the data. And I get different values here. And similarly, I can take all these stored values and go down and I have achieved the deseasonalized data here. I'm going to leave these two empty cells empty because they didn't have the data but essentially we have generated a deseasonalized data. Even the deseasonalized data shows you some trend. This is a quick review of how to forecast in cases where there is seasonality and how to generate a deseasonalized series. So let's just go fix this, we have deseasonalized series. And which we generated through four steps. Step one, we calculated the sample mean. Step two, we calculated the season averages. Step three, we calculated the season factors. And step four, we generated the deseasonalized series. And just to show you the season factors all add up to the total number of seasons, which is 12, and so, approximate again, 12.00. So the seasonal factors must add up to the number of seasons. And this concludes our review of how to forecast in cases where there is seasonality. Even if there is trend, we can generate the deseasonalized series and then fit in the trend line on our deseasonalized data. So far we've been looking at forecasting methods in data with trend or data with seasonality. Before we wrap up, I'll share some thoughts. In any chosen forecasting methods or any chosen data set, you will have initialization issues. That is, we have limited data in the beginning and as you do more forecasting the data set grows. We'll examine an idea to address this limited data issue in the next session. In many real world scenarios, there is often the model selection problem in how much data to use and what data to use. We saw some of this in the Yellowstone forecasting example. We'll see more of such issues in the upcoming lectures over the next few weeks. Simple time-series data, and simple short-term forecasting methods perform pretty well. In fact, long-term forecasting, assuming trend, same trend, same seasonality, etc., is fraught with pitfalls since technology changes may happen. For example, it is better to forecast demand over the next day or the next week instead of forecasting demand over the long term, like next year or next three years or so on. Finally, tracking of errors is very useful for locating forecast bias. In the next session, we will look at how tracking forecast errors help you model demand. In session three, we looked at trends, and we looked at seasonality. In the next session, session four, we'll be looking at forecasting for new products. And fitting demand distributions. In this session, I introduced you to many forecasting fundamentals. Being able to describe your data, and being able to understand the patterns in your data, goes a long way in forecasting future events. We study data that showed a trend and we looked at data with seasonal effects. We analyzed two important applications. One example analyzed how to forecast visitors to a national park, and the other example studied airline load factors. I hope that these examples demonstrate the importance of thinking about forecasting methods in various settings. In the next session, I will cover one more forecasting setting based on a clothing retailer. That will wrap up week one.