Over the last few weeks, you looked at time-series data and examined a few techniques for forecasting it, including statistical analysis, linear regression, and machine learning with both deep neural networks and recurrent neural networks. Now we're going to move beyond synthetic data and apply what we've learned to creating forecasts for some real-world data.

Let's start with this dataset from Kaggle, which tracks sunspots on a monthly basis from 1749 until 2018. Sunspots have seasonal cycles of approximately 11 years, so let's try this out and see if we can predict from it. It's a CSV dataset with the first column being an index, the second being a date in the format year-month-day, and the third being the average number of sunspots for that month, with the date marking the end of the month in which the measurements were taken. You can download it from Kaggle, or if you're using the notebook in this lesson, I've conveniently hosted it for you on my Cloud Storage. It's a pretty simple dataset, but it does help us understand a little bit more about how to optimize our code to predict a dataset based on the nature of its underlying data. Of course, one size does not fit all, particularly when it comes to data that has seasonality. So let's take a look at the code; the sketches at the end of this section illustrate each step.

First of all, if you're using Colab, you'll need to get the data into your Colab instance. The download code will fetch the file that I've stored for you. You should really get it from Kaggle and store it on your own server, or even manually upload it to Colab, but for convenience I've stored it here.

Next comes the code to read the CSV file and get its data into lists of sunspots and time steps. We start by importing the csv library, then we open the file. If you're using Colab, the wget code that you saw earlier downloads the CSV and puts it into /tmp, so this code just reads it from there. The line next(reader) is called before we loop through the rows in the reader; it simply reads the first line, which we then throw away, because the column titles are in the first line of the file. Then we loop through the reader, reading the file line by line. Our sunspot values are actually in column 2, and we want them converted into floats. As the file is read, every item comes in as a string, so we may as well convert them now instead of iterating through the list later and converting all the datatypes then. Similarly, we'll read the time steps as integers.

As much of the code we'll be using to process these values deals with NumPy arrays, we may as well convert the lists to NumPy arrays now. It's more efficient to do it this way, building up your data in a throwaway list and then converting it to NumPy, than it would have been to start with NumPy arrays, because every time you append an item to a NumPy array there's a lot of memory management going on to copy the existing data, and with a lot of data that can get slow.

If we plot our data, it looks like this. Note that we have seasonality, but it's not very regular, with some peaks much higher than others. We also have quite a bit of noise, but there's no general trend.

As before, let's split our series into training and validation datasets. We'll split at time 1,000, with a window size of 20, a batch size of 32, and a shuffle buffer of 1,000. We'll use the same window dataset code that we've been using all week to turn the series into a dataset that we can train on.
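Here's a minimal sketch of the download step, assuming you're running in a Colab cell. The bucket URL below is a placeholder rather than the actual hosting location mentioned in the lesson; swap in the notebook's URL or your own copy of the Kaggle file.

```python
# Colab shell command: fetch the CSV and save it to /tmp.
# NOTE: the URL is a placeholder -- replace it with the notebook's
# hosted copy or your own upload of the Kaggle dataset.
!wget https://storage.googleapis.com/your-bucket/Sunspots.csv -O /tmp/sunspots.csv
```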
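The CSV-reading step described above might look like the following; the file path assumes the download sketch saved the file to /tmp/sunspots.csv.

```python
import csv
import numpy as np

time_step = []
sunspots = []

with open('/tmp/sunspots.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # read and discard the header row containing the column titles
    for row in reader:
        sunspots.append(float(row[2]))  # sunspot values live in column 2
        time_step.append(int(row[0]))   # the index column serves as the time step

# Build up throwaway Python lists first, then convert once -- repeatedly
# appending to a NumPy array forces it to copy its data on every append.
series = np.array(sunspots)
time = np.array(time_step)
```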
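To reproduce the plot described above, a standard matplotlib call works; this sketch uses the `time` and `series` arrays from the previous block.

```python
import matplotlib.pyplot as plt

# Plot the full series to inspect its seasonality, noise, and lack of trend.
plt.figure(figsize=(10, 6))
plt.plot(time, series)
plt.xlabel('Time')
plt.ylabel('Sunspots')
plt.show()
```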
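Finally, the split and the windowing. The transcript refers to the window dataset helper used throughout the week without showing it here, so the function body below is one common way to write that helper with tf.data; treat it as a sketch under that assumption rather than the lesson's exact code.

```python
import tensorflow as tf

# Split into training and validation sets at time step 1,000.
split_time = 1000
time_train, x_train = time[:split_time], series[:split_time]
time_valid, x_valid = time[split_time:], series[split_time:]

window_size = 20
batch_size = 32
shuffle_buffer_size = 1000

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Slice the series into overlapping windows of window_size + 1 values,
    # shuffle them, then split each window into (features, label) pairs.
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda w: w.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.map(lambda w: (w[:-1], w[-1]))
    return dataset.batch(batch_size).prefetch(1)

train_set = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
```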