So what does that have to do with our problem?

Remember, what we have is temperature.

We have rain.

We basically have the days of the week and so on.

So that's our input.

So that's the x and y in my playground.

So I have rain and maximum temperature.

Those are my inputs.

I have a bunch of hidden nodes that I'm going to create.

And I'm going to predict not whether it's orange or blue, but the taxicab demand.

And so this is my model, my neural network.

And I'll basically keep adjusting the weights on this model

based on this data set such that given a particular rain and a temperature,

it gets as close to the observed number of rides on that day as we can.
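To make "inputs, hidden nodes, one output" concrete, here's a minimal plain-Python sketch of the forward pass of such a network: two inputs (say rain and maximum temperature), one hidden layer, and a single numeric output standing in for taxicab demand. The ReLU activation, the function names, and the weights in the usage example are assumptions for illustration; the weights are arbitrary, not trained.

```python
def relu(z):
    """Rectified linear activation: negative sums become 0."""
    return max(0.0, z)

def predict_demand(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer -> a single number."""
    # Each hidden node takes a weighted sum of the inputs, adds its bias, applies ReLU.
    hidden = [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node takes a weighted sum of the hidden nodes (no activation,
    # because we want an unbounded number such as a count of rides).
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Arbitrary, untrained weights: 2 inputs (rain, max temperature) -> 3 hidden nodes -> 1 output.
hidden_weights = [[0.5, -0.1], [-0.3, 0.2], [0.1, 0.4]]
hidden_biases = [0.0, 0.1, -0.2]
output_weights = [1.5, -0.7, 2.0]
output_bias = 10.0
print(predict_demand([0.8, 65.0], hidden_weights, hidden_biases, output_weights, output_bias))
```

Training, in this picture, is just the search for values of those weight lists that make the output match the observed number of rides.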

So in order to use TensorFlow, step number one,

we need to basically collect the predictors.

In our case, our predictors are the weather and the day of the week.

We also need the target data, and in our case the target data

is the number of taxicab rides that happen that day.

Then we create a model.

Creating a model is figuring out how many nodes you need, how many layers you need.

Then we train the model on the input data, that is, we adjust the weights.

And now you have a trained model.

Use this model to do your predictions.

So that's basically how you use TensorFlow.

So let's look at how that looks in code.

To collect predictors and target data,

this is exactly what we did in the previous lab.

We used BigQuery.

We made a Pandas DataFrame, and we have our DataFrame.

Typically we'd write this out to something like a CSV file, and

we'd use that for training.
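Writing the data out to CSV might look like the following sketch, which uses Python's standard `csv` module instead of pandas so it runs anywhere. The column names are hypothetical; the values are the example row discussed in this lecture. An in-memory buffer stands in for a real file such as a hypothetical `trips.csv`.

```python
import csv
import io

# Hypothetical column names; values are the example row discussed in the lecture.
rows = [
    {"mintemp": 28.9, "maxtemp": 37.9, "rain": 0.01,
     "dayofweek": 4, "daynumber": 77, "numtrips": 51635},
]

def write_csv(rows, stream):
    """Write a list of dicts as CSV, using the first row's keys as the header."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# In practice: with open("trips.csv", "w", newline="") as f: write_csv(rows, f)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

With pandas, `DataFrame.to_csv` does the same job in one call.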

But one thing that you will have to do is go through your data

and figure out which columns are numeric and which ones aren't.

Because all a neural network does is add and multiply numbers,

everything needs to be a number.
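A first, mechanical pass over the columns could be a check like this sketch: does every value parse as a number? The `dayname` column is hypothetical, added only to show a column that fails. Notice that `dayofweek` passes the check even though it is categorical, so this test is necessary but not sufficient; you still have to think about what each column means.

```python
def is_number(value):
    """True if the value can be converted to a float."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True

def numeric_columns(rows):
    """Columns whose values all parse as numbers (rows are dicts, e.g. read from CSV)."""
    return [col for col in rows[0] if all(is_number(row[col]) for row in rows)]

# Values as strings, the way csv.DictReader returns them; "dayname" is hypothetical.
rows = [
    {"mintemp": "28.9", "maxtemp": "37.9", "rain": "0.01", "dayofweek": "4", "dayname": "Wed"},
]
print(numeric_columns(rows))  # dayofweek passes, yet it is really a category
```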

So let's go ahead and take a look at our data.

So let's say, minimum temperature 28.9, is that a number?

Yeah, maximum temperature 37.9, number?

Yup, sure.

Rain 0.01, number?

Sure. How about the day of the week? Day of the week is 4; is that a number?

Well, yes, we're representing it by a number, but

that number is just a code for a category.

In reality though, the day of the week is a categorical variable.

What is Sunday divided by Thursday?

It makes no sense, right?

So you can basically divide one by four and get 0.25, right?

Sunday divided by Thursday is not 0.25.

So day of the week is a categorical variable, and what you do with categorical

variables is what's called one-hot encoding.

So you say, is it Sunday, is it Monday, is it Tuesday, is it Wednesday, etc.

And then, depending on which one it is, you put a 1 in that column and

everything else is a 0.
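Here's a minimal sketch of that one-hot encoding in plain Python. It assumes the day is stored as 1 through 7 with 1 = Sunday, the convention BigQuery's `DAYOFWEEK` uses, under which 4 is Wednesday; the function name is hypothetical.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def one_hot_day(dayofweek):
    """Turn a day code 1..7 (1 = Sunday, assumed) into seven 0/1 columns."""
    if not 1 <= dayofweek <= 7:
        raise ValueError(f"day of week must be 1..7, got {dayofweek}")
    # One column per day: 1 in the matching column, 0 everywhere else.
    return [1 if dayofweek == d else 0 for d in range(1, 8)]

for code in (1, 4, 7):
    print(code, DAYS[code - 1], one_hot_day(code))
```

After this step no arithmetic ever treats "Sunday" as a quarter of "Wednesday"; each day simply switches on its own input.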

The other thing that you need to be careful about is you need to look at your

input and ask, do I have at least five examples of each value?

That's a rule of thumb, five to ten examples of that particular value.

So do you think you will have five to ten examples of

days in which the minimum temperature was 29 degrees?

Probably, maximum temperature is 38 degrees,

five examples of those, yeah sure.

Five examples of days in which it rained 0.01 inches, yeah, probably.

Five examples of day number 77?

Remember that our training dataset is essentially two years of data, 2014 and 2015.

So how many times does day 77 show up in our dataset?

Only twice, so we've got to throw away the day number.

The day number is way too specific.

If we incorporate the day number into our predictions, then the neural network

will essentially just memorize that on day number 77, the answer is 51,635.

And it would happily tell you that on day 77

of any year you throw in, the answer is going to be 51,635.

So that's not what we want, so let's throw away the day number.
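The five-to-ten-examples rule of thumb can be checked mechanically, as in this sketch: count how often each value occurs and flag a column when most rows carry a value seen fewer than five times. The helper names and the "more than half the rows" cutoff are assumptions. Only calendar columns are generated here (two years, as in the lecture) so no weather data is invented; for continuous columns like temperature you would round or bucket the values before counting.

```python
import datetime
from collections import Counter

def too_specific_columns(rows, min_examples=5):
    """Flag columns where most rows hold a value seen fewer than min_examples times."""
    flagged = []
    for column in rows[0]:
        counts = Counter(row[column] for row in rows)
        rare_rows = sum(n for n in counts.values() if n < min_examples)
        if rare_rows > len(rows) / 2:
            flagged.append(column)
    return flagged

def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

# Calendar columns for 2014 and 2015 only (no weather values invented).
rows = []
for year in (2014, 2015):
    for daynumber in range(1, 366):
        date = datetime.date(year, 1, 1) + datetime.timedelta(days=daynumber - 1)
        # isoweekday(): Monday=1 ... Sunday=7; remap so Sunday=1 ... Saturday=7.
        rows.append({"dayofweek": date.isoweekday() % 7 + 1, "daynumber": daynumber})

flagged = too_specific_columns(rows)
print(flagged)  # each day number appears only twice, so it gets flagged
rows = drop_columns(rows, flagged)
```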

We know the number of predictors,

which is the number of weather variables, the day of the week, and so on.

So we have the number of predictors, and

then we choose the number of hidden nodes.

Again, that's something that you choose arbitrarily, but

you do some experimentation to figure out what works.

So let's start with, I'm going to have one layer with five hidden nodes.

So I'm going to create a deep neural net regressor with five hidden nodes and

basically pass in all of my feature columns.
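One way to see what the choice of hidden nodes and layers actually controls is to count the weights it creates. The sketch below does that for a fully connected network with one numeric output. Passing the layer sizes as a list, like `[5]`, mirrors the `hidden_units` argument of TensorFlow's `DNNRegressor`, but the helper itself is hypothetical, and the count of 10 inputs assumes min temperature, max temperature, rain, and seven one-hot day-of-week columns.

```python
def count_weights(n_inputs, hidden_units, n_outputs=1):
    """Number of weights plus biases in a fully connected network (hypothetical helper)."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    # Each layer: one weight per (incoming value, node) pair, plus one bias per node.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Assumed inputs: min temp, max temp, rain, plus 7 one-hot day-of-week columns.
n_inputs = 3 + 7
for hidden_units in ([5], [10], [10, 5]):
    print(hidden_units, count_weights(n_inputs, hidden_units))
```

More nodes or layers means more weights to adjust, which is why the choice is settled by experimenting rather than by formula.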

That's pretty much it, so I now have my neural network model.

I want to train the model.

To train the model, I call the fit function, passing in the predictors,

passing in the targets and we're done.
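To make "adjusting the weights" concrete, here's a minimal plain-Python sketch of what a fit call does for a network with one hidden layer: full-batch gradient descent on mean squared error. It's a stand-in, not TensorFlow's implementation; the tanh activation, learning rate, step count, function names, and the toy training data are all assumptions chosen for illustration.

```python
import math
import random

def init_params(n_inputs, n_hidden, seed=0):
    """Small random starting weights (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Hidden activations and the single numeric output for one input row."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    return h, sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"]

def mse(p, xs, ys):
    return sum((forward(p, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(p, xs, ys, steps=500, lr=0.05):
    """Full-batch gradient descent: nudge every weight to reduce the squared error."""
    n = len(xs)
    for _ in range(steps):
        gW1 = [[0.0] * len(row) for row in p["W1"]]
        gb1 = [0.0] * len(p["b1"])
        gW2 = [0.0] * len(p["W2"])
        gb2 = 0.0
        for x, y in zip(xs, ys):
            h, y_hat = forward(p, x)
            d_out = 2.0 * (y_hat - y) / n  # dLoss/dOutput
            gb2 += d_out
            for j, hj in enumerate(h):
                gW2[j] += d_out * hj
                d_hid = d_out * p["W2"][j] * (1.0 - hj * hj)  # chain rule through tanh
                gb1[j] += d_hid
                for i, xi in enumerate(x):
                    gW1[j][i] += d_hid * xi
        # Step every weight against its gradient.
        for j in range(len(p["W2"])):
            p["W2"][j] -= lr * gW2[j]
            p["b1"][j] -= lr * gb1[j]
            for i in range(len(p["W1"][j])):
                p["W1"][j][i] -= lr * gW1[j][i]
        p["b2"] -= lr * gb2
    return p

# Toy, made-up data: two scaled inputs (think rain, max temperature) -> a scaled target.
xs = [(a, b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
ys = [0.5 + 0.3 * a - 0.2 * b for a, b in xs]

params = init_params(n_inputs=2, n_hidden=5)
before = mse(params, xs, ys)
fit(params, xs, ys)
after = mse(params, xs, ys)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Real frameworks compute these gradients automatically and usually train on mini-batches, but the idea is the same: predictions are compared with the targets, and the weights move in the direction that shrinks the error.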

Now we want to use the neural network model to make predictions.

You don't actually need the taxicab rides anymore.

All you need is the original inputs.

So to do your prediction, you basically go ahead and

create a dictionary consisting of all of your predictor variables.

So let's say we want to predict for three days:

Wednesday, Thursday, and Friday,

which are days 4, 5, and 6.

And we'll basically pick minimum temperature on those days.

Let's say the weather forecast is for 60, 15, and 60.

And the maximum temperature is 80, 80, and 65.

And the rainfall amount on those three days is 0, 0.8, and 0.

So that's my data.

And we basically say, estimator, please predict for this data and

we get back our predictions.
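The prediction step can be sketched like this: a dictionary of predictor columns for the three days (values as given above), turned into one feature vector per day with the day of week one-hot encoded, then handed to a model. The column names and helpers are hypothetical, and `placeholder_model` is not a trained network, only a stand-in so the sketch runs; you would swap in whatever your trained model exposes.

```python
# Predictor values for the three days described above (Wednesday, Thursday, Friday).
new_days = {
    "dayofweek": [4, 5, 6],
    "mintemp": [60, 15, 60],
    "maxtemp": [80, 80, 65],
    "rain": [0, 0.8, 0],
}

def one_hot_day(dayofweek):
    """Seven 0/1 columns for a day code 1..7 (1 = Sunday, assumed)."""
    return [1.0 if dayofweek == d else 0.0 for d in range(1, 8)]

def to_feature_vectors(columns):
    """One numeric vector per day: weather values followed by the one-hot day."""
    n = len(columns["dayofweek"])
    return [
        [float(columns["mintemp"][i]), float(columns["maxtemp"][i]), float(columns["rain"][i])]
        + one_hot_day(columns["dayofweek"][i])
        for i in range(n)
    ]

def predict(model, columns):
    """Run any callable model over each day's feature vector."""
    return [model(vector) for vector in to_feature_vectors(columns)]

def placeholder_model(vector):
    # NOT a trained model: a stand-in so the sketch runs end to end.
    return 0.0

print(predict(placeholder_model, new_days))
```

Note that no target column appears anywhere: to predict, the model only needs the inputs.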

That's it, right?

So those are our four steps.

Let's recap them.

You collect your predictors and target data.

You throw away very specific information, like day number that identifies a row.

And you make sure that all of your predictor columns are numeric,

not categorical data.

If any of them are categorical, you one-hot encode them.

And then you create a neural network model.

That involves specifying the number of hidden nodes and deciding whether it's

a regression problem or a classification problem, which we haven't really talked about yet.

A regression problem is where you're predicting a number, and

a classification problem is where you're predicting orange or blue, right?

You're predicting a category.
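The difference shows up only at the output node, as in this sketch: the same weighted sum of hidden-node values is either returned as-is (regression, e.g. a number of rides) or squashed through a sigmoid into a probability and mapped to a category (classification, orange or blue). The sigmoid, the 0.5 threshold, the label order, and the function names are assumptions for illustration.

```python
import math

def output_score(hidden, weights, bias):
    """Weighted sum of the hidden-node values at the output node."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias

def regression_output(hidden, weights, bias):
    # Regression: the raw score is the prediction (e.g. number of taxicab rides).
    return output_score(hidden, weights, bias)

def classification_output(hidden, weights, bias, labels=("blue", "orange")):
    # Classification: squash the score into a probability, then pick a category.
    probability = 1.0 / (1.0 + math.exp(-output_score(hidden, weights, bias)))
    return (labels[1] if probability >= 0.5 else labels[0]), probability

hidden = [1.0, 2.0]
print(regression_output(hidden, [1.0, 1.0], 0.0))
print(classification_output(hidden, [1.0, 1.0], 0.0))
print(classification_output(hidden, [-1.0, -1.0], 0.0))
```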