In a previous video,

we got familiar with tensorflow graphs,

ops, variables, and constants.

We also saw how tensorflow computes gradients of arbitrary functions.

In this video, we will apply

the machinery to estimate a linear regression model in tensorflow.

We will do it using two methods. The first one

is based on the normal equation for linear regression that we introduced earlier.

The second one is a probabilistic method called

maximum likelihood estimation, or MLE for short.

So, let's just dive in and see how it all works.

Here we are in a notebook.

The first cell has our usual imports, and it also imports tensorflow as tf.

The second cell is the graph

reset utility function that we already used in our previous notebook.

The third cell defines the linear regression model that we will fit.

We have three predictors, X1 to X3, that

are all uniformly sampled from the interval from minus 1 to 1.

Then we weight them with weights B1 to B3,

add to this an intercept A, and finally

add Gaussian noise with volatility sigma equal to 10 percent.

Next, we simulate 5,000 points of such synthetic data

and then split it in a ratio of four to one into the train and test datasets.
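
A minimal NumPy sketch of this data-generation step is shown here. The intercept, the weights, the random seed, and the function name generate_data are illustrative assumptions, not the values used in the notebook.

```python
import numpy as np

def generate_data(n_points=5000, sigma=0.1, seed=42):
    """Simulate y = a + b1*x1 + b2*x2 + b3*x3 + noise (illustrative parameter values)."""
    rng = np.random.default_rng(seed)
    a = 1.0                                   # intercept A (assumed value)
    b = np.array([[0.5], [0.2], [0.1]])       # weights B1..B3 (assumed values)
    X = rng.uniform(-1.0, 1.0, size=(n_points, 3))       # predictors on [-1, 1]
    noise = sigma * rng.standard_normal((n_points, 1))   # Gaussian noise, volatility sigma
    y = a + X @ b + noise
    return X, y

X, y = generate_data()

# Split 4:1 into train and test sets: 4,000 and 1,000 points.
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```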

Let's check what we get here.

Okay. Now, our data is ready and we can move on with estimation of the model.

The first calculation shows you how this can be done in plain numpy.

We define an augmented data matrix by adding a column of ones on the left here,

and then apply the normal equation here.
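
As a rough sketch of these two steps in plain numpy, assuming synthetic data with illustrative coefficients (the names X_aug and true_theta are mine, not the notebook's):

```python
import numpy as np

# Illustrative synthetic data; the coefficient values are assumptions.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(4000, 3))
true_theta = np.array([[1.0], [0.5], [0.2], [0.1]])   # [a, b1, b2, b3]
y_train = true_theta[0] + X_train @ true_theta[1:] + 0.1 * rng.standard_normal((4000, 1))

# Augmented data matrix: a column of ones on the left absorbs the intercept.
X_aug = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X_aug.T @ X_aug) @ X_aug.T @ y_train
print(theta.ravel())
```

The first entry of theta estimates the intercept and the remaining three estimate the weights.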

Okay. So, here are the results. So far so good.

So, let's do the same thing in scikit-learn now.

Instead of explicitly coding up the normal equation here,

now we can simply call the fit method of the LinearRegression class in scikit-learn.

Let's run it.

And here is the result.

As you see, it's identical to the result we got in numpy,

and this is because scikit-learn uses numpy under the hood.
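
A small sketch of this comparison, assuming scikit-learn is installed and using illustrative synthetic data rather than the notebook's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data; the coefficient values are assumptions.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(4000, 3))
y_train = 1.0 + X_train @ np.array([0.5, 0.2, 0.1]) + 0.1 * rng.standard_normal(4000)

# scikit-learn fits the intercept itself, so no column of ones is needed.
model = LinearRegression()
model.fit(X_train, y_train)
theta_sklearn = np.r_[model.intercept_, model.coef_]

# The same estimate from the explicit normal equation, for comparison.
X_aug = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
theta_normal = np.linalg.inv(X_aug.T @ X_aug) @ X_aug.T @ y_train

print(theta_sklearn)
print(theta_normal)
```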

Now, let's see how the same can be done with tensorflow.

Here, we first again,

make an augmented data matrix that we will call X_np.

And then we create two tensorflow nodes X and Y,

which will be tensorflow constants that will be initialized to

X_np and Y_train at runtime.

Then we implement the normal equation in tensorflow.

This time using tensorflow matrix operations.

Then, we launch a session to compute the values of theta, and here are the results.
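
One way this could look in code is sketched below. It assumes a tensorflow installation that provides the tf.compat.v1 module for graph-and-session execution, and it uses illustrative synthetic data. The explicit tf.Graph object is my choice for keeping the sketch self-contained, not something the notebook is known to do.

```python
import numpy as np
import tensorflow as tf

# Illustrative synthetic data; the coefficient values are assumptions.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(4000, 3))
y_train = 1.0 + X_train @ np.array([[0.5], [0.2], [0.1]]) + 0.1 * rng.standard_normal((4000, 1))

# Augmented data matrix with a leading column of ones.
X_np = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

graph = tf.Graph()
with graph.as_default():
    # Constant nodes holding the data.
    X = tf.constant(X_np, dtype=tf.float32, name="X")
    Y = tf.constant(y_train, dtype=tf.float32, name="Y")
    XT = tf.transpose(X)
    # Normal equation expressed with tensorflow matrix operations.
    theta = tf.matmul(tf.matmul(tf.linalg.inv(tf.matmul(XT, X)), XT), Y)

# Nothing is computed until the node is run inside a session.
with tf.compat.v1.Session(graph=graph) as sess:
    theta_value = sess.run(theta)

print(theta_value.ravel())
```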

Now, let's make a tensorflow class for linear regression that implements

both the normal equation and

another method called maximum likelihood estimation, or MLE for short.

If you are not familiar with the MLE,

don't worry we will cover it later this week.

So, you will be able to return to this demo,

if you don't understand the formulas that I

will be implementing in tensorflow for the MLE method.

Okay. So, what do we have in this class?

Let's look at the class constructor.

The first thing that needs an explanation here are

these two attributes, self.X and self.Y.

They are declared as placeholders.

Placeholders are special nodes in the tensorflow graph that do not

do any computation but simply pass in the input data at runtime.

At this stage of creating a graph,

tensorflow needs to know what type of a tensor it should create for these data.

So, the first argument will be the type of

this data and the second argument will be the shape of the tensor representing this data.

Here, the shape of the tensor that fills the placeholder node self.X

is shown as None and n_features.

This means that the second dimension,

that is the number of columns in the data sample

will be n features but the first dimension,

the number of points in the data sample can be anything.
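
A short sketch of such a placeholder, again assuming tf.compat.v1 is available. The column_means node is a hypothetical example added only to have something to evaluate, and the values are supplied through a dictionary when the node is run.

```python
import numpy as np
import tensorflow as tf

n_features = 3

graph = tf.Graph()
with graph.as_default():
    # First argument: data type. Second argument: shape.
    # None leaves the number of rows open; the number of columns is fixed.
    X = tf.compat.v1.placeholder(tf.float32, shape=(None, n_features), name="X")
    column_means = tf.reduce_mean(X, axis=0)   # hypothetical node, for illustration only

with tf.compat.v1.Session(graph=graph) as sess:
    # The same placeholder accepts 2 rows or 500 rows.
    small = sess.run(column_means, feed_dict={X: np.ones((2, n_features))})
    large = sess.run(column_means, feed_dict={X: np.zeros((500, n_features))})

print(small, large)
```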

The next two lines implement the normal equation,

as we just saw above and the mean square error.

Let's move to the second method implemented in this class.

With the MLE method,

we estimate all parameters of regression plus the noise variance.

This makes n features plus two parameters to estimate.

Therefore we create a variable

self.weights that will store the result of our MLE estimation.

The next few lines compute

the negative log-likelihood function for linear regression.

Please pay attention to a little trick I used here to define

the standard deviation, or volatility, sigma as the square of the last weight

plus some small positive number.

This is done to ensure that the log-likelihood defined here

stays well defined for all values of the weights.
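
To make the trick concrete, here is a numpy sketch of a Gaussian negative log-likelihood with sigma parameterized this way. The function name, the value of eps, and the layout of the weights vector (regression parameters first, noise parameter last) are assumptions for illustration. The notebook's tensorflow version may differ in detail.

```python
import numpy as np

def negative_log_likelihood(weights, X_aug, y, eps=1e-4):
    """Mean Gaussian negative log-likelihood for linear regression.

    weights has shape (n_features + 2, 1): intercept, n_features slopes,
    and one raw parameter that controls the noise volatility.
    """
    theta = weights[:-1]                       # intercept and slopes
    sigma = eps + weights[-1, 0] ** 2          # positive for any real last weight
    residuals = y - X_aug @ theta
    return np.mean(np.log(sigma)
                   + 0.5 * np.log(2.0 * np.pi)
                   + residuals ** 2 / (2.0 * sigma ** 2))

# Tiny usage example with a perfect fit (zero residuals).
X_aug = np.hstack([np.ones((5, 1)), np.arange(15.0).reshape(5, 3)])
theta = np.array([[1.0], [0.5], [0.2], [0.1]])
y = X_aug @ theta
w_unit = np.vstack([theta, [[1.0]]])           # last weight 1 -> sigma = 1 + eps
w_zero = np.vstack([theta, [[0.0]]])           # last weight 0 -> sigma = eps, still positive
print(negative_log_likelihood(w_unit, X_aug, y))
print(negative_log_likelihood(w_zero, X_aug, y))
```

Because sigma can never reach zero or go negative, the logarithm and the division stay finite even when the last weight is exactly zero.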

Finally, the last line in the constructor defines

a node that specifies an optimization method that will be performed [inaudible].

Tensorflow has a number of built-in optimizers,

and we will use the one called the Adam optimizer.

Another choice could be gradient descent optimization, which is commented out here.

We will talk later about how these algorithms work in theory,

but now, we are looking at how they work in practice for this simple example.

The rest of the class is a function that generates

the synthetic data which I copied here

inside the class from the above cell for convenience.

Now, the main function here creates the data first,

and then creates the train test data sets.

Then it creates our linear regression model here and starts the tensorflow session.

Now, please pay attention here to how we run the graph.

To compute nodes defined in our class,

we need to fill the placeholders that we created there.

And this is done at runtime using this dictionary called feed dict,

whose keys are placeholder nodes and whose values are model inputs.

Such a calculation is done to compute both the optimal values of the parameters

and the train and test errors.

Please note that in the last two calculations we compute the same node

but fill its placeholders differently each time, with different data.
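
The pattern can be sketched as follows, assuming tf.compat.v1 is available and using illustrative synthetic data. The extra placeholder theta_in is my assumption: it lets the error node be evaluated on test data with parameters that were estimated on the train data.

```python
import numpy as np
import tensorflow as tf

# Illustrative synthetic data (already augmented with a column of ones).
rng = np.random.default_rng(0)
X_all = np.hstack([np.ones((5000, 1)), rng.uniform(-1.0, 1.0, size=(5000, 3))])
y_all = X_all @ np.array([[1.0], [0.5], [0.2], [0.1]]) + 0.1 * rng.standard_normal((5000, 1))
X_train, X_test = X_all[:4000], X_all[4000:]
y_train, y_test = y_all[:4000], y_all[4000:]

graph = tf.Graph()
with graph.as_default():
    X = tf.compat.v1.placeholder(tf.float32, shape=(None, 4), name="X")
    Y = tf.compat.v1.placeholder(tf.float32, shape=(None, 1), name="Y")
    theta_in = tf.compat.v1.placeholder(tf.float32, shape=(4, 1), name="theta_in")
    XT = tf.transpose(X)
    theta = tf.matmul(tf.matmul(tf.linalg.inv(tf.matmul(XT, X)), XT), Y)   # normal equation
    mse = tf.reduce_mean(tf.square(Y - tf.matmul(X, theta_in)))            # mean square error

with tf.compat.v1.Session(graph=graph) as sess:
    # Keys of feed_dict are placeholder nodes; values are the arrays to feed.
    theta_value = sess.run(theta, feed_dict={X: X_train, Y: y_train})
    # Same mse node, evaluated twice with different data in the placeholders.
    train_mse = sess.run(mse, feed_dict={X: X_train, Y: y_train, theta_in: theta_value})
    test_mse = sess.run(mse, feed_dict={X: X_test, Y: y_test, theta_in: theta_value})

print(theta_value.ravel(), train_mse, test_mse)
```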

This is for the normal equation implementation within the class.

The next lines show the training for the MLE method.

Here we run 1,000 steps of optimization that will

keep updating model weights according to

the minimisation scheme that we defined in the class.

This code returns both the current value of the loss and updated model weights.
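
A hedged sketch of such a training loop is given below, assuming tf.compat.v1 is available. The learning rate, the initialization of the weights to ones, the value 1e-4, and all variable names are assumptions chosen for illustration. The data is synthetic.

```python
import math
import numpy as np
import tensorflow as tf

# Illustrative synthetic data (augmented with a column of ones).
rng = np.random.default_rng(0)
X_train = np.hstack([np.ones((4000, 1)), rng.uniform(-1.0, 1.0, size=(4000, 3))])
y_train = X_train @ np.array([[1.0], [0.5], [0.2], [0.1]]) + 0.1 * rng.standard_normal((4000, 1))

graph = tf.Graph()
with graph.as_default():
    X = tf.compat.v1.placeholder(tf.float32, shape=(None, 4), name="X")
    Y = tf.compat.v1.placeholder(tf.float32, shape=(None, 1), name="Y")
    # n_features + 2 = 5 parameters: intercept, three slopes, one noise parameter.
    weights = tf.Variable(tf.ones([5, 1]), name="weights")
    sigma = 1e-4 + tf.square(weights[-1, 0])            # volatility, kept positive
    residuals = Y - tf.matmul(X, weights[:-1])
    # Mean Gaussian negative log-likelihood.
    loss = tf.reduce_mean(tf.math.log(sigma)
                          + 0.5 * math.log(2.0 * math.pi)
                          + tf.square(residuals) / (2.0 * tf.square(sigma)))
    train_step = tf.compat.v1.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
    init = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    feed = {X: X_train, Y: y_train}
    initial_loss = sess.run(loss, feed_dict=feed)
    for step in range(1000):
        # One optimizer update; also fetch the loss and the weights.
        _, loss_value, weights_value = sess.run([train_step, loss, weights], feed_dict=feed)

print(initial_loss, loss_value)
print(weights_value.ravel())
```

Swapping in tf.compat.v1.train.GradientDescentOptimizer for the Adam optimizer would give the gradient descent alternative mentioned earlier, although it may need a different learning rate.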

And after the training is done here,

we test the model.

After that we compute the model predictions for the noise volatility and

print the results as well as show

a three dimensional projection of our prediction results.

Let's execute this cell and execute the next one to see the results.

And here are the results.

As you can see both methods produce

very similar results and both provide a good fit to the data.

Below, this notebook contains one more implementation of linear regression, this time using

a different algorithm called stochastic gradient descent and

implementing linear regression as a single-neuron neural network.

All these will be topics for our next videos.

You can come back to this notebook later to see how they can be implemented,

after you get familiar with these topics.

For now, let me stop with this notebook at this point.

Okay. Now that we have seen how linear regression works in tensorflow

and used both the normal equation solution and the maximum likelihood solution,

we are ready to move on with regression problems in Machine Learning.

In the next video we will look at how

regression problems are solved using neural networks.