A drawback of the direct solve method is that the inputs to the solve built-in function must be small enough; that is, both X transpose X and X transpose y have to fit in the driver's memory. Hence, this method does not work for larger datasets. To get around this problem, we'll implement two iterative algorithms that scale to larger datasets: batch gradient descent and the conjugate gradient method.

Batch gradient descent is an extremely simple algorithm. Assume that you are somewhere on a hill and you want to reach the bottom of the hill. Unfortunately, you don't have a GPS, but you do have a device that tells you the slope at a given point. For higher-dimensional surfaces, the slope is called the gradient. In other words, the gradient is just another name for the derivative of a function, and it is a vector that points in the direction of the greatest increase of the function. Therefore, to reach the bottom of the hill, all one has to do is compute the gradient at the current location and take a step in the opposite direction of the gradient. Then compute the gradient at that new location, take a step in the opposite direction of that gradient, and so on. Written out, the algorithm is as shown here. Step one: start with an initial point w, and repeat the following until converged. Step two: compute the gradient dw. Step three: compute the step size alpha. Step four: compute the new w by subtracting alpha times dw from the old w. As shown in the figure, if the surface has multiple local minima, different initial points can lead us to different minima.

As discussed earlier, the gradient of the least-squares cost function is X transpose X times w minus X transpose y. To compute the step size, one has to perform a line search along the gradient; I will leave the derivation of this expression as homework. The DML script for batch gradient descent is given here. We initialize the starting point w to zero, and then we take exactly 100 steps.
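The loop described above can be sketched in NumPy as follows. This is a minimal illustration of the four steps, not the DML script shown in the video; the exact line-search formula for alpha (the homework expression) is assumed here to be the standard one for a quadratic cost, alpha = (dw·dw) / (dw·(XᵀX)dw).

```python
import numpy as np

def batch_gradient_descent(X, y, max_iters=100):
    w = np.zeros(X.shape[1])       # step 1: start with the initial point w = 0
    XtX = X.T @ X                  # cached: used by both the gradient and the line search
    Xty = X.T @ y
    for _ in range(max_iters):
        dw = XtX @ w - Xty         # step 2: gradient of the least-squares cost
        denom = dw @ (XtX @ dw)
        if denom == 0.0:           # zero gradient: already at the minimum
            break
        alpha = (dw @ dw) / denom  # step 3: exact line search along -dw (assumed formula)
        w = w - alpha * dw         # step 4: new w = old w - alpha * dw
    return w

# Tiny usage example: recover the line y = 2x + 1 from noiseless data.
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = X @ np.array([1.0, 2.0])
w = batch_gradient_descent(X, y)
```

The first column of ones plays the role of the intercept, matching how the earlier examples fit a line.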
In each step, we compute the gradient dw and the step size alpha using the formula above. Then we compute the new w by subtracting alpha times dw from the old w. Again, as in the previous example, we plot the line from the learned w; it's shown here.

Next, we'll look at the conjugate gradient method. This method benefits from using conjugacy information during optimization and usually requires far fewer steps to converge, compared to batch gradient descent. The exact algorithm is given here; I'll skip the details of the algorithm and refer you to the literature on conjugate gradients. And here is the DML script for the conjugate gradient method. As with the previous method, we'll plot the learned line.

If you prefer not to write a custom algorithm, but instead to invoke a standard off-the-shelf algorithm, then you can use the Python code given in examples three and four. In example three, we invoke a pre-implemented algorithm using the dmlFromResource method and an MLContext object. The pre-implemented algorithms are available under the scripts folder in our GitHub repository. All one has to do is create a script object with dmlFromResource and pass it the input features and the response variable, that is, X and y, using the input method. And as in the previous example, we'll plot the learned line.

Example four is targeted at scikit-learn users. A scikit-learn user may want to simply create a linear regression object and call its fit method, where the fit method accepts the input features and the response variable as NumPy matrices. That user can use our mllearn API. The mllearn API allows a Python programmer to invoke SystemML's algorithms through a scikit-learn-like API, where the input data can be NumPy arrays, SciPy matrices, or Pandas DataFrames, as well as through Spark's ML Pipeline API, where the input data is a Spark DataFrame. Since these APIs conform to the ML Pipeline estimator interface, they can be used in tandem with its feature extraction, transformation, scoring, and cross-validation classes.
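To make the conjugacy idea concrete, here is a hedged NumPy sketch of the textbook conjugate gradient method applied to the normal equations (XᵀX)w = Xᵀy. This is an illustration of the standard algorithm, not the DML script from the video; on an n-dimensional quadratic, it converges in at most n steps in exact arithmetic, which is why it needs far fewer iterations than batch gradient descent.

```python
import numpy as np

def conjugate_gradient(X, y, max_iters=100, tol=1e-10):
    """Solve (X^T X) w = X^T y with the conjugate gradient method."""
    XtX = X.T @ X
    w = np.zeros(X.shape[1])
    r = X.T @ y - XtX @ w               # residual = negative gradient at w
    p = r.copy()                        # first search direction
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = XtX @ p
        alpha = rs_old / (p @ Ap)       # exact step along the current direction
        w = w + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:       # converged
            break
        p = r + (rs_new / rs_old) * p   # next direction, conjugate to previous ones
        rs_old = rs_new
    return w

# Same toy line fit as before: y = 2x + 1, intercept via a column of ones.
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = X @ np.array([1.0, 2.0])
w = conjugate_gradient(X, y)
```

With two features, the method reaches the solution in about two iterations, versus the 100 fixed steps of the gradient descent script.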
So, here are the fit method and the predict method, and again we plot the learned line.

Next, we'll see how to use Keras inside SystemML. There are three different ways to implement a deep learning model in SystemML: using the DML-bodied nn library, using the experimental Caffe2DML API, and using the experimental Keras2DML API. Keras2DML and Caffe2DML accept a deep learning model expressed in the Keras or Caffe format, respectively, and then, underneath, generate the corresponding DML script. In this example, we train a LeNet network on the MNIST dataset. We first load the MNIST dataset, then we create LeNet using the Keras API. The Keras model given here has two convolution layers with ReLU activation and 'same' padding, with a max-pooling layer in between. They are followed by two densely connected layers with dropout. Once we have created the Keras model, we can pass it to SystemML using the Keras2DML class. We can then call the fit or predict method, as in the mllearn API.
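To make the described network concrete, here is a small shape-bookkeeping sketch of how a 28x28 MNIST image flows through the conv / pool / conv / pool stack before the dense layers. The filter counts (32 and 64) and 5x5 kernels are assumptions for illustration, not taken from the video; what the arithmetic shows is only that 'same' padding preserves the spatial size while each 2x2 max pool halves it.

```python
def conv_same(h, w, filters):
    # 'same' padding: spatial size is preserved; channel count becomes `filters`
    return h, w, filters

def maxpool2(h, w, c):
    # 2x2 max pooling with stride 2 halves each spatial dimension
    return h // 2, w // 2, c

h, w, c = 28, 28, 1                # one grayscale MNIST image
h, w, c = conv_same(h, w, 32)      # conv1 (assumed 32 filters), ReLU -> 28x28x32
h, w, c = maxpool2(h, w, c)        # pool                        -> 14x14x32
h, w, c = conv_same(h, w, 64)      # conv2 (assumed 64 filters), ReLU -> 14x14x64
h, w, c = maxpool2(h, w, c)        # pool                        -> 7x7x64
flat = h * w * c                   # flattened input to the dense layers
```

Under these assumptions, the first densely connected layer receives a 3136-dimensional vector per image.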