Okay, so we have this notion of model complexity or regularization from the previous lecture. What we're going to do in this lecture is extend the sentiment analysis code base that we developed previously to incorporate this idea of regularization, or penalizing model complexity. And we'll also use this lecture to demonstrate some of the model performance measures we previously covered. Okay, so the first thing we'd like to do is to take the code base we'd previously developed, the sentiment analysis, and extend it to incorporate a regularizer. So what's that going to look like? Well, on the left-hand side of this equation here, we have our regular mean squared error, which we've been optimizing this whole time whenever we train these least squares regression models. All we're going to do is extend that to incorporate a term which penalizes model complexity. So that's going to be our term on the right. This penalty term is just the sum of squared values of our model coefficients theta. So this is one of the complexity notions we introduced in the previous lecture, which says we would maybe like to favor models that have all coefficients being approximately zero rather than any particularly large coefficients. In the case of our sentiment analysis example, we're saying maybe we have some words that are very rarely observed; we don't want to assign extremely large coefficients to those words. Rather, we'd like to be more conservative and choose a simpler model that tends to assign relatively small coefficients to each word. And finally, we have this parameter, lambda, in the middle, which is going to trade off these two notions of model accuracy versus model complexity. We're just saying how much complexity can we afford in order to get the accuracy that we would like? And we'll set that trade-off parameter later on in future lectures when we talk about notions like training, validation, and test sets.
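The objective described above can be written out as a small numpy sketch; the function name and arguments here are my own illustration rather than part of the lecture's code base:

```python
import numpy as np

def regularized_mse(X, y, theta, lamb):
    """Mean squared error plus an l2 (sum-of-squares) penalty on theta."""
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)          # accuracy term (left-hand side)
    penalty = lamb * np.sum(theta ** 2)    # complexity term (right-hand side)
    return mse + penalty
```

Setting `lamb` (the lambda trade-off parameter) to zero recovers the plain mean squared error we've been minimizing so far.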
Okay, but for now, how do we go about incorporating this complexity term into our model? This is a very common idea, and it's actually implemented by one of the linear models in sklearn called Ridge regression. So all that I'm doing here is looking at the help on the Ridge regression class, and we can see a few interesting things. So this is linear least squares with l2 regularization, as written at the top there. So it's basically exactly the model we've been working with so far, which is linear least squares minimizing the mean squared error, but it has this l2 regularization component. So it's going to be optimizing exactly the same expression I had on the previous slide. Okay, and it also takes this parameter called alpha, which is exactly the parameter lambda in my previous expression; it's just a difference of notation. Okay, so we're going to run one of these Ridge regression models. For the moment we'll just set that value of alpha, or lambda, to 1.0, but later on we can change that value and see what effect it has on the model's performance. Okay, so we set that regularization strength parameter, and then we train our model just like we always do. And again, we'll extract some parameters from the model, and because we've penalized complexity a little bit, those parameters will be slightly different than before, at least if we penalize complexity enough. So this is the same block of code I had from a previous lecture, but this time I'm looking at the coefficients of my regularized model. So perhaps the coefficient of "disappointing" or "disappointed" is not as extreme as it was before. Okay, so that's how we've incorporated a regularizer into our model, which we'll explore more later on. What we can also do with our model is look at some of those evaluation measures we've been introducing previously.
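A minimal sketch of the fitting step described here, using random stand-ins for the word-feature matrix and ratings from the sentiment code base (the actual lecture uses the features built in the previous lecture):

```python
import numpy as np
from sklearn import linear_model

# Toy stand-ins for the feature matrix and ratings from the sentiment code base
X = np.random.rand(100, 10)
y = np.random.rand(100) * 5

# alpha plays the role of lambda, the regularization strength
model = linear_model.Ridge(alpha=1.0)
model.fit(X, y)

theta = model.coef_   # one coefficient per feature (word)
```

Increasing `alpha` penalizes complexity more heavily, shrinking the coefficients toward zero.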
Okay, so starting with the mean squared error and the R squared statistic, we can run the model.predict function to get a vector of all the predictions the model would make, which we can now compare to the labels y. So in the second code block, block number 35 there, we're just building a vector of squared differences between all of the model's predictions and the labels. And then in code block 36, we're computing the mean squared error: we're summing up all the squared differences and dividing by the length. And finally, we can compute the fraction of variance unexplained, and from it the R squared statistic, just by normalizing this quantity according to the variance of the labels, which we can use the numpy variance function to compute. Okay, and that gives us some R squared value, which is a value between zero and one that measures how much of the variance is explained by our model. All right, so those are a few of the regression evaluation measures we introduced: the mean squared error and the R squared statistic. We can also modify our code base a little bit so we can compute our classification evaluation measures. So in order to do so, I'm going to change our code base a little bit: rather than estimating ratings, which is a regression problem, we'll estimate whether a rating is greater than three, which converts it to a classification problem with a binary outcome. So to do so, we just build a new vector of labels. Our y for classification is just a measurement, true or false, of whether a particular rating is greater than three, for all of the ratings in our dataset. And now we can run regular old ridge regression on this data and fit it as before. Okay, so now we have some classification model. We can run the same prediction function on it, and we can say, well, first of all, what's the accuracy? What fraction of the model's predictions are actually correct?
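The MSE and R squared computation described here can be sketched as follows; the prediction and label vectors are hypothetical stand-ins for `model.predict(X)` and the real ratings:

```python
import numpy as np

# Hypothetical predictions and labels, standing in for model.predict(X) and y
predictions = np.array([3.5, 4.0, 2.0, 5.0])
y = np.array([4.0, 4.0, 1.0, 5.0])

differences = (predictions - y) ** 2       # vector of squared differences
MSE = sum(differences) / len(differences)  # mean squared error

FVU = MSE / np.var(y)  # fraction of variance unexplained
R2 = 1 - FVU           # R squared statistic
```

A perfect model would have FVU of zero and R squared of one; a model that always predicts the mean label would have R squared of zero.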
So first of all, here we're building just this list of true/false values that's essentially storing the correspondence between the true labels, y class, and the predictions our model made. So it's going to be values of true and false indicating which instances we got correct or incorrect. And then to compute the accuracy, we just say what fraction of our predictions were actually correct. Okay, next, we can look at true positives, true negatives, true positive rates, true negative rates, etc. So first, we compute those four values we introduced previously: the true positives, the false positives, the true negatives, and the false negatives. We can compute each of those with a list comprehension, where we just iterate through the predictions and the labels, and we ask what the relationship is between the prediction and the label. So we have true positives when the prediction is true and the label is true, false positives when the prediction is true and the label is false, and so on and so forth. And now we can print out all those counts. Just as a sanity check, we can confirm that if we sum up the true positives, the false positives, the true negatives, and the false negatives, it adds up to the size of the entire dataset. So next, we can compute related statistics, like the accuracy, just to confirm that this is the accuracy we got before. We can also compute things like the true positive rate and the true negative rate. So the true positive rate is the number of true positives divided by everything labeled positive, so the true positives plus the false negatives, and similarly for the true negative rate. Or the balanced error rate: we can take one minus a half times the true positive rate plus the true negative rate, which gives us a balanced error rate here of 0.23. Okay, so that's about it. In this lecture we've introduced some code that allows us to incorporate a regularizer into our model, and we've also computed various evaluation measures for regression.
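The steps above can be sketched with list comprehensions over a small hypothetical set of predictions and binary labels (the real code base uses the model's predictions over the whole dataset):

```python
# Hypothetical predictions and binary labels, standing in for the model's output
predictions = [True, True, False, False, True]
y_class = [True, False, False, True, True]

# Which instances did we get correct?
correct = [p == l for (p, l) in zip(predictions, y_class)]
accuracy = sum(correct) / len(correct)

# The four counts: relationship between each prediction and its label
TP = sum(p and l for (p, l) in zip(predictions, y_class))
FP = sum(p and not l for (p, l) in zip(predictions, y_class))
TN = sum(not p and not l for (p, l) in zip(predictions, y_class))
FN = sum(not p and l for (p, l) in zip(predictions, y_class))

TPR = TP / (TP + FN)         # true positive rate
TNR = TN / (TN + FP)         # true negative rate
BER = 1 - 0.5 * (TPR + TNR)  # balanced error rate
```

As the sanity check in the lecture suggests, `TP + FP + TN + FN` should equal the size of the dataset.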
We've looked at the mean squared error and the R squared statistic. And for classification, we've looked at accuracy measures like the true positive rate, the true negative rate, and the balanced error rate. So on your own, I would suggest taking this code and adapting it to incorporate other evaluation measures; that could be the mean absolute error, or trying to compute the precision and the recall.
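As a hint for that exercise, precision and recall can be computed from the same counts as above; a minimal sketch, with hypothetical counts:

```python
# Hypothetical counts of true positives, false positives, and false negatives
TP, FP, FN = 2, 1, 1

precision = TP / (TP + FP)  # of everything we predicted positive, what fraction was right
recall = TP / (TP + FN)     # of everything actually positive, what fraction did we find
```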