Welcome to this module of Data Science for Business Innovation. When you introduce a product on the market, you need to estimate its price correctly. For instance, are we able to give the right price to a diamond based on its features? Let's start by plotting a Cartesian space where we put the price and the number of carats of diamonds on the axes. We then position some points representing exemplary diamonds in the space. Here, we have some low-price diamonds of one carat. Up here, we have some more expensive diamonds of three carats. Now, I have a question for you. What is the price of the two-carat diamond? One thousand dollars? Six hundred? Four hundred? Think about it.

Well, I guess that you did something like this. You eyeballed a line here, and then you said: if it is two carats, then it should be 600. This line is called the line of best fit, and the method to determine it is called linear regression, an extremely useful machine learning method that comes from statistics. Of course, computers cannot eyeball lines, but they are excellent at repeating simple procedures very fast. So let's put those diamonds back in the previous Cartesian space, and let's try to understand how a computer can perform linear regression to find the line of best fit.

First of all, the computer randomly draws a line in the space. Let's say this one. Then it measures the error: there is this first error, this second error, this third, fourth, fifth and sixth error. Then it explores the space of possible actions that can be performed on the line to reduce the error. It can turn the line clockwise or anticlockwise, it can shift it up, or it can shift it down. What it does is follow a general procedure called gradient descent. It explores all the possible actions and makes a move towards reducing the error. This is like descending from a mountain. Let's call it Mount Errorest.
After estimating an initial error, the computer checks which action will get it downhill as fast as possible. So let's put the points back, and let's say that if we rotate the line by 120 degrees anticlockwise we have something like this. Now, let's measure the error again. The first error diminishes a lot. The second is now almost nothing. The third diminishes, the fourth diminishes, the fifth is very small, and the sixth diminishes too. So we descended a bit downhill. Now, what's the best action? Well, probably it is to shift the line up a bit. That will reduce all the errors and bring us downhill a bit more. When the error becomes stable and no action can reduce it significantly, we have found the line of best fit. So far, we have illustrated how to implement linear regression using gradient descent.

You may think that linear regression has limited applicability because only a few problems are linear. But actually, linear regression also works for non-linear problems. For instance, let's say that we have these points here. The trend this time is quadratic, represented by a parabolic function. You can of course try to fit a quadratic curve, or you can transform the axes of the space. Instead of putting your points in an x, y space, you put them in an x-squared, y space. Hence, the problem becomes linear. This also works for more complex situations. Let's say, for instance, that we have this circle of points here. If we bring it to an x-squared, y-squared space, we get a line again, and therefore we can use linear regression. Summarizing: if linear regression doesn't work, what you can do is transform the space where the points lie so that the problem becomes linear.
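The "descend Mount Errorest" procedure can be sketched in a few lines of Python. The diamond data, the learning rate, and the number of steps below are made-up illustrative values, not figures from the lecture; this is a minimal sketch, not a production implementation:

```python
import random

def fit_line(points, lr=0.01, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    # Start from a random line, as in the lecture
    slope, intercept = random.random(), random.random()
    n = len(points)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each parameter
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
        # Take a small step "downhill", against the gradient
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept

# Hypothetical diamonds: (carats, price in dollars)
diamonds = [(1, 180), (1, 220), (2, 390), (2, 410), (3, 610), (3, 590)]
slope, intercept = fit_line(diamonds)
# converges towards slope ≈ 200, intercept ≈ 0 for this data
```

Rotating or shifting the line corresponds exactly to changing the slope or the intercept; the gradient tells the computer which combination of those moves reduces the error fastest.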
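The axis-transformation trick can also be shown concretely. Here the points are hypothetical and lie exactly on the parabola y = 3x²; after mapping x to x², an ordinary line of best fit (computed here with the closed-form least-squares formulas) recovers the trend:

```python
def least_squares(points):
    """Closed-form line of best fit: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

# Hypothetical points lying exactly on the parabola y = 3 * x**2
points = [(x, 3 * x**2) for x in range(1, 7)]

# Transform the axis: (x, y) -> (x squared, y); the parabola becomes a line
transformed = [(x**2, y) for x, y in points]

slope, intercept = least_squares(transformed)  # slope -> 3, intercept -> 0
```

The same idea applies to the circle of points: mapping (x, y) to (x², y²) turns the relation x² + y² = r² into a straight line.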