I thought we would end this module with something that is fun and different from what we have done so far. I'm sure many of you have been on eBay, so that's the modeling we're going to talk about now, okay? When you are bidding in an auction, your outcome, or the response, is whether you win it or not. When somebody says it will rain, the response is either it rains or it doesn't. For example, somebody says there is a 70 percent chance of rain. What does it mean? Does it mean you'll get 70 percent of a rain? No. It's as if you toss a biased coin that comes up "it rains" 70 percent of the time, and otherwise it doesn't rain. Similarly, if you look at a customer and try to figure out whether they will leave and switch over from one mobile phone carrier to another, the response is zero or one. So in all these situations, can we use regression? We have already seen that we can use regression when the explanatory variables, the predictors, are categorical, right? The fuel type example. In fact, metallic color was a binary variable; I glossed over it. So that was not a big problem, but the issue comes up when the response variable is binary.

Here's an example. I collect clocks, and my friends always ask me how many clocks I have. You don't ask a collector how many clocks they have. You ask, "What do you want next?" That's what I do. So before I start bidding on a clock, I start watching these auctions and see at what price they are won. The other thing is, to avoid overbidding, I put in a price maybe 12 hours before the auction closes and go away. Otherwise, it's not good for your blood pressure. You just put a number there, go away, come back the next morning, and see if you won the auction. So the game I play is: watch several auctions, put in a number, say, 12 hours before the auction closes, and go to sleep. That's good for you.
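A "70 percent chance of rain" is a statement about a binary outcome. The minimal sketch below uses a hypothetical `simulate_rain` helper and assumes, purely for illustration, that each day is an independent biased coin toss. Every individual day is still 0 or 1, but the long-run share of rainy days comes out near 0.7.

```python
import random

def simulate_rain(p_rain, days, seed=0):
    """Draw independent 0/1 outcomes: 1 = it rains, 0 = it doesn't."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [1 if rng.random() < p_rain else 0 for _ in range(days)]

outcomes = simulate_rain(0.7, 100_000)
share = sum(outcomes) / len(outcomes)
print(f"share of rainy days: {share:.3f}")  # close to 0.7, not exactly 0.7
```

The probability describes the coin, not any single day: no individual day is "70 percent rainy."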
So recently, I bought a clock, and this is an amazing clock. You may have heard of Atmos clocks: it's a clock that works just by changes in pressure and temperature, so it never needs a key. This idea was actually patented, and only one company makes it. If you go and search on eBay, you'll find these clocks. I wanted one for a long time, so I started looking at them and asked, "What variables predict whether I'll win or not?" I'm dummying up these numbers so that you don't use them to bid; I actually did buy one of these clocks, but these numbers are slightly distorted. MSRP is the retail price of the clock when it is new, price is what I'm bidding, and these are used clocks. Year is the year of manufacture; there are very old clocks too, and the older they are, the more costly they are. There are three models: model 528, model 526, and the baby, which is a little one. Actually there are more than three, but I just made up three of them. Then a clock either has been serviced or not: serviced is one if yes, zero if no. Next is the number of bidders: at the time I'm going to bid, how many people are already bidding for the clock? Finally, would that bid have won the auction, yes or no? So I can actually do the experiment without even bidding: 12 hours before, I can say, "Okay, I'm going to bid 1,275," wake up in the morning, and see if 1,275 would have won. That's what I would do: learn, and eventually bid, so that I don't bid too much but I do get a clock. So this is the data I collect. Here's the plot: on the x-axis is the price bid, and on the y-axis is whether I won, zero if I didn't win and one if I won. As you can see, you can fit a line through these points, but it doesn't fit them well. What might really work is, instead of a line, fitting something better than a line, something which looks like that, right? It starts near zero and then smoothly goes up to one.
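Why a straight line is a poor fit for 0/1 outcomes can be checked directly. This is a minimal sketch on hypothetical, made-up bids (in hundreds of dollars) with invented win/loss outcomes, not the lecture's data. It fits an ordinary least squares line and evaluates it at a low and a high bid. The line's predictions fall below 0 and above 1, so they cannot be read as probabilities of winning.

```python
# Hypothetical 0/1 outcomes: bids in hundreds of dollars, 1 = won, 0 = lost
bids = [8, 9, 10, 11, 12, 12.5, 13, 14, 15, 16]
won = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def least_squares_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = least_squares_line(bids, won)
pred_low = intercept + slope * 6    # a $600 bid
pred_high = intercept + slope * 20  # a $2,000 bid
print(f"line at $600:   {pred_low:.3f}")   # negative, not a valid probability
print(f"line at $2,000: {pred_high:.3f}")  # above 1, not a valid probability
```

An S-shaped curve that starts near zero and levels off at one avoids this problem by construction.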
This kind of S-shaped curve would be better. What do I mean by the S-shaped curve? This curve will be the probability that I will win the auction. That's the idea. So the idea is to extend linear regression to binary classification. We want to use a structured model, a model with a's and b's, x's and y, and things like that, to predict. So this is the modeling. We come up with a linear equation as usual, f(x) = b0 + b1 x1 + b2 x2 + ..., where x1, x2, x3, and so on are the different features of the clock. One of the features could be the price. Then, to get that S shape, we say the probability of winning is p = e^f(x) / (1 + e^f(x)). In this model, exponential means e raised to the power of the function, where e is 2.718... So in this model, e^f(x) equals a ratio of two quantities, p / (1 - p): p is the probability you'll win, and 1 - p is the probability you won't. In statistics this ratio is called the odds: the odds of your winning versus not winning. So we talk about the log odds, because if you take the log of e^f(x), you get f(x). We say the log odds is linear, so this is called a log-odds model, and it's very popular. The S-shaped function is called the logistic function, and the model is called logistic regression.

One other point I have to make is that the method of fitting this data is no longer least squares regression, so the output is going to look slightly different. Don't bother much about it; the whole idea is to win the bid. The methodology, if you really want to know, is called maximum likelihood estimation. Instead of the t values, it is going to give you z values for the coefficients. We interpret them the same way. It also gives you a p-value, which works the same way: if the p-value is small, or the z value is large, the coefficient is significant; otherwise, it's not [inaudible].
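To make the fitting step concrete, here is a minimal sketch in Python. It assumes a single feature (the bid, in hundreds of dollars) instead of all the clock's features, and it uses hypothetical, made-up data. It fits p = e^f(x) / (1 + e^f(x)) with f(x) = b0 + b1 x by maximum likelihood using Newton-Raphson. That is one standard way to maximize this likelihood, not necessarily what any particular statistics package does internally. It then reports z values (coefficient divided by standard error) with two-sided p-values from the normal distribution. The function names `logistic`, `log_odds`, `score_and_info`, `fit_logistic`, and `z_and_p` are illustrative, not from any library.

```python
import math

def logistic(f):
    """p = e^f / (1 + e^f), written two ways to avoid overflow in math.exp."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

def log_odds(p):
    """log(p / (1 - p)); undoes logistic, so log_odds(logistic(f)) == f."""
    return math.log(p / (1.0 - p))

def score_and_info(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 information matrix (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        w = p * (1.0 - p)  # variance of a 0/1 outcome with win probability p
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic(x, y, iters=50, tol=1e-10):
    """Maximum likelihood estimates (b0, b1) and their standard errors (se0, se1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: solve [[h00, h01], [h01, h11]] @ step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix
    _, (h00, h01, h11) = score_and_info(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)

def z_and_p(coef, se):
    """z value and two-sided p-value, P(|Z| > |z|) for a standard normal Z."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical data: bids in hundreds of dollars, 1 = bid won, 0 = bid lost
bids = [8, 9, 10, 11, 12, 12.5, 13, 14, 15, 16]
won = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, se0, se1 = fit_logistic(bids, won)
for name, coef, se in [("intercept", b0, se0), ("price", b1, se1)]:
    z, p = z_and_p(coef, se)
    print(f"{name:>9}: coef={coef:.3f}  se={se:.3f}  z={z:.2f}  p={p:.3f}")

p_win = logistic(b0 + b1 * 12.75)  # a bid of $1,275
print(f"P(win) = {p_win:.3f}, odds = {p_win / (1 - p_win):.3f}, log odds = {log_odds(p_win):.3f}")
```

In practice you would let a statistics package do this. The sketch only shows that the coefficients come from maximizing the likelihood rather than minimizing squared error, and that the log odds of the fitted probability is exactly the linear score b0 + b1 x.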