[MUSIC] Let's talk about how to split continuous attributes to make a decision with a reasonable number of branches. All right. So one option is to split on every unique value, but given that a measurement is a floating-point number, the values are unlikely to group much, in which case you'll have an individual branch for every value in the data set, every possible fork. That gives you a lot of information gain, but it doesn't actually help that much; it instantly overfits. So you want to bucket these values somehow. How can you do this? Imagine that in this weather data set, instead of having humidity labels of normal and high, we actually had the percent humidity as a number. One way to bucket this might be to say that between 60 and 70 is one bucket, between 70 and 80 is another bucket, and so on. But that requires some domain knowledge, and it ends up being an extra decision you have to make when you're modeling this data. We're trying to avoid decisions like that when we're applying these machine learning techniques; we don't want parameters in the system unless we have to. What's a way to do this algorithmically? Well, imagine that we sort the data set in increasing order of humidity. Then every point in this humidity data set represents a possible place to split it into two choices, high and low. What we can do is just evaluate each one of those choices, measure the information gain, and choose the point that gives us the highest information gain. Here's what that might look like. I haven't actually checked every single one, but I sort of eyeballed it and saw that the first several are solid yeses all the way up to here, so this might be a pretty good candidate to split on. Because on one branch, we know we're going to have nothing but yeses, and that's what we're looking for here.
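A minimal sketch of that first step, enumerating the candidate split points: sort the examples by the continuous attribute and take midpoints between consecutive distinct values as candidate thresholds. The humidity values and yes/no labels below are made up for illustration, not the actual lecture data set.

```python
# Hypothetical humidity measurements and play/don't-play labels.
humidity = [65, 70, 70, 70, 75, 78, 80, 80, 85, 90, 90, 95, 96, 96]
play =     ["y","y","y","y","y","y","n","y","n","y","n","n","n","y"]

# Sort examples by humidity, then take the midpoint between each pair of
# consecutive distinct values as a candidate binary split threshold.
pairs = sorted(zip(humidity, play))
candidates = [
    (pairs[i][0] + pairs[i + 1][0]) / 2
    for i in range(len(pairs) - 1)
    if pairs[i][0] != pairs[i + 1][0]
]
print(candidates)  # one threshold per gap between distinct sorted values
```

Each candidate partitions the data into a "low" branch (values at or below the threshold) and a "high" branch, and each partition can then be scored by information gain.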
We're looking to reduce unpredictability, all right, to gain information. And another good place to split, perhaps, is down here, because there are a lot of yeses above it, all but one, and noes below it, which makes sense: as the humidity goes up, we're less likely to play outside. So, fine. If you go ahead and compute the entropy, here six out of six are yes, which means zero entropy, totally predictable. And here, with three yeses out of eight and five noes out of eight, the entropy is 0.95. And here it's nine out of ten versus one out of ten, which gives a lower number, about 0.47, as we'd expect, because it's pretty predictable, right, a 90 percent chance of being a yes. And here four out of four are no, so the entropy is zero. And so now you take the expectation: 8 out of 14 cases times 0.95, plus 6 out of 14 cases times 0, is a pretty low number, 0.54, but over here it's an even lower number, about 0.33. And so the right place to split is here. I didn't check the others, but clearly these boundary points are the only places that matter, because the label didn't change in between, which actually suggests an optimization. Again, considering every possible binary partition and choosing the partition with the highest gain is the way to turn any continuous variable into a discrete variable, so you can split on it. And you can actually do more than this: instead of just true/false, or instead of just high/low, you can also pick more buckets to get an even higher fan-out. [MUSIC]
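The whole procedure, compute the entropy of each side of every candidate threshold, weight by branch size, and keep the best, can be sketched like this. The 14 labels below are hypothetical, arranged so the entropies match the lecture's worked numbers (a 6-yes / 3-yes-5-no split scoring about 0.54, and a 9-yes-1-no / 4-no split scoring lower).

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split(values, labels):
    """Try every midpoint between distinct sorted values; return the
    threshold with the lowest weighted child entropy (highest gain)."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    best = None
    for i in range(len(pairs) - 1):
        if xs[i] == xs[i + 1]:
            continue  # no boundary between equal values
        t = (xs[i] + xs[i + 1]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        # Expected entropy after the split, weighted by branch size.
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical sorted measurements and labels mirroring the lecture's counts.
values = list(range(14))
labels = ["y"] * 6 + ["n"] + ["y"] * 3 + ["n"] * 4
print(best_split(values, labels))
```

Running this picks the split that leaves nine yeses and one no on one side and four noes on the other, with weighted entropy about 0.33, the lower of the two candidates discussed above.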