[MUSIC] Okay, so in order to decide whether a node in a decision tree is helpful or not (and I just claimed we had an example where one wasn't), we need this notion of entropy. Okay? One way to think about it is to consider two different coin flips. We want to figure out how much information we get after flipping each coin once. The reason I'm talking about two coins is that we want a certain relationship to hold between the two flips. Here's what we want to be true: if the two events are independent, which they are, assuming both coins are fair, then the probability that both come up heads is the product of the individual probabilities. So if we're trying to define some new function called information, we want the information of that joint probability to equal the information gained from one event plus the information gained from the other. Okay? Let me just say that again. We're doing the experiment with two different coins simultaneously, and we're trying to design a function that captures our intuition about what information is. Any sensible definition of an information function must have this property: for two separate, independent events, adding their individual information values together should give the same total as applying the information function to the probability that both happened. So with two fair coin flips, the joint probability is 0.5 times 0.5, meaning the information of 0.25 ought to equal the information of 0.5 plus the information of 0.5. And as the slide shows, the function that gives you that is log base 2. Okay, fine. So that's information. Now, what's entropy?
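The additive property described above can be sketched in a few lines of Python. This is a minimal illustration, not from the slides; the function name `information` is mine, and the sign is flipped (negative log) so the value comes out positive, a point the lecture returns to in a moment.

```python
import math

def information(p):
    """Information (in bits) gained by observing an event of probability p."""
    return -math.log2(p)

# Two independent fair coin flips: joint probability 0.5 * 0.5 = 0.25.
# The property we designed for: I(p * q) == I(p) + I(q).
print(information(0.25))                    # 2.0 bits
print(information(0.5) + information(0.5))  # 1.0 + 1.0 = 2.0 bits
```

Log is the natural choice here precisely because it turns multiplication of probabilities into addition of information values.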
Well, entropy measures the amount of unpredictability in a sequence of events. It's the expected value of the information. Whenever you see "expected value," you should be thinking: I'm going to multiply each probability by the value gained if that event occurs, and then add up those eventualities. That's what expected value means in every circumstance, multiplying probabilities by values, and this is no different. So we take... I paused because I see there's a bug on the slide: this should be negative. The log of a probability, since a probability is less than 1, always produces a negative number, and we want information to be a positive number, so we need to reverse the sign. It's reflected down here, but it wasn't reflected in that definition. Okay. So we understand why information is the log function, and we understand that we want the expected value over a full sequence of events, so the formula falls out: for each outcome, take its probability multiplied by the log of that same probability, add all those up, and negate the sum. And this quantity comes up a lot. We're going to use it in one particular way, but you should keep an eye out for it: whenever you see values from 0 to 1 multiplied by the logarithm of those same values, you're talking about entropy. [MUSIC]
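Putting the pieces together, the entropy formula from the lecture can be sketched as follows. This is an illustrative implementation, with the sign fix applied (the minus sign out front, as discussed above); the function name is mine.

```python
import math

def entropy(probs):
    """Entropy in bits: the expected information over a distribution.

    H = -sum(p * log2(p)) over the outcome probabilities p.
    The leading minus sign makes the result positive, since
    log2(p) <= 0 whenever p <= 1.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit (maximally unpredictable)
print(entropy([0.9, 0.1]))  # biased coin: about 0.469 bits, less surprising
print(entropy([1.0]))       # certain outcome: 0.0 bits, no unpredictability
```

Note the fair coin gives the highest entropy: the more uniform the distribution, the less predictable each event, which matches the intuition that entropy measures unpredictability.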