So we've described some very basic examples of random variables. What we need now is a mathematics of random variables in order to use them. We have a mathematics of probability, and we've acknowledged that we're at least willing to think of some kinds of variables as if they're random; we'd like to put those two ideas together. So we need functions that map the rules of probability onto random variables, and for discrete random variables the kind of functions we're talking about are so-called probability mass functions. A probability mass function is simply a function that takes the values a random variable can take and maps them to their associated probabilities. So for a die, p(1) would be one sixth, for example. It turns out quite a few functions satisfy the definition of a probability mass function; in fact, you only have to satisfy two rules. The first rule is that the function has to be nonnegative for all of its arguments, where here x ranges over the collection of possible values the random variable can take. The second rule is that if you sum over all possible values, you get one. This is exactly analogous to our probability statement that the probability of the whole sample space has to be one, but here we've put it in terms of a probability mass function. I want to talk a little bit about this notation. Notice that here I have a small x, while when we defined random variables two pages previously we used a capital X. This is very common and maybe slightly unfortunate notation, but it is used everywhere, so you might as well get used to it instead of fighting it. We typically use an uppercase letter to represent the random variable as a conceptual entity; so if we say capital X, we're talking about a die roll that we could have. When we use a lowercase x, or a lowercase y, or a lowercase letter of any sort, we tend to be talking about
realized values of the random variable. The lowercase x should be something you can plug a number into, where capital X is a conceptual random variable: a conceptual flip of a coin, a conceptual roll of a die. Lowercase x is one or two or three or zero. It's slightly unfortunate notation and it takes a little bit of getting used to, but everyone who works in statistics or probability has gotten used to it, and everyone does it, so you might as well do it too. Let's go over an example of constructing a probability mass function, taking the simplest possible example, a coin flip. Let X be the result of a coin flip, where zero represents tails and one represents heads, and let's assume the coin is fair. So we want a function that maps zero to one half and one to one half, and there are infinitely many ways you could write down that function; we're going to pick one. Here we write it as one half raised to the power x times one half raised to the power one minus x: p(x) = (1/2)^x (1/2)^(1 - x). Notice that if you plug in x = 0 you get one half, and if you plug in x = 1 you also get one half. Now let's go to a slightly more complicated example where we assume the coin is potentially biased, i.e. that it's not fair. Let theta be the probability of a head, expressed as a proportion between zero and one. Just as an example, imagine theta was 0.3 instead of one half; then we would think the probability of a head was 0.3 and the probability of a tail was 0.7. But let's leave it as theta for right now. So we want a function that says the probability of a zero is one minus theta and the probability of a one is theta, and the function p(x) = theta^x (1 - theta)^(1 - x) exactly satisfies these properties.
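A minimal sketch of these two examples in Python (the lecture's own code examples use R; `die_pmf` and `bernoulli_pmf` are names introduced here just for illustration): the fair-die PMF satisfies the two rules, and the biased-coin formula p(x) = theta^x (1 - theta)^(1 - x) reproduces the stated probabilities.

```python
from fractions import Fraction

# Fair die: the PMF maps each face to its probability, 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
assert all(p >= 0 for p in die_pmf.values())  # rule 1: nonnegative
assert sum(die_pmf.values()) == 1             # rule 2: sums to one

# Biased coin: p(x) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}.
def bernoulli_pmf(x, theta):
    return theta ** x * (1 - theta) ** (1 - x)

# With theta = 0.3: p(1) = 0.3 (heads), p(0) = 0.7 (tails).
assert abs(bernoulli_pmf(1, 0.3) - 0.3) < 1e-12
assert abs(bernoulli_pmf(0, 0.3) - 0.7) < 1e-12
```

Plugging theta = 1/2 into the same formula recovers the fair-coin PMF, which is exactly why the biased version is written this way.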
This is another common notation in the field of statistics: Greek letters like theta represent the things we don't know but would like to know. So imagine you had a coin and you didn't know whether or not it was fair; we would represent that unknown probability of a head as theta. I want to give you a sense of where we're going. In this case the probability mass function is the entity that governs the population of coin flips. So if we want to know theta, we're going to collect data to estimate it, and then evaluate the uncertainty in that estimate, and the way we're going to evaluate the uncertainty in that estimate is using this probability distribution. All the probability distributions we're going to talk about are conceptual models of populations, and they are the entities that tie our data to the population. At any rate, right now this may sound a little heavy, and we'll discuss it in much more detail throughout the entire class, but the one rule I want you to remember right now is that unknown things we want to know, like in this case the probability of a head, are generally denoted with Greek letters. These are usually called parameters. I also want to note one other thing. Why, among all the possible ways we could have written out this probability mass function, did we choose theta^x (1 - theta)^(1 - x)? There are lots of different ways we could have done it; you can try to figure some of them out yourself. Well, it turns out, and we'll discuss this at length, that in probability, multiplying is very useful, so we want probability mass functions that make multiplication very easy. If we take things and raise them to powers, then multiplying becomes easy, because the exponents just add. That's a general rule, and you'll see later on why this is the case. But at any rate, this is why we choose this particular form
of the probability mass function when you could write it so many different ways. People have thought about this a lot, and this is definitely the most useful way to write out this particular probability mass function. So consider again the unfair coin. Our probability mass function satisfies p(0) = 1 - theta and p(1) = theta. Let's just go through the exercise of proving to ourselves that this is in fact a probability mass function. It's nonnegative, because it's one minus theta at zero and theta at one, and theta is between zero and one, so the function is nonnegative for both x = 0 and x = 1. And the sum of the probabilities, p(0) plus p(1), is (1 - theta) + theta, which is one. So it satisfies the two rules that probability mass functions have to satisfy. That covers the principal entity we're going to use to model discrete random variables, probability mass functions. Now we need to cover the principal entity we're going to use to model continuous random variables, which are called probability density functions. Probability density functions are abbreviated PDF, by the way; that stands for probability density function, not portable document format, which is what lots of people think of when they hear PDF, but in statistics no one thinks of PDFs that way. I want you to remember one very important rule, and I put it in italics to make sure everyone remembers it. By the end of the course this will be second nature to you, but if you haven't seen it before it might seem a little odd: the way probability density functions work is that areas under probability density functions correspond to probabilities for the random variable. And there's definitely one undisputed king of all PDFs, and that is the so-called bell curve.
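Incidentally, the earlier remark that raising to powers makes multiplication easy can be sketched concretely (Python here, with hypothetical names; the lecture's R examples come later): multiplying the coin PMF across several flips just adds exponents, so the product collapses to theta^(number of heads) times (1 - theta)^(number of tails).

```python
import math

def bernoulli_pmf(x, theta):
    # p(x) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}
    return theta ** x * (1 - theta) ** (1 - x)

flips = [1, 0, 1, 1, 0]   # five hypothetical observed flips
theta = 0.3

# Multiplying the PMF across the flips...
product = math.prod(bernoulli_pmf(x, theta) for x in flips)

# ...collapses to theta^(#heads) * (1 - theta)^(#tails), because raising
# to powers turns multiplication into addition of exponents.
heads = sum(flips)
collapsed = theta ** heads * (1 - theta) ** (len(flips) - heads)
assert abs(product - collapsed) < 1e-12
```

That collapse is what later makes products of these PMFs (likelihoods) easy to work with.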
So if you've ever wondered what a bell curve is, if you hear it talked about a lot, the so-called normal density function, you might wonder what in the world a bell curve accomplishes. Well, areas under bell curves correspond to probabilities. So if you're modeling something as if the population it belongs to follows a bell curve, then you're saying that the probabilities associated with that random variable are governed by areas under that bell curve. That's just one example of a PDF; there are lots of different kinds of PDFs. Just like probability mass functions have to follow two rules, probability density functions have to follow two rules to be valid probability density functions. They have to be nonnegative for all the possible values the random variable can take, which is usually called the support, and their integral has to be one. I'd also make a small point here. We define probability density functions as if they operate on the whole real line. So even if your random variable can only take values, say, between zero and two, like we talked about earlier with the pencil experiment, we define the probability density function as zero below zero and zero above two, so that there's no associated probability there, but the density is defined on the whole real line and we can take its integral from minus infinity to plus infinity. I think in this class we tend to be a little bit fuzzy about this: sometimes we'll operate on minus infinity to plus infinity, and other times we'll just write zero to two, discarding all the area where the function is zero, and I hope it will be clear from the context what we're doing. The final property, property two here, that the integral of the probability density function over the whole real line has to be one, is again simply saying that the random variable has to take some value, that it has to land in some interval on the real line.
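As a sketch (Python, standard library only; a crude Riemann sum stands in for the integral, which is an approximation choice of mine, not the lecture's), the two PDF rules can be checked for the standard normal bell curve:

```python
import math

def normal_density(x):
    # the standard-normal bell curve: exp(-x^2 / 2) / sqrt(2 * pi)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Rule 1: the density is positive everywhere we evaluate it.
# Rule 2: the total area is one; approximate the integral over the real
# line with a Riemann sum on [-10, 10] (the tails beyond are negligible).
dx = 0.001
grid = [-10 + i * dx for i in range(int(20 / dx))]
assert all(normal_density(x) > 0 for x in grid)
area = sum(normal_density(x) * dx for x in grid)
assert abs(area - 1) < 1e-3
```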
Let's go through a specific example of a PDF and put it in context. Let's assume that the time in years from diagnosis until death with a specific kind of cancer follows a density that looks like this: f(x) = e^(-x/5) / 5 for x greater than zero. The restriction to x greater than zero is contextually clear, because you can't have negative time from diagnosis, and the person is presumably alive at the time of diagnosis. This is a very simple example of a density that's commonly used in these sorts of analyses of things like survival times; it's called the exponential density function. And again, here you see that we have f(x) written as e^(-x/5) / 5 for x bigger than zero, and zero otherwise. Like I talked about on the previous slide, we often just ditch the zero part and write f(x) as the kernel of the function, either explicitly writing that x has to be greater than zero or sometimes fudging a little bit when it's clear from the context that this has to be the case. In this case it would be clear from the context. Is this a valid density? Could we model survival time after diagnosis with this density? Well, first of all, we know the function is positive, because e raised to any power is always positive. Then let's just check whether or not it integrates to one. We want the integral from minus infinity to plus infinity, but like we said, all the meat of the distribution starts at zero and goes to infinity, so let's just take the integral from zero to infinity of f(x) dx. The antiderivative is -e^(-x/5), which, evaluated from zero to infinity, yields one. Let's go through an example of using this probability density function to assign probabilities. Imagine we were to model this population as if it followed this specific exponential probability distribution.
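A quick numerical check of that calculation (a Python sketch of my own, not the lecture's code): the antiderivative -e^(-x/5) gives an exact area of 1 - e^(-b/5) on [0, b], which tends to one, and a crude Riemann sum agrees.

```python
import math

def f(x):
    # exponential survival density from the lecture: e^(-x/5) / 5 for x > 0
    return math.exp(-x / 5) / 5

# Exact area on [0, b] from the antiderivative -e^(-x/5): 1 - e^(-b/5),
# which tends to one as b grows.
b = 100.0
exact_area_to_b = 1 - math.exp(-b / 5)

# Crude left-endpoint Riemann sum over the same interval.
dx = 0.001
riemann = sum(f(i * dx) * dx for i in range(int(b / dx)))
assert abs(riemann - exact_area_to_b) < 1e-3
assert abs(exact_area_to_b - 1) < 1e-8
```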
And imagine someone asked us the question, "What's the probability that a randomly selected person from this population survives more than six years?" So if X is the conceptual value that a random person takes, we want to know the probability that X is greater than or equal to six, as represented by this probability statement. Remember again the golden rule for probability density functions: areas under the curve correspond to probabilities. So if we want the probability that X is greater than six, we want the integral from six to infinity of the probability density function, and you can go through the calculus here to find that it works out to be about 30%. In the statistical programming language R you can do this automatically; it just does the integral for you, using a numerical approximation. You write pexp(6, 1/5, lower.tail = FALSE): the six is there because we want the probability of six or larger, the one fifth represents the parameter five that you see in the exponential distribution, and lower.tail = FALSE means that we want the probability of being larger than six rather than smaller than six. So lower.tail = TRUE will give you six or smaller, and lower.tail = FALSE will give you six or larger. I want to elaborate on that point, by the way. For a continuous random variable, the probability that it takes any specific value is in fact zero. Now, that seems strange, but it's true. Remember, areas under probability density functions correspond to probabilities, and what's the area of a line? It's zero. Now, you might say that doesn't make any sense at all: specific values have to have probabilities, because we see specific values when we actually observe variables. The point is that our probability density function is a model, and it is defined for continuous random variables. Continuous means measured to infinite precision.
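A Python translation of the same calculation (the R call in the lecture is pexp; `exp_cdf` below is a helper name of mine), plus a sketch of why an exact value gets probability zero: the probability of a small interval around six shrinks away with the interval.

```python
import math

# P(X > 6) for the density e^(-x/5)/5: the integral from 6 to infinity
# evaluates to e^(-6/5), about 0.301 -- roughly 30%.
p_survive_past_6 = math.exp(-6 / 5)
assert abs(p_survive_past_6 - 0.301) < 0.001

# CDF of the same model: P(X <= x) = 1 - e^(-x/5).
def exp_cdf(x, rate=1 / 5):
    return 1 - math.exp(-rate * x)

# The probability of (6 - eps, 6 + eps) shrinks toward zero as eps does,
# which is why the single point x = 6 carries zero probability.
probs = [exp_cdf(6 + eps) - exp_cdf(6 - eps) for eps in (0.1, 0.01, 0.001)]
assert probs[0] > probs[1] > probs[2] > 0
```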
And so when we observe things, we never measure them to infinite precision; we always measure them to finite precision. And probability density functions are perfectly happy with that: they'll assign a perfectly valid probability to the statement that X is between 5.99 and 6.01. But the probability that X is exactly six is zero, because exactly six means 6.0 followed by an infinite trail of zeros, or equivalently 5.99 followed by an infinite trail of nines. Either way, that's the idea behind what probability density functions are getting at: they're modeling truly continuous random variables. So just remember that when we observe data we of course measure it with finite precision, but our continuous model is exactly that, a model. In many circumstances we find it far more useful to model random variables as if they were truly continuous than to account for all the potential specific values they could take. So in this specific example, a person will only measure how long they survive to the year, maybe to the month, maybe to the day, maybe to the hour, the minute, the second, but probably not much further than that. We're only going to measure to finite precision. Nonetheless, it's still much more useful to model survival as if it were continuous, because we don't want to have to assign probabilities to every single value; we want to assign a general function. And that's why continuous random variables are so intrinsically useful. So the belabored point I'm trying to make, by the way, is that whether you write the probability of X being greater than or equal to six, or the probability of X being strictly greater than six, in this case it doesn't change the calculation whatsoever; you get 0.301 either way. And so in the exponential probability calculation for our example, it doesn't matter.
Whether you specify the lower tail or the upper tail, whether or not you're thinking about including the six, it doesn't care about that. However, for discrete random variables it makes a big difference, right? Because specific values have actual probabilities assigned to them: a die can take the value one, two, three, four, five, or six. So in R, if you're using these probability functions, pexp gives probabilities from the exponential distribution, pbinom gives probabilities from the binomial distribution, ppois gives probabilities from the Poisson distribution, and pgamma gives probabilities from the gamma distribution, and R follows that rule pretty neatly. If it's a discrete random variable, you have to be careful about whether or not you're including the six; for a continuous random variable, you can be very sloppy about it. So here I'm just depicting the area that we're calculating. This grey area is the survival time from six to infinity; it's simply the integral that we're actually calculating, and I'll put the R code to generate exactly this figure in the files for the course.
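The discrete-versus-continuous distinction can be sketched with the die (Python here; in R the analogous care applies when using pbinom, ppois, and friends): including or excluding the endpoint changes a discrete probability, because the endpoint itself carries mass.

```python
from fractions import Fraction

# Fair die PMF: each face gets probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Unlike the continuous case, the endpoint matters for a discrete variable:
p_ge_4 = sum(p for face, p in die_pmf.items() if face >= 4)  # faces 4, 5, 6
p_gt_4 = sum(p for face, p in die_pmf.items() if face > 4)   # faces 5, 6

assert p_ge_4 == Fraction(1, 2)
assert p_gt_4 == Fraction(1, 3)
assert p_ge_4 - p_gt_4 == Fraction(1, 6)  # the mass sitting exactly at 4
```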