So, in that specific example, we looked at the probability that the random variable was larger than six. But we might want the probability that the random variable is larger than seven, or smaller than six, or smaller than five, or smaller than 4.3, and so on. So given a random variable, you could construct a function that, when you plug in a value, returns the probability that the random variable is less than that value. And you could construct a function that, when you plug in a value, returns the probability that the random variable is larger than that value. These things are so inherently useful that we give them names. The cumulative distribution function, or CDF, is simply a function that takes any specific value and returns the probability that the random variable is less than or equal to that value. Again, if the random variable is continuous, it doesn't matter whether we say less than or less than or equal to. But the cumulative distribution function is defined for both continuous and discrete random variables, so let's be specific and say less than or equal to. The survival function is the opposite: it is exactly the probability that the random variable is larger than the specific value. So if you plug x into the survival function, it returns the probability that the random variable is larger than x. On the figure from the previous slide, imagine that on the horizontal axis, instead of looking at six, I was looking at some arbitrary value x. The gray area would be S of x, and the white area to the left of x, down to the horizontal axis, would be F of x. Notice in this case that F of x is the probability of being less than or equal to x, and S of x is the probability of being strictly greater than x, so S of x and F of x have to add up to one, because they are probabilities of complementary events: the probability that X is less than or equal to little x, and the probability that X is strictly greater than little x.
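As a small sketch of that complementary relationship, here is the CDF and survival function for the lecture's running example, the exponential density with mean five (the function names here are my own, just for illustration):

```python
import math

MEAN = 5.0  # mean of the exponential in the lecture's running example

def cdf(x):
    """F(x) = P(X <= x) for the exponential with mean 5."""
    return 1.0 - math.exp(-x / MEAN) if x > 0 else 0.0

def survival(x):
    """S(x) = P(X > x), the complement of the CDF."""
    return math.exp(-x / MEAN) if x > 0 else 1.0

# For any x, the two areas partition the total probability of 1.
for x in [0.5, 2.0, 6.0]:
    assert abs(cdf(x) + survival(x) - 1.0) < 1e-12
```

The loop at the end is the whole point: at every x, the gray area S(x) and the white area F(x) sum to one.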
So if you've calculated the cumulative distribution function, you've also calculated the survival function, because all you have to do is take one minus it; conversely, if you've calculated the survival function, then you've calculated the cumulative distribution function. Next we'll go back through our previous example and calculate the survival function and the CDF exactly, starting from the exponential density that we considered before. Let's calculate the survival function first. Recall that the survival function is the probability that the random variable is strictly greater than the value lowercase x, and further recall that to calculate probabilities, we calculate areas under the probability density function. In this case we want the probability of being x or larger, so let's take the integral from x to infinity of the probability density function; here I use the dummy variable t for the integration. The antiderivative of e to the minus t over five, divided by five, is just minus e to the minus t over five. Evaluating that at x yields e to the minus x over five, and subtracting off the value as t limits to infinity, which is zero, we wind up with e to the minus x over five. Now, we could also go through the example of calculating the cumulative distribution function: instead of the integral from x to infinity, we would calculate the integral from zero up to x. But because we have already calculated the survival function, we know the CDF is one minus the survival function, so it's one minus e to the minus x over five. In general, the cumulative distribution function is the integral from minus infinity to x of the probability density function; here we start the integral at zero because the integral from minus infinity to zero is zero. So we can apply the fundamental theorem of calculus and note that the derivative of the CDF is exactly the density again.
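As a numerical sanity check on that integral (a sketch, with an arbitrarily chosen large upper limit standing in for infinity): integrating the density e to the minus t over five, divided by five, from x outward should reproduce the closed-form survival function e to the minus x over five.

```python
import math

def density(t):
    """f(t) = e^(-t/5)/5, the exponential density with mean 5."""
    return math.exp(-t / 5.0) / 5.0

def survival_numeric(x, upper=200.0, n=200_000):
    """Approximate P(X > x) with a midpoint Riemann sum from x to `upper`.
    The tail beyond 200 is negligible for this density."""
    h = (upper - x) / n
    return sum(density(x + (i + 0.5) * h) for i in range(n)) * h

x = 3.0
closed_form = math.exp(-x / 5.0)  # the survival function derived above
assert abs(survival_numeric(x) - closed_form) < 1e-6
```

The agreement between the Riemann sum and e to the minus x over five is exactly the statement that the survival function is the area under the density to the right of x.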
Just to go through our specific example: if we take one minus e to the minus x over five and take the derivative of that, we get exactly e to the minus x over five, divided by five. So we get the PDF back: derivatives of the cumulative distribution function exactly yield the probability density function. Quantiles are properties of distributions, or equivalently of density functions. When I talk about the distribution or density in general, I may just say the word distribution; so if I want to talk about the bell curve or its associated distribution, I will just talk about the Gaussian distribution or the normal distribution, and so on. When we're talking about the mathematics, I'll be more specific. The alpha quantile of a distribution is the point such that the probability of being less than that point is exactly alpha. That is, if x sub alpha is the alpha-th quantile of the distribution, we want the probability of being less than or equal to x sub alpha to be exactly alpha. As a specific example, if alpha is 0.25, then x sub 0.25 is the point such that the probability of being less than it is 25 percent. So, in our cancer survival example, the 0.25 quantile of that distribution is the survival time such that 25 percent of the people survive less than that time. A percentile is merely a quantile expressed as a percent, so the twenty-fifth percentile is the 0.25 quantile. And the median, the population median, is exactly the fiftieth percentile. Let's go through these concepts again with the exponential density that we've been looking at. Suppose we wanted to find the twenty-fifth percentile of the exponential survival distribution. What we want is to find the point x on the horizontal axis such that the white area to the left of it is 0.25. So let's actually go through this calculation.
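The claim that differentiating the CDF recovers the density can be checked numerically too. This sketch uses a central finite difference on F(x) = 1 - e^(-x/5) and compares it against f(x) = e^(-x/5)/5:

```python
import math

def cdf(x):
    """F(x) = 1 - e^(-x/5), the CDF from the worked example."""
    return 1.0 - math.exp(-x / 5.0)

def density(x):
    """f(x) = e^(-x/5)/5, the exponential density with mean 5."""
    return math.exp(-x / 5.0) / 5.0

# A central finite difference on F should recover the density f,
# illustrating the fundamental theorem of calculus step.
h = 1e-6
for x in [0.5, 1.44, 4.0]:
    numeric_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)
    assert abs(numeric_derivative - density(x)) < 1e-8
```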
In order to find the point such that the area to the left of it is 0.25, we just want to solve the equation 0.25 = F(x). Recall that a couple of slides ago we found that F of x is one minus e to the minus x over five. If you simply solve that for x, you wind up with the solution x equals minus log of 0.75, times five, which is about 1.44. How is that 1.44 interpreted? About 25 percent of the subjects from this population live less than 1.44 years. You can get quantiles directly from R with the q functions, qexp in this case, because we are talking about the exponential distribution. So qexp gives you the quantiles of the exponential distribution, pexp gives you the CDF (and hence survival probabilities), and dexp gives you the density itself; R follows that naming rule for most of the common distributions. The median, to remind you, is the 0.5 quantile, the fiftieth percentile. And the 0.25 quantile that we just figured out is generally called the lower quartile. Now, you might say: I've heard of the median before, and maybe I've heard of the lower quartile before, and what those things are to me is that the median is the middle of the data, the point such that 50 percent of the observations are lower than it, and the lower quartile is that point in the data such that 25 percent of the observations are below it. So what in the world is Bryan talking about at this point? When we talk about the median that we're discussing in this lecture, we're talking about the population median. When you collect data and take a sample median, that's an estimate of something, so we should talk about what it's an estimator of. Right, it's an estimator, and it has to have an estimand. In the same way, if we take a sample mean of data, that's an estimator of something, and it has to have an estimand.
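Here is that quantile calculation worked through in code, inverting the CDF in closed form (this mirrors what R's qexp(0.25, rate = 1/5) computes; the function names below are my own):

```python
import math

def cdf(x):
    """F(x) = 1 - e^(-x/5) for the exponential with mean 5."""
    return 1.0 - math.exp(-x / 5.0)

def quantile(alpha):
    """Invert the CDF: solve alpha = 1 - e^(-x/5) for x,
    giving x = -5 * log(1 - alpha)."""
    return -5.0 * math.log(1.0 - alpha)

lower_quartile = quantile(0.25)   # -5 * log(0.75), about 1.44 years
assert abs(lower_quartile - 1.4384) < 1e-3

# Plugging the quantile back into the CDF recovers alpha.
assert abs(cdf(lower_quartile) - 0.25) < 1e-12

median = quantile(0.50)           # the population median
assert abs(cdf(median) - 0.50) < 1e-12
```

The round trip at the end, F of x sub alpha equals alpha, is precisely the definition of the alpha quantile.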
So what we're talking about in this lecture is one way to construct estimands for these quantities. In this case, if you take the sample median, it is hopefully trying to estimate the population median, that point in the population such that the probability of being less than it is 50 percent. And you'll find in this class that there's a simple rule: sample things tend to estimate population things. So sample medians estimate population medians, sample variances estimate population variances, sample means estimate population means, and so on. What we're going to see is that this probability modeling, and the associated assumptions, are the things that connect our data to the population so that we can actually have estimands. If we didn't go through this exercise, we would still be able to take a median, but it would just be an entity in a sample. The whole point of probability modeling is that it connects your sample to the population, so that your sample median now has a population median that it's trying to estimate. Now, this is a rather difficult concept. I think the sample median is a very easy concept: you have a list of observations, you order them, and you take the middle one. The population median is a much more difficult concept. It's saying: I have described a population via this distribution, and this distribution has a point such that 50 percent of observations lie below it, and that's the population median. And I think it's a good idea, whenever you're talking in this class, to put the word population or sample in front of a quantity to remind yourself which one you mean. Now, people who work in statistics do this so much that they just kind of forget about these distinctions; even though they know them, they forget about them because they've become second nature. But when you're first learning this, it seems quite odd. And I also want to mention one more thing.
You know, the sample median is a well-defined quantity that doesn't require tons of assumptions. It's the probability modeling that's the delicate part, the part that requires assumptions. So if you are going to say that the sample median estimates the population median, there are assumptions that need to be accounted for in order for that to be true, especially when you want to do inference with your sample median or evaluate its uncertainty. And that's basically what we're going to spend nearly all of this class discussing: how we connect these probability and population concepts to sample data. Thanks, recruits. This was mathematical statistics boot camp, lecture two. In the next lecture, we're going to expand on probability modeling and the defining characteristics of probabilities, and I look forward to seeing you.