0:00

I'm going to talk about simulation in this lecture. Simulation is a very important topic for statistics and for a number of other applications, so I just want to introduce some of the functions in R that can be useful for doing simulation.

There are a couple of functions available for simulating numbers or variables from given probability distributions, probably the most important of which is the normal distribution. We can generate variates from the normal distribution by specifying a mean and a standard deviation for that distribution and then calling the rnorm function. The rnorm function simulates normal random variables from a distribution that has the given mean and standard deviation.

There are corresponding functions for the normal distribution that can be used to evaluate the probability density, the cumulative distribution function, and the quantile function.

Another function for generating random variables is the rpois function, which generates Poisson random variables from a Poisson distribution with a given rate. So there are a number of functions for generating random variables from the standard probability distributions, and you can use these to run simulations.

Probability distributions in R have basically four functions associated with them. For any given distribution, like the normal distribution, there will be a function that starts with a d, one that starts with an r, one that starts with a p, and one that starts with a q. So there are four different functions for each distribution.

I've already mentioned the rnorm function, which is for random number generation. There's the dnorm function, which evaluates the probability density for a given mean and standard deviation. There's the pnorm function, which evaluates the cumulative distribution function. And there's the qnorm function, which evaluates the quantile function. Every standard distribution in R has these four types of functions.

So for the gamma distribution, there's a dgamma, an rgamma, a pgamma, and a qgamma function. And for the Poisson distribution, there are the rpois, dpois, ppois, and qpois functions.
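To make the naming pattern concrete, here is a minimal sketch of what these calls might look like at the R console. The input values (0, 1.96, 0.975, and the gamma shape and rate) are arbitrary choices for illustration and are not taken from the lecture.

```r
# The four function types for the normal distribution (defaults: mean = 0, sd = 1)
d <- dnorm(0)        # density at 0, about 0.399
p <- pnorm(1.96)     # cumulative probability P(X <= 1.96), about 0.975
q <- qnorm(0.975)    # quantile with 97.5% of the mass below it, about 1.96
r <- rnorm(3)        # three random draws (different on every run)

# The same prefixes apply to other distributions
dg <- dgamma(1, shape = 2, rate = 1)  # gamma density evaluated at 1
pp <- ppois(2, lambda = 2)            # Poisson P(X <= 2) with rate 2
```

The prefix picks the kind of calculation and the suffix picks the distribution, so once you know one family you know how to call the others.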

Â 2:14

Working with the normal distribution involves these four functions: dnorm, pnorm, qnorm, and rnorm. You can see they each take a number of different parameters. All of the functions let you specify the mean and the standard deviation, because that's what specifies the actual probability distribution. If you do not specify them, the default is the standard normal distribution, which has mean zero and standard deviation one.

Â 2:40

For the dnorm function, you can evaluate the density, and there's an option that allows you to evaluate the log of the density. Most of the time, when you evaluate the density function for a normal distribution, you're going to want to use the log of that value, but the default for the log argument is FALSE.

For the pnorm function and the qnorm function, there's also an option to work with probabilities on the log scale, the log.p argument. Another option is whether or not you want to evaluate the lower tail of the distribution. The lower tail, which is the default, is the part of the probability distribution that goes to the left. If you want to evaluate the upper tail, and sometimes you do, then you set lower.tail = FALSE, and that will evaluate the upper tail of the distribution.
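As a rough illustration of these two options, the sketch below compares each one against the equivalent calculation done by hand. The evaluation points 1.5 and 40 are arbitrary assumptions chosen for the example.

```r
x <- 1.5  # arbitrary evaluation point (assumption)

# log = TRUE returns the log density directly
log_dens <- dnorm(x, log = TRUE)
same_log <- isTRUE(all.equal(log_dens, log(dnorm(x))))

# Far out in the tail the plain density underflows to 0,
# while the log density stays finite
tail_plain <- dnorm(40)              # 0 in double precision
tail_log   <- dnorm(40, log = TRUE)  # about -800.9

# lower.tail = FALSE gives P(X > x) instead of P(X <= x)
upper <- pnorm(x, lower.tail = FALSE)
same_upper <- isTRUE(all.equal(upper, 1 - pnorm(x)))
```

The underflow case is one reason the log scale is often preferred: taking log(dnorm(40)) would give -Inf, whereas asking for the log density directly keeps the information.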

And finally, for rnorm there are only the two parameters, mean and standard deviation, plus n, which is the number of random variables that you want to generate. So if n is 100, you'll get a vector of 100 numbers drawn from the normal distribution.

Just to be more explicit, if Phi is the cumulative distribution function for the standard normal distribution, then pnorm(q) is equal to Phi(q) and qnorm(p) is equal to the inverse of Phi evaluated at p.
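One way to check that inverse relationship is to push a value through pnorm and then back through qnorm. This is only a small sketch, and the starting value 1.96 is an arbitrary assumption.

```r
x <- 1.96           # arbitrary starting point (assumption)
p <- pnorm(x)       # Phi(x), a probability between 0 and 1
x_back <- qnorm(p)  # inverse of Phi applied to p, recovers x up to floating-point error

round_trip_ok <- isTRUE(all.equal(x_back, x))
```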

Â 3:56

So, just quickly, if you want to generate some normal random variates, you can just call rnorm and pass in an integer, which is the number of variables you want to generate. Here I'm passing ten, and you can see that the vector that's produced contains normal random numbers drawn from a distribution with mean zero and standard deviation one. If I wanted to generate a vector drawn from a distribution with mean 20 and standard deviation two, I just need to specify that explicitly in my call to rnorm. So here this vector holds ten normal random variables, and their mean is roughly 20 and their standard deviation is roughly two.
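The two calls described here would look something like the following. The variable names x and y are just placeholders for this sketch, and the numbers you get will differ from run to run.

```r
x <- rnorm(10)                     # ten draws from the standard normal
y <- rnorm(10, mean = 20, sd = 2)  # ten draws with mean 20 and standard deviation 2

summary(y)  # the sample mean should land near 20, but not exactly on it
```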

Any time you simulate random numbers from any distribution for any purpose, it's very important that you set the random number generator seed. This can be done with the set.seed function. What's important to know is that when you generate random numbers on a computer, the numbers are not actually random, but they appear random, and that's the important thing. The idea is that if you wanted to generate the same set of random numbers again, you could, because the numbers are not actually random. They're what are called pseudo-random numbers.

The seed can be any integer you want. You just pass in an integer, and that's the seed. So here I'm going to set the seed to one, and then I'm going to generate five normal random variables.

Â 5:22

So here I've got my five normal random variables. They come from a distribution with mean zero and standard deviation one. If I generate another five, you'll see that the vector is totally different, because it's another random sample of five. However, if I reset the seed to one and draw five again, you'll see that they're exactly the same as the first five that I drew. So when you set the seed, it fixes the sequence of random variables that's going to occur. And if you reset the seed, you set the sequence back to where you started, and it will continue to generate random variables from there.
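The sequence of steps just described can be sketched as follows. The variable names first, second, and again are made up for this example.

```r
set.seed(1)
first <- rnorm(5)   # five draws after setting the seed

second <- rnorm(5)  # the next five draws in the same sequence

set.seed(1)         # reset the sequence to the same starting point
again <- rnorm(5)

identical(first, again)   # TRUE: same seed, same draws
identical(first, second)  # FALSE: these come from later in the sequence
```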

This is important because it allows you to reproduce the random numbers that you generate. Now that might sound strange, because why would you want to generate the same random numbers twice? But in many applications you do want to generate the same random numbers twice so that people can reproduce what you've done.

Â 6:14

In particular, if there are some errors or problems in what you've done, you want to be able to go back to the exact situation that produced those problems. So whenever you do a simulation, you always want to set the random number seed, so that you can go back and get the same results.

Â 6:32

So I've demonstrated how to generate normal random variables, but of course you can generate random variables from other probability distributions. The Poisson distribution is very popular. Here I'm generating ten Poisson random variables with a rate of one. Of course, Poisson data are going to be integers. Here I'm generating ten Poisson random variables with a rate of two, so you can see they're slightly larger. And then here I'm generating ten Poisson random variables with a rate of 20. For the Poisson distribution, the mean is equal to the rate, so you can see that in each of these three cases the sample mean is roughly equal to the rate that I specified.
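Here is a small sketch of those three calls. The seed value and the variable names are assumptions added for the example, and the extra large sample at the end is not from the lecture. It is there only because ten draws give a fairly noisy sample mean.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible (assumption)

low  <- rpois(10, lambda = 1)   # ten Poisson draws with rate 1
mid  <- rpois(10, lambda = 2)   # rate 2, values tend to be slightly larger
high <- rpois(10, lambda = 20)  # rate 20

# A larger sample sits much closer to the rate than ten draws do
big_mean <- mean(rpois(10000, lambda = 20))
```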

I could also evaluate the cumulative distribution function for the Poisson distribution. In this first example, I want to know the probability that a Poisson random variable is less than or equal to two if the rate is two. That probability is roughly 0.68. If I wanted to know the probability that a Poisson random variable with rate two is less than or equal to four, you can see the probability's getting bigger. And here I can see the probability that a Poisson random variable with rate two is less than or equal to six, and it's very close to one. So the cumulative distribution function allows you to evaluate these probabilities.
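These three probabilities can be computed with ppois, as in the sketch below. The last line is an added cross-check, not something shown in the lecture: it sums the probability mass function by hand to confirm what the cumulative distribution function is doing.

```r
p2 <- ppois(2, lambda = 2)  # P(X <= 2), about 0.677
p4 <- ppois(4, lambda = 2)  # P(X <= 4), about 0.947
p6 <- ppois(6, lambda = 2)  # P(X <= 6), about 0.995

# The CDF is the running sum of the probability mass function
p2_by_hand <- sum(dpois(0:2, lambda = 2))
```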

Â