[MUSIC] Before fitting any model we first need to specify all of its components. One convenient way to do this is to write down the hierarchical form of the model. By hierarchy, we mean that the model is specified in steps or in layers. We usually start with the model for the data directly, or the likelihood. For example, let's write, again, the model from the previous lesson. We had the height for person i, given our parameters mu and sigma squared,. So conditional on those parameters, yi came from a normal distribution that was independent and identically distributed, where the normal distribution has mean mu and variance sigma squared. And we're doing this for individuals 1 up to n, which was 15 in this example. The next level that we need is the prior distribution from mu and sigma squared. For now we're going to say that they're independent priors. So that our prior from mu and sigma squared is going to be able to factor Into the product of two independent priors. We can assume independents in the prior and still get dependents in the posterior distribution. In the previous course we learned that the conjugate prior for mu, if we know the value of sigma squared, is a normal distribution, and that the conjugate prior for sigma squared when mu is known is the inverse gamma distribution. Those might seem like natural choices so let's write them down. Let's suppose that our prior distribution for mu is a normal distribution where mean will be mu 0. This is just some number that you're going to fill in here when you decide what the prior should be. Mean mu 0, and less say sigma squared 0 would be the variance of that prior. The prior for sigma squared will be inverse gamma, we'll write inverse gamma like that. And the inverse gamma distribution has two parameters. It has a shape parameter, we're going to call that nu 0, and a scale parameter, we'll call that beta 0. Of course, we need to choose values for these hyper parameters here. But we do now have a complete Bayesian model. We now introduce some new ideas that were not presented in the previous course. Another useful way to write out this model Is using what's called a graphical representation. To write a graphical representation, we're going to do the reverse order, we'll start with the priors and finish with the likelihood. In the graphical representation we draw what are called nodes so this would be a node for mu. The circle means that the this is a random variable that has its own distribution. So mu with its prior will be represented with that. And then we also have sigma squared. The next part of a graphical model is showing the dependence on other variables. So once we have the parameters, we can generate the data, for example. We have y1, y2, and so forth, down to yn. These are also random variables, so we'll create these as nodes. And I'm going to double up the circle here to indicate that these nodes are observed, you see them in the data. So we'll do this for all of the ys here. And to indicate the dependence of the distributions of the ys on mu and sigma squared, we're going to draw arrows. So mu influences the distribution of y for each one of these ys. The same is true for sigma squared, the distribution of each y depends on the distribution of sigma squared. Again, these nodes right here, that are double-circled, mean that they've been observed. If they're shaded, which is the usual case, that also means that they're observed. The arrows indicate the dependence between the random variables and their distributions. Notice that in this hierarchical representation, I wrote the dependence of the distributions also. We can simplify the graphical model by writing exchangeable random variables and I'll define exchangeable later. We're going to write this using a representative of the ys here on what's called the plate. So I'm going to re draw this hierarchical structure, we have mu and sigma squared. And we don't want to have to write all of these notes again. So I'm going to indicate that there are n of them, And I'm just going to draw one representative, yi. And they depend on mu and sigma squared. To write a model like this, we must assume that the ys are exchangeable. That means that the distribution for the ys does not change if we were to switch the index label like the i on the y there. So, if for some reason, we knew that one of the ys was different from the other ys in its distribution, and if we also know which one it is, then we would need to write a separate node for it and not use a plate like we have here. Both the hierarchical and graphical representations show how you could hypothetically simulate data from this model. You start with the variables that don't have any dependence on any other variables. You would simulate those, and then given those draws, you would simulate from the distributions for these other variables further down the chain. This is also how you might simulate from a prior predictive distribution. [MUSIC]