In all of our modeling throughout the last few lessons, we have assumed that all the observations were independent. Sometimes that's obviously not true of our data. There is often a natural grouping to our data points, which leads us to believe that some pairs of observations should be more similar to each other than to others.

Let's look at an example. In the previous course, we talked about using a Poisson model for counting chocolate chips in cookies. Let's suppose that you own a company that produces chocolate chip cookies. In an experiment, you're going to produce 150 test cookies. Let's say you'll do 30 from your location and then 30 from each of four other factory locations. So let's write this as 30 from each of 5 locations. We're going to assume that all the locations use the same recipe when they make their chocolate chip cookies. So now, would you expect a cookie from your location, location 1, to be more similar to another cookie from your batch than to a cookie from another location's batch? I'd say probably. There's a natural grouping to the cookies. We can account for the likely differences between cookies in our Poisson model by making it a hierarchical model.

The original fully independent Poisson model for the number of chips in the cookies would have looked like this. Let's call this the fully independent model. For cookie i, the number of chips y_i, given lambda, the mean, would be iid from a Poisson distribution with mean lambda, for i = 1 up to n, which in our case is 150. That is, y_i | lambda ~ iid Poisson(lambda), i = 1, ..., 150.

One way we can acknowledge the grouping in the counts is to assign a different lambda parameter to each location. So we're going to do a location-dependent model. We'll say that the ith cookie, which happens to be in the lth location, follows a Poisson with a mean lambda that depends on the location of cookie i. That is, y_i, given l_i, the location where cookie i was made, and lambda_{l_i}, comes independently from a Poisson distribution with mean lambda_{l_i}, where l_i can take on values between 1 and 5 because there are five locations, and i = 1 up to 150. In symbols, y_i | l_i, lambda_{l_i} ~ Poisson(lambda_{l_i}), independently. If cookie i comes from location 2, for example, then l_i will be 2, and so the expected number of chips in cookie i will be lambda_2.

So far, this looks like a one-way ANOVA model. What makes this a hierarchical model is that instead of placing an independent prior on each of the lambda parameters, we're going to assume that they come from a common distribution with hyperparameters that we're going to estimate as well. That is, we're going to say that lambda_l, given our hyperparameters alpha and beta, will be iid from a gamma distribution with shape alpha and rate beta: lambda_l | alpha, beta ~ iid Gamma(alpha, beta), for location l = 1 up to 5. We'll complete this model with priors for alpha and beta. So we'll say alpha comes from the prior distribution for alpha, and beta comes from the prior distribution for beta.

It's also useful to see the graphical representation of this model. We're going to start at the top of the hierarchy with the parameters that are independent. So we have a node for alpha and a node for beta. Dependent on these two parameters, we have all of the lambdas: lambda_1, lambda_2, up to lambda_5. And of course the distribution for each of these lambdas depends on both alpha and beta. In the next level down, we have all of the cookies that are from location 1. So I'm going to write y_i for i such that l_i = 1. These are all the cookies from location 1. We can do the same thing for all of the other batches: i such that l_i = 2, and so on, up to a plate for this group over here, such that l_i = 5. The lowest level here represents each individual cookie. Cookies are grouped by location, which is represented here at the second level. For each location, we have a parameter for the mean number of chocolate chips.
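To make the top-to-bottom structure of the graph concrete, here is a minimal sketch that simulates cookies from the hierarchy, using only the Python standard library. The hyperparameter values (alpha = 20, beta = 2), the seed, and the helper names `sample_poisson` and `simulate_cookies` are assumptions made up for illustration; they do not come from the lesson, and in the actual model alpha and beta would have priors and be estimated rather than fixed.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Suitable for moderate means such as chip counts; not meant for very large lam.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cookies(alpha, beta, n_locations=5, n_per_location=30, seed=0):
    """Simulate the hierarchy: hyperparameters -> location means -> cookies."""
    rng = random.Random(seed)
    # Middle level: one mean per location, lambda_l ~ Gamma(shape=alpha, rate=beta).
    # random.gammavariate takes (shape, scale), so rate beta becomes scale 1/beta.
    lambdas = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_locations)]
    # Lowest level: each cookie's chip count is Poisson with its location's mean.
    chips = [
        [sample_poisson(lam, rng) for _ in range(n_per_location)]
        for lam in lambdas
    ]
    return lambdas, chips


# Assumed hyperparameters (prior mean alpha / beta = 10 chips per cookie).
lambdas, chips = simulate_cookies(alpha=20.0, beta=2.0)
for location, (lam, batch) in enumerate(zip(lambdas, chips), start=1):
    print(f"location {location}: lambda = {lam:.2f}, "
          f"sample mean = {sum(batch) / len(batch):.2f}")
```

Here `lambdas` holds the five location means and `chips` holds the 30 simulated counts for each location, mirroring the two lower levels of the graph.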
These five means come from a common distribution, whose parameters we can also estimate, because the five lambdas can be thought of as data at this level.

If we hadn't used the hierarchical model for the cookie example here, two obvious modeling choices would have been, first, to fit a single model to all of the data, treating all 150 cookies as though they're independent, as in the fully independent model; or second, to fit five separate Poisson models, one for each location. The book Bayesian Data Analysis by Gelman and co-authors, which is listed as one of your references for further reading, describes the trade-offs between these types of options and how hierarchical models provide a good compromise. For example, if we had fit the original model with 150 independent cookies, we would be ignoring potential differences between locations and the fact that cookies from the same location are likely to be more similar to each other. On the other hand, if we had fit five separate Poisson models, one for each batch of cookies, we would potentially be ignoring information that could help us estimate the lambda parameter for your location, because separate models would ignore data from other locations.

Since all locations use the same recipe, it seems reasonable that information about another location's cookies might provide information about your cookies, at least indirectly. In this sense, hierarchical models share information, or borrow strength, from all the data. That is, your lambda is estimated not only directly from your 30 cookies, but also indirectly from the other 120 cookies, leveraging this hierarchical structure. Being able to account for relationships in the data while estimating everything with a single model is a primary advantage of using hierarchical models.
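The three modeling choices can be compared on a small example. The sketch below is an illustration under stated assumptions, not the lesson's fitting procedure: the toy counts are made up, and the hyperparameters alpha and beta are held fixed at assumed values, whereas the full hierarchical model would estimate them. With alpha and beta fixed, gamma-Poisson conjugacy gives each location's posterior as Gamma(alpha + sum of its counts, beta + number of its cookies), so the posterior mean sits between that location's own sample mean and the common mean alpha / beta.

```python
def pooled_mean(counts_by_location):
    """Single model for all cookies: one mean, ignoring location."""
    total = sum(sum(batch) for batch in counts_by_location)
    n = sum(len(batch) for batch in counts_by_location)
    return total / n


def separate_means(counts_by_location):
    """Separate models: each location uses only its own cookies."""
    return [sum(batch) / len(batch) for batch in counts_by_location]


def conditional_posterior_means(counts_by_location, alpha, beta):
    """Posterior mean of each lambda_l given FIXED alpha and beta.

    Conjugacy: lambda_l | y, alpha, beta ~ Gamma(alpha + sum(y_l), beta + n_l).
    """
    return [
        (alpha + sum(batch)) / (beta + len(batch))
        for batch in counts_by_location
    ]


# Made-up toy counts, 4 cookies from each of 3 locations (not real data).
toy_counts = [[8, 9, 7, 8], [12, 11, 13, 12], [10, 9, 11, 10]]
alpha, beta = 20.0, 2.0  # assumed values; common mean alpha / beta = 10

print("single model:  ", pooled_mean(toy_counts))
print("separate models:", separate_means(toy_counts))
print("hierarchical (alpha, beta fixed):",
      conditional_posterior_means(toy_counts, alpha, beta))
```

Each hierarchical estimate is pulled from the location's own mean toward the common mean, which is the borrowing of strength described above. The pull is stronger when a location has few cookies and weaker when it has many.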