In fact, the uniform distribution is a Beta(1, 1). And any beta distribution is conjugate for the Bernoulli distribution: any beta prior will give a beta posterior. Recall that the beta prior looks like this: f of theta is gamma of alpha plus beta, over gamma of alpha times gamma of beta, times theta to the alpha minus one, times one minus theta to the beta minus one, over the interval from zero to one.

Thus we can look at the posterior for theta given y. This is proportional to the likelihood times the prior. The likelihood is theta to the sum of the y sub i's, times one minus theta to the n minus the sum of the y sub i's. The prior is gamma of alpha plus beta, over gamma of alpha times gamma of beta, times theta to the alpha minus one, times one minus theta to the beta minus one, over the interval as theta goes from zero to one. In terms of proportionality, we only need to keep the things that have thetas in them, so we can drop the normalizing constant. Collecting the theta terms, we get theta to the alpha plus the sum of the y sub i's minus one, times one minus theta to the beta plus n minus the sum of the y sub i's minus one, over the interval from zero to one. We thus see this is a beta distribution. So theta given y follows a beta distribution with parameters alpha plus the sum of the y sub i's, and beta plus n minus the sum of the y sub i's. When alpha and beta are both one, as in the uniform distribution, we get the result that we had earlier.

This whole concept of starting with a beta prior and getting a beta posterior is a really convenient one. The process of choosing a particular form of prior that works with a likelihood is called using a conjugate family. A family of distributions is referred to as conjugate if, when you use a member of that family as a prior, you get another member of that family as your posterior. The beta distribution is conjugate for the Bernoulli distribution. It's also conjugate for the binomial distribution.
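Written out in symbols, the derivation just described looks roughly like the following. This is a sketch of what is said aloud, not a copy of the lecture's slides; the indicator notation I for "over the interval from zero to one" is an assumption about how it would be written.

```latex
\begin{align*}
% Beta(alpha, beta) prior on theta
f(\theta) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
             \theta^{\alpha-1}(1-\theta)^{\beta-1}\, I_{\{0 \le \theta \le 1\}} \\[4pt]
% Posterior is proportional to likelihood times prior
f(\theta \mid y) &\propto f(y \mid \theta)\, f(\theta) \\
 &= \theta^{\sum_i y_i}(1-\theta)^{\,n-\sum_i y_i}\;
    \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
    \theta^{\alpha-1}(1-\theta)^{\beta-1}\, I_{\{0 \le \theta \le 1\}} \\
% Drop the constant and collect powers of theta and (1 - theta)
 &\propto \theta^{\alpha+\sum_i y_i-1}(1-\theta)^{\,\beta+n-\sum_i y_i-1}\,
    I_{\{0 \le \theta \le 1\}} \\[4pt]
\theta \mid y &\sim \mathrm{Beta}\Big(\alpha+\sum_i y_i,\;\; \beta+n-\sum_i y_i\Big)
\end{align*}
```

Setting alpha and beta both to one gives a Beta(1 + sum of the y sub i's, 1 + n minus the sum of the y sub i's) posterior, which is the uniform-prior case.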
The only difference in the binomial likelihood is that there is an n choose x combinatoric term. Since that does not depend on theta, we get the same posterior. We often use conjugate priors because they make life much simpler. When we're working out posteriors, if we can't recognize the form of the posterior, we're left with an intractable integral in the denominator, and trying to work out that integral can be problematic. It can get complicated really quickly. So sticking to conjugate families allows us to get closed-form solutions. If the family is flexible enough, then you can find a member of that family that represents your beliefs closely enough.

We can represent this model as a hierarchy. The observations Y1 through Yn follow a Bernoulli likelihood. Theta is our primary parameter. It depends on alpha and beta, and we give theta a beta prior. Alpha and beta we might give particular values, alpha naught and beta naught. We refer to alpha and beta as hyperparameters. Typically, in what we'll be doing here, we'll set these equal to particular values. But in a more complicated problem, you might also want more flexibility by putting priors on alpha and/or beta. We can just extend this hierarchy to more levels. In complicated problems this may provide some added value and added flexibility. In simple problems this tends to be more work without providing much additional value. So these sorts of hierarchical models will be beyond the scope of this course.
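As a rough illustration of the two-level hierarchy and the closed-form update it allows, here is a minimal Python sketch using only the standard library. The function names `simulate_hierarchy` and `beta_bernoulli_posterior`, the seed, and the choice of alpha naught = beta naught = 1 with n = 20 are all assumptions made for the example, not part of the lecture.

```python
import random


def simulate_hierarchy(alpha0, beta0, n, rng):
    """Draw theta from a Beta(alpha0, beta0) prior, then n Bernoulli(theta) observations."""
    theta = rng.betavariate(alpha0, beta0)
    ys = [1 if rng.random() < theta else 0 for _ in range(n)]
    return theta, ys


def beta_bernoulli_posterior(alpha, beta, ys):
    """Conjugate update: Beta(alpha, beta) prior with 0/1 data ys gives
    Beta(alpha + sum(ys), beta + n - sum(ys))."""
    successes = sum(ys)
    n = len(ys)
    return alpha + successes, beta + n - successes


# Hypothetical settings: fixed hyperparameters alpha0 = beta0 = 1 (the uniform prior).
rng = random.Random(0)
theta, ys = simulate_hierarchy(1.0, 1.0, 20, rng)

post_alpha, post_beta = beta_bernoulli_posterior(1.0, 1.0, ys)
posterior_mean = post_alpha / (post_alpha + post_beta)  # mean of a Beta(a, b) is a / (a + b)
```

No integral is computed anywhere: because the prior is conjugate, the posterior is obtained by adding counts to the hyperparameters. Putting priors on alpha and beta themselves would add a level above `simulate_hierarchy` and would generally lose this closed form.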