In this video we turn to Bayesian inference in simple linear regression. We will use a reference prior distribution that provides a connection between the frequentist solution and Bayesian answers. This provides a baseline analysis for comparison with more informative prior distributions. To illustrate the ideas, we'll use an example to predict body fat. Obtaining accurate measurements of body fat is expensive and not something that can be done easily at home. Instead, predictive models that can predict the percentage of body fat using readily available measurements, such as abdominal circumference, are easy to use and inexpensive. The figure on the left shows the percent body fat obtained from underwater weighing and the abdominal circumference for 252 men. To predict body fat, we may start with a simple linear regression, where the ordinary least squares line has been added to the scatter plot. This has an estimated slope of 0.63 and an intercept of about -39%. For every additional centimeter, we expect body fat to increase by 0.63%. The negative intercept, of course, does not make sense as a physical model, but neither does predicting a male with a waist of zero centimeters. Nevertheless, this linear regression may be an accurate approximation for prediction purposes with measurements that are in the observed range for this population. Our best estimate of the line uses the ordinary least squares estimates of alpha and beta to obtain the fitted values and predictions. The residuals, which provide an estimate of the fitting error, are the difference between the observed and predicted values and are used for diagnostics as well as for estimating sigma squared. This is via the mean squared error, which is the sum of the squared errors divided by the degrees of freedom. Remember, the residual degrees of freedom are the sample size minus the number of regression coefficients in the model. Let's look at estimation from the Bayesian perspective.
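The OLS quantities described above can be sketched in a few lines of code. This is a minimal illustration, not the course's own analysis: the arrays below are made-up stand-ins for the 252 abdominal circumference and body fat measurements, not the real study data.

```python
import numpy as np

# Synthetic stand-in data (NOT the real body fat study): 252 men,
# waist circumference in cm and body fat in percent.
rng = np.random.default_rng(42)
n = 252
abdomen = rng.uniform(70, 120, size=n)
bodyfat = -39.0 + 0.63 * abdomen + rng.normal(0, 4.5, size=n)

# Ordinary least squares estimates of the intercept (alpha) and slope (beta)
x_bar, y_bar = abdomen.mean(), bodyfat.mean()
Sxx = np.sum((abdomen - x_bar) ** 2)
beta_hat = np.sum((abdomen - x_bar) * (bodyfat - y_bar)) / Sxx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values and residuals (observed minus predicted)
fitted = alpha_hat + beta_hat * abdomen
residuals = bodyfat - fitted

# Mean squared error: sum of squared residuals over n - 2 degrees of
# freedom (sample size minus the two regression coefficients)
mse = np.sum(residuals ** 2) / (n - 2)
```

With an intercept in the model, the residuals sum to zero by construction, and `mse` is the usual unbiased estimate of sigma squared.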
We'll start with the same model as with OLS, but with the additional assumption that the errors are normally distributed with constant variance. We often add this assumption to obtain confidence intervals with ordinary least squares, so any of the diagnostic plots that are used in frequentist regression can be used here to check this assumption. For a conjugate analysis, the prior would be a bivariate normal distribution for the regression coefficients conditional on sigma. Sigma here provides scaling in terms of the units of the response. Marginally, alpha is normally distributed given sigma squared, with a mean denoted a naught and a variance controlled by the unitless scale parameter s sub alpha. Similarly, beta is normally distributed given sigma squared, with a mean b naught and a variance controlled by the parameter s sub beta. The covariance between alpha and beta is sigma squared times a parameter that describes a priori our beliefs about how alpha and beta vary together. If s sub alpha beta is zero, then a priori our beliefs are that alpha will be independent of beta, conditional on sigma squared. To complete the specification, we use a conjugate prior for sigma squared, where one over the variance has a gamma distribution with nu naught degrees of freedom, and sigma squared naught is a prior estimate of sigma squared. This is similar to the priors that we've used in previous videos. Because of conjugacy, the posterior distributions will be normal-gamma, with simple rules to update the parameters. It's useful to provide a reference Bayesian analysis as a starting point. The reference prior distribution is obtained as a limit of the normals as those variance parameters go to infinity, and is flat or uniform in alpha and beta, while the limit of the conjugate gamma prior, with prior degrees of freedom going to zero, provides the reference prior for sigma squared that we've used in previous videos.
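In symbols, the conjugate prior described above can be written as follows. This is a sketch assuming the degrees-of-freedom parameterization of the gamma prior used in the course's earlier videos; the notation (a_0, b_0, s_alpha, s_beta, s_alpha-beta, nu_0, sigma_0^2) matches the spoken description.

```latex
% Conjugate normal-gamma prior for simple linear regression
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \Bigm| \sigma^2
\sim \mathsf{N}\!\left(
  \begin{pmatrix} a_0 \\ b_0 \end{pmatrix},\;
  \sigma^2 \begin{pmatrix} s_\alpha & s_{\alpha\beta} \\
                           s_{\alpha\beta} & s_\beta \end{pmatrix}
\right),
\qquad
\frac{1}{\sigma^2} \sim \mathsf{Gamma}\!\left(\frac{\nu_0}{2},\,
  \frac{\nu_0 \sigma_0^2}{2}\right)
```

The reference prior is the limit $s_\alpha, s_\beta \to \infty$ and $\nu_0 \to 0$, which is flat in $\alpha$ and $\beta$ and proportional to $1/\sigma^2$.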
The associated reference posterior for beta is a Student t-distribution centered at the OLS estimate, with a scale that's the same as the OLS standard error. The degrees of freedom are n minus 2, as in the frequentist analysis. Similarly, the marginal posterior distribution for alpha is also a Student t, with a center and scale given by the OLS estimates. Using the joint distribution of these parameters, we can obtain the distribution of the expected body fat at any value of x, which is a Student t-distribution with n minus 2 degrees of freedom. The estimates are given by the OLS estimate, and the scale is the standard error of the fitted value. Using R and the lm function, we can obtain the parameter estimates, standard deviations, and 95% credible intervals. You should be able to confirm that the Bayesian estimates here in the table are the same as the frequentist estimates when we use this reference prior. The credible intervals for parameters, fitted values, or predictions are all obtained as the posterior mean plus or minus the appropriate t-quantile times the standard deviation. The primary difference is in the interpretation of the intervals. For example, based on the data, we now believe that there is a 95% chance that body fat will increase by between 5.8% and 6.9% with every additional 10 centimeter increase in waist circumference. Of course, this model is an approximation, so be careful about any causal interpretation; however, it can still be useful for prediction. For predicting body fat, we'll use the posterior predictive distribution. If we knew the parameters, a new observation would just be obtained by taking the mean, based on our population regression equation, plus an associated uncertainty that describes how much the individual deviates from the population mean at x. Given the data at hand, our posterior predictive distribution is a t-distribution with n minus 2 degrees of freedom.
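The transcript uses R's lm function; a hedged sketch of the same credible-interval calculation can be written directly from the formulas above. Again the data are synthetic stand-ins, not the real body fat measurements, and the interval construction (posterior mean plus or minus a t-quantile times the standard error) is exactly the one described in the text.

```python
import numpy as np
from scipy.stats import t

# Synthetic stand-in data (NOT the real study)
rng = np.random.default_rng(0)
n = 252
x = rng.uniform(70, 120, size=n)
y = -39.0 + 0.63 * x + rng.normal(0, 4.5, size=n)

# OLS quantities
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y.mean())) / Sxx
alpha_hat = y.mean() - beta_hat * x_bar
mse = np.sum((y - alpha_hat - beta_hat * x) ** 2) / (n - 2)

# Under the reference prior, beta | data is Student t with n - 2 degrees
# of freedom, centered at the OLS estimate with the OLS standard error
# as the scale.
se_beta = np.sqrt(mse / Sxx)
t_quant = t.ppf(0.975, df=n - 2)
ci = (beta_hat - t_quant * se_beta, beta_hat + t_quant * se_beta)
```

The endpoints are numerically identical to the frequentist 95% confidence interval, but are read as a 95% posterior probability that beta lies in this range.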
The best estimate for predicting a new value is the posterior mean of the population line. This is the same as the fitted values that we calculated before, but with a scale parameter that's based on the predictive standard deviation. The predictive standard deviation incorporates posterior uncertainty about the regression line at x. From the last term, we can see that the variability will be smallest when we are predicting near the mean of the data, with increasing variability for values of x that are at either end of the range of x. There's an additional estimate of sigma squared as well that comes from the uncertainty about the error epsilon, or how much we expect the observation to deviate from the regression line. The figure shows the data; the prediction equation, our posterior mean, which is depicted in orange; and pointwise 95% intervals for estimating the population mean, shown as grey dashed lines. The 95% credible intervals for predicting body fat are the outer dotted lines. The majority of the data are within the prediction intervals, as one would expect. However, the point circled in orange has a body fat percentage that is much lower than expected by the model. In a later video, we'll explore Bayesian methods to determine if this is an outlier. To summarize, we've presented the posterior distributions for parameters, fitted values, and predictions in simple linear regression using a reference prior distribution. Under the reference prior, the point estimates and intervals are exactly the same as their frequentist counterparts. Any software that provides OLS estimates can be used to produce the summaries needed for the Bayesian reference analysis. The key difference is in the Bayesian interpretation of credible intervals. Finally, the reference analysis provides a useful baseline for analyses with more informative prior distributions and to explore sensitivity to prior assumptions. The Bayesian approach provides a coherent updating scheme as well.
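The two scales discussed above — one for the population mean at x and one for a new observation at x — can be sketched with the standard formulas, again on synthetic stand-in data rather than the real measurements. The only difference between them is the extra sigma-squared term for how much an individual deviates from the line.

```python
import numpy as np

# Synthetic stand-in data (NOT the real body fat study)
rng = np.random.default_rng(1)
n = 252
x = rng.uniform(70, 120, size=n)
y = -39.0 + 0.63 * x + rng.normal(0, 4.5, size=n)

x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y.mean())) / Sxx
alpha_hat = y.mean() - beta_hat * x_bar
mse = np.sum((y - alpha_hat - beta_hat * x) ** 2) / (n - 2)

def se_fit(x_new):
    """Scale for the posterior of the mean response at x_new."""
    return np.sqrt(mse * (1 / n + (x_new - x_bar) ** 2 / Sxx))

def se_pred(x_new):
    """Scale for the posterior predictive of a NEW observation at x_new:
    adds an extra mse term for the individual's deviation from the line."""
    return np.sqrt(mse * (1 + 1 / n + (x_new - x_bar) ** 2 / Sxx))
```

Both scales are smallest at the mean of the observed x values and grow toward either end of the range, and the predictive scale is always the wider of the two — which is why the outer dotted prediction bands in the figure sit outside the grey dashed bands for the mean.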
These posterior distributions from our reference analysis may be used as informative prior distributions for future analyses.