As with linear regression models, we can

include multiple variables in a logistic regression model.

The resulting model is called a multiple logistic regression model.

To illustrate multiple logistic regression, we look at

a model with one more variable, patient gender.

Here are summary statistics for female and male patients.

For female patients there are 3,947 arrivals and 1,229 cancellations.

Those for male patients are 1,854 and 433, respectively.

The cancellation rates for female and male patients are

23.74 percent and 18.93 percent, respectively.

Therefore, the cancellation rates for

female and male patients are substantially different.
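These rates can be reproduced from the counts above if we assume that "arrivals" counts completed (non-cancelled) appointments, so the denominator for each rate is arrivals plus cancellations. A minimal sketch under that assumption:

```python
# Cancellation rate = cancellations / (arrivals + cancellations),
# assuming "arrivals" counts completed (non-cancelled) appointments.
counts = {
    "female": {"arrivals": 3947, "cancellations": 1229},
    "male": {"arrivals": 1854, "cancellations": 433},
}

for gender, c in counts.items():
    rate = c["cancellations"] / (c["arrivals"] + c["cancellations"])
    print(f"{gender}: {100 * rate:.2f}%")  # female: 23.74%, male: 18.93%
```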

This suggests that including

the gender variable in the model can potentially increase the model fit.

Since we include one more binary variable, we estimate

one more coefficient for that additional variable.

Here beta two is the coefficient for gender.

Note that gender is a categorical variable,

taking the value of one for male,

and zero for female.

The estimated coefficient for gender is minus 0.3572.

For male patients, the intercept is decreased by this amount, leading to smaller log odds.

This also implies that the predicted probability for male patients is smaller for

the same lag value, which is consistent with

our observation made earlier in the summary statistics.
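The effect of a negative gender coefficient can also be read on the odds scale: subtracting 0.3572 from the log odds multiplies the odds of cancellation by exp(-0.3572). A quick sketch:

```python
import math

beta_gender = -0.3572  # estimated coefficient for gender (male = 1)

# Moving from female (gender = 0) to male (gender = 1) shifts the log odds
# by beta_gender, which multiplies the odds of cancellation by exp(beta_gender).
odds_ratio = math.exp(beta_gender)
print(f"odds ratio (male vs. female): {odds_ratio:.3f}")  # about 0.70
```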

What is the predicted cancellation probability for

an appointment with 10 day lag for a male patient?

Recall that the value of the gender variable is one for male patients.

Plugging in 10 for lag and one for gender,

we obtain the predicted probability of 13.72 percent.

This estimated probability is smaller than

the one calculated when we only include the lag variable.
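The prediction step can be sketched as follows. The transcript gives only the gender coefficient (minus 0.3572); the intercept and lag coefficient below are hypothetical placeholders, so the printed probabilities are illustrative rather than the lecture's exact 13.72 percent:

```python
import math

def predict_cancel_prob(lag, gender, b0, b_lag, b_gender=-0.3572):
    """Logistic model: P(cancel) = 1 / (1 + exp(-(b0 + b_lag*lag + b_gender*gender)))."""
    z = b0 + b_lag * lag + b_gender * gender
    return 1.0 / (1.0 + math.exp(-z))

# b0 and b_lag are hypothetical; only b_gender comes from the lecture.
b0, b_lag = -1.6, 0.012
p_male = predict_cancel_prob(10, 1, b0, b_lag)
p_female = predict_cancel_prob(10, 0, b0, b_lag)
print(f"male: {p_male:.4f}, female: {p_female:.4f}")

# Because b_gender < 0, the male probability is lower at the same lag.
assert p_male < p_female
```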

Even though the coefficient estimate for

the gender variable matches our intuition, we still need to

exercise caution when interpreting the estimated coefficient due to multicollinearity.

As in linear regression,

if one predictor is highly correlated with another predictor,

coefficient estimates may not be reliable.

In the model we consider here, the two variables might be correlated.

For example, if female patients are more likely to book

appointments early, then the average lag for female patients will be higher.
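One quick diagnostic is to compare the average lag across genders: a large gap between the group means signals correlation between the two predictors. The sample below is entirely hypothetical, for illustration only:

```python
# Hypothetical (lag, gender) pairs; gender is 1 for male, 0 for female.
data = [(21, 0), (35, 0), (14, 0), (28, 0), (7, 1), (12, 1), (9, 1), (16, 1)]

mean_lag_female = sum(l for l, g in data if g == 0) / sum(1 for _, g in data if g == 0)
mean_lag_male = sum(l for l, g in data if g == 1) / sum(1 for _, g in data if g == 1)
print(f"mean lag, female: {mean_lag_female:.1f}; male: {mean_lag_male:.1f}")

# A large difference between the group means suggests lag and gender are
# correlated, which can make individual coefficient estimates unreliable.
```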

We need to bear in mind that possibility when interpreting the modelling results.

We can remove redundancies by dropping predictors via variable selection.

It is also possible to perform data reduction before model estimation.

In the previous module, we discussed several ways to improve the fit

of linear regression models, including interaction terms,

data transformation, and model selection.

It is straightforward to apply all these ideas to logistic regression models.
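As one example, an interaction term enters a logistic model exactly as it does a linear one: add the product of two predictors to the linear predictor, with its own coefficient. All coefficients in this sketch are hypothetical, for illustration only:

```python
import math

def predict_with_interaction(lag, gender, b0, b_lag, b_gender, b_inter):
    # The interaction term lag * gender lets the effect of lag differ by gender.
    z = b0 + b_lag * lag + b_gender * gender + b_inter * lag * gender
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
p = predict_with_interaction(10, 1, b0=-1.6, b_lag=0.012, b_gender=-0.3572, b_inter=0.005)
print(f"predicted probability: {p:.4f}")
```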

Therefore, I choose not to discuss them in detail here.

However, I would like to emphasize that it is

important to find ways to improve your model.

A reasonably good model is often the result of

an iterative process that considers many alternatives.