In other words, we're interested in whether the rate of nicotine dependence

differs according to which explanatory group the observations belong to, that is,

which smoking frequency group.

Notice that we are not interested in the column percentages for

those observations without nicotine dependence,

indicated with a dummy code of 0.

Instead, we're interested in describing the presence of nicotine dependence within

the smoking frequency groups; that is, these column percentages circled in blue.
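
As a rough sketch of where those column percentages come from, the snippet below builds a small cross-tabulation with pandas and divides each column by its total. The data frame and the column names `smoking_freq` and `nicotine_dep` are hypothetical stand-ins invented for illustration, not the data set used in this lecture.

```python
import pandas as pd

# Toy data invented purely for illustration (not the lecture's data set).
# smoking_freq: days smoked per month; nicotine_dep: 1 = dependent, 0 = not.
df = pd.DataFrame({
    "smoking_freq": [1, 1, 1, 1, 30, 30, 30, 30],
    "nicotine_dep": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Rows hold the response variable, columns hold the explanatory groups.
counts = pd.crosstab(df["nicotine_dep"], df["smoking_freq"])

# Divide each column by its column total to get column proportions.
col_pct = counts / counts.sum(axis=0)

# The row labelled 1 is the rate of nicotine dependence within each group;
# the row labelled 0 is the part we are not interested in here.
dependence_rate = col_pct.loc[1]
print(dependence_rate)
```

Only the row for the dummy code of 1 is kept, because that row is the rate of nicotine dependence within each smoking frequency group.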

If I want to graph the percent of young adult smokers with nicotine dependence

within each smoking frequency category, I would first import the seaborn and

matplotlib.pyplot libraries and then add the following code.

First, setting our explanatory variable to categorical and

our response variable to numeric.

And then requesting a bivariate bar chart,

with smoking frequency categories on the x-axis, and the mean for

nicotine dependence, which is the proportion of ones, on the y-axis.
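
A minimal sketch of those steps might look like the following. It assumes a recent seaborn release, where the categorical plotting function is `catplot` (older releases called it `factorplot`), and it uses invented toy data with the hypothetical column names `smoking_freq` and `nicotine_dep`, so it is not necessarily the exact code shown on the slide.

```python
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

# Toy data invented for illustration; the column names are hypothetical.
# The response is stored as text here to show why the numeric conversion matters.
df = pd.DataFrame({
    "smoking_freq": [1, 1, 1, 1, 30, 30, 30, 30],
    "nicotine_dep": ["0", "0", "0", "1", "0", "1", "1", "1"],
})

# Explanatory variable to categorical, response variable to numeric.
df["smoking_freq"] = df["smoking_freq"].astype("category")
df["nicotine_dep"] = pd.to_numeric(df["nicotine_dep"])

# The mean of a 0/1 variable is the proportion of ones in each group.
rates = df.groupby("smoking_freq", observed=True)["nicotine_dep"].mean()
print(rates)

# Bivariate bar chart: categories on the x-axis, mean of the response on the y-axis.
g = seaborn.catplot(x="smoking_freq", y="nicotine_dep", data=df, kind="bar")
g.set_axis_labels("Days smoked per month", "Proportion with nicotine dependence")
plt.show()
```

The printed `rates` values are the bar heights, which is a quick way to check that the chart shows the proportion of ones in each category.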

Now I can visualize the association, and see even more clearly that there seems

to be a positive linear relationship; that is, the more days per month a young adult

smokes, the more likely they are to have nicotine dependence.

I know from looking at the significant p-value

that I will reject the null hypothesis and accept the alternate hypothesis

that not all nicotine dependence rates are equal across smoking frequency categories.
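
To make that decision concrete, here is a hedged sketch of a chi-square test of independence using `scipy.stats.chi2_contingency`. The counts, the three category labels, and the 0.05 cutoff are assumptions chosen for illustration; they are not the results reported in this lecture.

```python
import pandas as pd
import scipy.stats

# Toy contingency table invented for illustration (not the lecture's counts).
# Rows: nicotine dependence (0 = no, 1 = yes); columns: hypothetical frequency groups.
counts = pd.DataFrame(
    [[80, 60, 30],   # not dependent
     [20, 40, 70]],  # dependent
    index=[0, 1],
    columns=["low", "medium", "high"],
)

# Returns the chi-square statistic, p-value, degrees of freedom, and expected counts.
chi2, p_value, dof, expected = scipy.stats.chi2_contingency(counts)
print("chi-square:", chi2, "p-value:", p_value, "df:", dof)

# Assumed significance level; a p-value below it means rejecting the null
# hypothesis that dependence rates are equal across all groups.
alpha = 0.05
reject_null = p_value < alpha
print("reject null hypothesis:", reject_null)
```

A significant result here only says that the rates are not all equal; it does not say which groups differ from which.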

If my explanatory variable had only two levels,

I could interpret the two corresponding column percentages and be able to say

which group had a significantly higher rate of nicotine dependence.

But my explanatory variable has six categories.

So I know that not all are equal.

But I don't know which are different and which are not.