0:06

All right, what we're going to be doing in this exercise is building a model for

retention.

And so you'll see a couple of different tabs in this file.

We've got a working tab that we're going to be using to build our model.

We have another tab for this geometric model that we're building that's complete,

that I've annotated already.

And then the other model that we're going to use or

that we're going to create is one that allows for a time trend.

So I'll explain that difference between these two models as

we get a little bit into things.

The final tab has a comparison between the performance of these different models.

So, one of the basic building blocks is the geometric distribution for

customer analytics.

The idea in the geometric distribution is, think of it like a coin flip,

0:54

each time period, there is a probability that the event happens.

So, in our case the event is churn or retention.

So in Month 1, there's a probability that you churn.

In Month 2, if you haven't churned yet, there's a probability that you churn.

In Month 3,

if you haven't churned yet, there's a probability that you're going to churn.

The assumption with the geometric distribution is that it's

always the same probability, so it makes for a very simple model but

it's also one that's a little bit restrictive.
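The coin-flip story can be sketched quickly in Python. This is an illustrative simulation, not part of the spreadsheet, and the 10% churn probability is just an example value:

```python
import random

# Illustrative simulation of the coin-flip story: each month, a surviving
# customer churns with the same probability p (example value, not fitted).
def months_until_churn(p, rng):
    """Flip the churn coin once per month; return the month in which churn happens."""
    month = 1
    while rng.random() >= p:  # the customer survives this month
        month += 1
    return month

rng = random.Random(42)
p = 0.10
lifetimes = [months_until_churn(p, rng) for _ in range(100_000)]

# A geometric lifetime with hazard p has mean 1/p, so about 10 months here,
# and roughly 10% of simulated customers should churn in month 1.
mean_lifetime = sum(lifetimes) / len(lifetimes)
share_month_1 = lifetimes.count(1) / len(lifetimes)
```

That constant-hazard assumption is exactly what makes the model simple, and also what makes it restrictive.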

So we're going to start by estimating this model using Excel,

estimating what is the retention or churn probability.

1:39

Comparing actual to expected data.

And then we'll see if that's not enough, do we have to do a little bit better?

What can we do as far as allowing that churn probability or

retention probability to vary from one time period to the next, so

making the model a little bit more realistic.

2:09

continue to have service in each time period out of this cohort.

What have we calculated in Column C? Let me zoom in, so

that what we're doing is a little bit clearer.

What we've done in Column C already is taken the difference between the number of

individuals who had service in the previous period minus the number of

individuals who have service in the current period and that tells us how many

individuals cancelled service or how many individuals churned.

And so we're going to model the probability for

each of those individuals who have churned of doing so after one month,

after two months, after three months, after four months, and so forth.

So that's what we're going to do in column D is calculate out that probability.

We're then going to calculate the log likelihood

3:16

Well, we'll take into account the individuals who have not yet

cancelled service and calculate our sample log likelihood.

And we're going to use that to say let's find the value for P.

In this case, we'll set this up, again,

we can do this either as a retention or a churn probability, doesn't make

a difference, but let's say I want to set this up as a churn probability.

What is the value that maximizes the observed data?

All right, so let's begin by specifying what's

the probability that an individual cancels service after one month.

All right, well, if we're going to specify in terms of a churn probability,

4:33

All right, well let's write out these probabilities first and

maybe we can see a pattern that emerges.

All right, so it's going to be one minus our churn

probability to say I survived one month and then churned.

5:23

so when it's "I survived one month,"

that's one minus the churn probability times the churn probability.

So I survived one month and then I churned.

In the next cell it was, I survived two months.

5:43

So that's why we have it squared and then I churn.

Well, in the next one is going to be I survived three months and then I churn.

So all that's going to be changing in the next cell is going to be instead of

raising the one minus churn probability to the power of 2,

it's actually going to be raised to the power of 3.

And that's the pattern that we're going to see time and time again.

So let's actually go back to cell D3, and rewrite this a little bit.

And the way that we can rewrite this in a more general fashion is,

there's the churn probability multiplied by

1 minus the churn probability, so that's going to be the retention probability.

6:43

So what does that look like if we drag?

So I survived t -1 periods.

So in this row, I survive zero periods and then I churn.

If we drag this formula down,

7:07

All right, so in row 4, it's I survive A4 - 1.

So I survive one period, and then I churn.

In the next row, it's I survive A5 - 1, so 3 - 1.

So I survive two periods and then I churn.

So the 1 - K1 is my retention probability, raised to the power of

how many periods I am retained, and then it's my churn probability.

All right, so that's the geometric model.
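That drag-down pattern has a simple closed form. Here's a minimal Python sketch of the column-D probability; the 0.10 churn probability is illustrative:

```python
def geometric_churn_prob(p, t):
    """P(survive t - 1 periods, then churn) = (1 - p)**(t - 1) * p.

    Mirrors the generalized cell formula: the retention probability (1 - p)
    raised to (period - 1), multiplied by the churn probability p.
    """
    return (1 - p) ** (t - 1) * p

# With an illustrative p = 0.10: month 1 -> 0.10, month 2 -> 0.09, month 3 -> 0.081.
probs = [geometric_churn_prob(0.10, t) for t in range(1, 8)]
```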

7:37

Now in terms of specifying the log likelihood, well,

we have 131 individuals who exhibited this behavior.

So, to construct the contribution to the log likelihood,

it's going to be the number of individuals who exhibited this behavior,

multiplied by the natural log of the probability.

8:01

And we're just going to copy that formula.

Now the reason we're using the natural log is that using the raw probabilities

we're going to end up with a number so small, because if it were 0.5 for

each of these 131 individuals, that'd actually be 0.5 raised to the 131st power.

That's an incredibly small number.

The more individuals we have, this joint probability becomes so

close to zero that the computer can't actually differentiate it.

So that's why we use a mathematical trick, and

instead of maximizing the likelihood we maximize the log likelihood.
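Here's a small numeric illustration of that trick, using the 0.5-per-individual example from above:

```python
import math

# Why logs: the product of many per-customer probabilities collapses toward
# zero, while the equivalent sum of logs stays at a comfortable magnitude.
p_each = 0.5
n_customers = 131
product = p_each ** n_customers           # about 3.7e-40, vanishingly small
log_sum = n_customers * math.log(p_each)  # about -90.8, easy to work with
```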

Now, what we've done in cells E3 through E9,

we've taken into account all the people we observed to churn.

What we also need to take into account is the number of people

who have not yet churned.

All right, and so if we look at all the individuals who have

been observed to churn, that's 509, if we just look at that summation,

we still have 491 individuals who were not observed to have churned.

9:09

So these are individuals who survived seven time

periods without churning, so let's take that likelihood into account now.

It's going to be our 491 individuals in B9 multiplied by,

now what's the probability associated with

not having churned by this particular time period.

Well, there are a couple of ways that we could do this. We could say it's the retention

probability,

1 minus the churn probability, raised to the seventh power.

We could also write it as,

it's the probabilities that we haven't enumerated yet so

it's one minus all of the probabilities that you have churned.

10:34

So it doesn't matter which way you specify, we end up with the same result.
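That equivalence is easy to verify numerically; a quick sketch with an illustrative churn probability of 0.10:

```python
import math

# Two equivalent ways to write the survivors' probability of not having
# churned within 7 months (p = 0.10 is an illustrative value, not the fit).
p = 0.10
direct = (1 - p) ** 7                                            # retention^7
complement = 1 - sum((1 - p) ** (t - 1) * p for t in range(1, 8))
# Both come out to about 0.478, so either form gives the same censored
# log-likelihood contribution for the 491 remaining customers.
censored_ll = 491 * math.log(direct)
```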

All right, and so what's the log likelihood of our sample?

Well, we're just going to take the sum of the log likelihood contributions from

each individual in column E.

10:50

All right so it's a big number that doesn't make a whole lot of sense to us.

But that's the number that we want to make as large as possible,

so we're going to maximize that value.

And to do that, let's see, we're going to go under,

looks like we have not added our analysis tool pack in yet, and

Solver, so let's make sure that we have those tools enabled.

So, Options> Add-ins, and we're going to manage our Excel add-ins,

and we're just going to make sure that we've put Solver on here, all right.
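What Solver will do here can be sketched in Python. Note the per-month churn counts below are hypothetical placeholders (the walkthrough only quotes the month-one count of 131 and the 491 survivors), chosen to total the 509 observed churners:

```python
import math

# Hypothetical per-month churn counts (placeholders summing to 509);
# 491 customers are censored, i.e. still active after month 7.
churned = {1: 131, 2: 126, 3: 90, 4: 60, 5: 42, 6: 34, 7: 26}
survivors = 491

def log_likelihood(p):
    """Censored geometric log likelihood with constant churn probability p."""
    ll = sum(n * math.log((1 - p) ** (t - 1) * p) for t, n in churned.items())
    return ll + survivors * 7 * math.log(1 - p)  # survived all 7 periods

# Grid search stand-in for Solver (Solver uses a gradient-based search,
# but the objective being maximized is the same).
grid = [i / 1000 for i in range(1, 500)]
p_hat = max(grid, key=log_likelihood)

# For this model the maximizer also has a closed form:
# churn events divided by customer-months at risk.
events = sum(churned.values())
at_risk = sum(n * t for t, n in churned.items()) + survivors * 7
```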

12:20

And we're done, right?

So our churn probability is about 10%,

which means we've got a retention probability of about 90%.

And so what column D now gives us is that, out of the original 1,000 people,

we can calculate what we would've expected to see under this model:

about 10% churn in that first month, 9% churn the next

month, 8% after that, 7% after that, and so forth.

13:28

The observed probabilities that we had, and so

the probabilities that we calculated for churn in a particular time period.

And let's take a look at how well [INAUDIBLE]

we underpredict churn in the first couple

of periods and then it looks like we're overpredicting the amount of churn.

So even in this calibration period, we're not doing too well, all right?

14:46

This is the actual, and we'll call the other series Expected.

And so under this particular model, notice what we're doing: we

underpredict churn early on, and overpredict it later on.

There's no way that this model is going to work well from a forecasting standpoint,

because it's not even doing that well

in the calibration period; we're essentially splitting the difference.

We were trying to find kind of that middle ground so

we over predict in some periods, under predict in other periods.

That's because this is a very restrictive model, right.

15:26

Now, if you're looking for

a little bit more commentary on kind of each of the steps that we took,

you'll find it on the next page in the spreadsheet, where I've

annotated kind of each of the steps that we took along the way, all right.

So let's try to relax some of these assumptions.

And that is, rather than assuming there's a constant churn probability in each time

period, let's assume that there's a time trend, that maybe churn becomes

less likely over time or more likely over time, but that it varies over time.

All right?

So we're going to use the logistic regression that we've looked at previously

as the foundation for that and focus on this time trend worksheet.

So it's the same data that we had before; it's set up a little bit differently.

So I'm going to walk you through how we're going to proceed here, right?

We're going to calculate out the survival probability first, and

then we're going to calculate the difference between the survival

probabilities in each pair of adjacent time periods.

So if I maintain service until time one, what's the probability of that happening?

What's the probability of maintaining service until time two?

If I take that difference,

that's going to tell me how likely I am to drop service after the first time period.

So that's what we're going to calculate here. We'll then calculate

the log likelihood for each observed churn period,

and then aggregate that up to the sample log likelihood.

Notice in this case we have two parameters I'm calling alpha and beta.

We're going to treat alpha as an intercept.

We're going to treat beta as a slope.

We're going to essentially use logistic regression.

So alpha is going to be the intercept,

beta is going to be the slope for our time trend.

Alright?

And so let's start out by calculating the survival probability, all right?

Well the survival probability in order to make it to the next time period

you had to survive until the period before.

So we're always going to refer back to the previous period to say

17:48

by the probability of surviving in this current time period.

All right?

And so we're going to use, I'm going to write this a little bit differently from

how we traditionally do logistic regression, we're going to model,

we're going to use logistic regression to model tendency to churn.

And so typically we're going to have e to the x divided by 1 + e to the x.

In this case, if that's our churn probability,

retention probability's going to be 1 over 1 + e to the x.

So what we're going to do is multiply

the probability of surviving into the previous period by one divided by

one plus the exponential function, and inside it is going to be the alpha.

So, our intercept plus

our slope multiplied by a time trend.
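In Python, that survival recursion looks like this minimal sketch, where alpha and beta start at placeholder values to be fitted later:

```python
import math

# Survival recursion: each period's retention probability is
# 1 / (1 + exp(alpha + beta * t)), and survival chains them together.
def survival(alpha, beta, t):
    s = 1.0
    for period in range(1, t + 1):
        retention = 1.0 / (1.0 + math.exp(alpha + beta * period))
        s *= retention  # survive this period, given survival so far
    return s

# With starting values alpha = 0, beta = 0, every period's retention is 0.5,
# so survival halves each period: 0.5, 0.25, 0.125, ...
```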

19:05

So that's the reason that this is going down to 0.5.

Now what about the next period? If I drag the formula down, it goes to 0.25.

And let's take a look at the formula itself. It's D3, so

I survive through the end of the previous period,

and it's multiplied by the probability that I survive the current period.

So that's the logic in this.

It's I survived through the end of the previous period multiplied by

the probability of surviving again.

19:39

All right, now let's use those probabilities to calculate

the probabilities that an individual churns in each particular time period.

It's the survival based on the previous period minus the survival

based on the current period and so we're going to use that logic of taking

the difference in the survival function, to calculate out what's the probability of

surviving until the present and churning in the present time period.

All right?

And so, we have those probabilities, this is the probability of

churning by the end of the seventh period.

Notice that this is just shy of one, because there's some chance that people do

not churn by the end of that seventh time period.
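The differencing step can be sketched as follows, using an illustrative constant retention of 0.5 per period so that survival after t periods is 0.5 to the t:

```python
# Churn probabilities as successive differences of the survival function.
# Illustrative constant retention of 0.5 per period (survival = 0.5**t).
survival = [0.5 ** t for t in range(0, 8)]             # S(0) = 1 through S(7)
churn = [survival[t - 1] - survival[t] for t in range(1, 8)]

# The seven churn probabilities sum to just shy of one; the gap is exactly
# the probability of surviving all seven periods.
total = sum(churn)
```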

20:58

So we can calculate the likelihood contribution for

the individuals who we observed to churn.

And also, don't forget the people who we observed to maintain service,

throughout the entire observation period,

21:51

Okay, so we're going to maximize our objective, cell L3, the log likelihood.

We want it to be as big as possible by changing cells L1 and L2.

We have no constraints.

Alpha and beta could be positive, they could be negative, all right?

And so I'm going to check: we're going to look at the box that says

"make unconstrained variables non-negative," and make sure that it's not checked.

I believe the default in some versions of Excel is for that box to be checked,

and we don't want that to be the case.
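Solver's job here can be sketched in Python as a search over both parameters, left free to be negative. As before, the per-month churn counts are hypothetical placeholders summing to the 509 observed churners:

```python
import math

# Hypothetical per-month churn counts (placeholders); 491 censored at month 7.
churned = {1: 131, 2: 126, 3: 90, 4: 60, 5: 42, 6: 34, 7: 26}
survivors = 491

def survival(alpha, beta, t):
    """P(still a customer after t periods) under the time-trend model."""
    s = 1.0
    for period in range(1, t + 1):
        s *= 1.0 / (1.0 + math.exp(alpha + beta * period))
    return s

def log_likelihood(alpha, beta):
    ll = sum(n * math.log(survival(alpha, beta, t - 1) - survival(alpha, beta, t))
             for t, n in churned.items())
    return ll + survivors * math.log(survival(alpha, beta, 7))

# Coarse grid search over alpha (intercept) and beta (slope). Both are left
# free to be negative -- the reason the "make unconstrained variables
# non-negative" box must stay unchecked in Solver.
best = max(((a / 10, b / 10) for a in range(-30, 31) for b in range(-10, 11)),
           key=lambda ab: log_likelihood(*ab))
alpha_hat, beta_hat = best
```

With declining empirical churn like this cohort's, the fitted slope comes out negative.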

22:30

All right, so alpha is -1.55. Again,

these numbers, in a raw sense, don't really tell us all that much.

Beta being negative, what does that tell us?

Well let's look back to our survival probability equation.

So as time increases,

23:33

The exponential term gets closer and closer to 0,

so this fraction gets bigger and bigger, right?

And so churn is decreasing over time; that's effectively what we're finding here.

Recall back we'd said earlier that

23:57

the way we set up this logistic regression expression was:

If we think of the churn event as happening

with the probability of e to the x over one plus e to the x,

the retention event happens with the probability of one over one plus e

to the x, so churn is decreasing in time.
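A one-line check of that complementarity; x = -1.55 is an illustrative value of alpha plus beta times t, not a fitted quantity:

```python
import math

# If churn happens with probability e**x / (1 + e**x), retention happens with
# probability 1 / (1 + e**x); the two always sum to one.
x = -1.55  # illustrative value of alpha + beta * t
churn = math.exp(x) / (1 + math.exp(x))
retention = 1 / (1 + math.exp(x))
# As x falls (beta < 0 and t grows), churn shrinks toward zero.
```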

So if we want to see how this performs overall, let's set up our calculation for

24:27

how many customers remain; we can do this in two ways.

The way that we've set this up is for the number of remaining customers, and

for that we can use our survival function.

We take our 1,000.

We've always got 1,000 customers.

And let's just multiply that by the expected survival probabilities
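That multiplication can be sketched as follows; alpha is the fitted intercept quoted a moment ago, while beta here is an illustrative negative slope, not the fitted value:

```python
import math

# Expected customers remaining = cohort size * model survival probability.
alpha = -1.55   # intercept quoted in the walkthrough
beta = -0.2     # illustrative negative slope (placeholder, not fitted)
cohort = 1000

s = 1.0
expected = []
for t in range(1, 8):
    s *= 1.0 / (1.0 + math.exp(alpha + beta * t))  # retention in period t
    expected.append(cohort * s)
# With beta < 0 the per-period hazard falls over time, so the expected
# customer counts decline more slowly in later periods.
```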

26:24

It looks pretty good.

I mean, compare this to what we saw with the geometric distribution and

we're doing a heck of a lot better.

So, what we've done is, we've allowed the churn probability to change over time.

In particular, what we find is that churn is decreasing over time.

All right, so it doesn't make sense for

us to assume that there's a constant churn rate.

26:46

You'll find on the next tab in the workbook a more heavily annotated

version of this spreadsheet, and you get the same kind of results.

And what I've done on the last tab is just give a comparison, and

I included data that goes a little bit further out into the forecasting period.

All right, so going into the forecasting period, we see that, again,

we're doing a much better job over this full data

period than the geometric model was capable of.

27:25

So we started with as simple a model as possible,

saying let's just assume it's a coin flip.

Each time period, there's a constant probability of churn

that was our geometric model.

And we saw okay, it's not great,

we saw that's essentially splitting the difference.

But when we look at the overall forecast,

we see that the geometric model just doesn't do that well for us.

Yes, it captures the overall shape, but

it systematically over-predicts

churn in some time periods and under-predicts it in others,

whereas allowing for a time-varying churn rate in the model,

we end up with a very much better forecast.

Now, one of the things we'll talk about a little bit later in the course, something

that's kind of pushing the limits of Excel is what's driving this dynamic.

Two potential explanations.

One is that individuals are literally changing over time becoming more and

more loyal, less and less likely to churn.

The other explanation is that the individuals who remain

later on are fundamentally different than the individuals who remain early on.

So, one explanation is dynamics in the churn probability.

The other explanation is heterogeneity across the user base.

Ideal model is going to combine both of those factors.

So we'll talk a little about unobserved heterogeneity,

the importance of heterogeneity later in the course, but again,

it's something that is really pushing the limit of what Excel can do,

better suited to a computing environment, such as R and MATLAB.

But in terms of accuracy of the model, just bringing in a simple time

trend here, a linear time trend, does much better than ignoring that time trend.

And so we are able to pick up some of the dynamics that occur in that churn rate.