And t_i can take two values,

one or two, with some probabilities.

This way, the prior probability of t_i

being one, for example, is just gamma.

So, it's the weight of the first component of the

mixture, and the probability of t_i being two is just one minus gamma,

and you can easily define the conditional probability.

The probability of x_i given, for example,

t_i equals two, is

just the probability according to the second component of the mixture, that's P2 of x_i.
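As a minimal sketch (my own Python encoding, not from the lecture), the pieces defined so far, the prior over t_i and the per-component conditionals, give the marginal probability of a data point. The concrete tables for p1 and p2 match the values used later in this derivation (p1(1) = alpha, p1(2) = 1 - alpha; p2(1) = 0, p2(2) = 1 - beta); any further support of p2 is left out.

```python
# Sketch of the two-component mixture described so far.
# prior:        P(t=1) = gamma, P(t=2) = 1 - gamma
# conditionals: P(x | t=1) = p1(x), P(x | t=2) = p2(x)

def mixture_prob(x, p1, p2, gamma):
    """Marginal P(x_i = x) = P(t=1) * p1(x) + P(t=2) * p2(x)."""
    return gamma * p1.get(x, 0.0) + (1 - gamma) * p2.get(x, 0.0)

alpha, beta, gamma = 0.5, 0.5, 0.5   # initialization used later in the lecture
p1 = {1: alpha, 2: 1 - alpha}
p2 = {1: 0.0, 2: 1 - beta}           # the second component never generates ones
```

For example, `mixture_prob(1, p1, p2, gamma)` is gamma times alpha here, since only the first component can produce a one.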

Now, we have everything defined, and we can

use the expectation-maximization algorithm to find the best values of the parameters alpha,

beta, and gamma that we can find with this kind of algorithm.

Let's start with the E-step or expectation step.

On the E-step, we want to find the posterior distribution of the latent variable t_i.

So we want to find q of t_i, which equals the posterior distribution:

the probability of t_i equals c, for example,

given the data point x_i and the parameters, but I will

omit the parameters, as usual in this video.

Let's see how we can do it.

Let's start with the probability that t_i equals one,

given that x_i equals one.

So we can find this expression by using Bayes' rule.

It equals the following ratio: the joint distribution,

P of t_i and x_i, which is P of x_i equals one

given t_i equals one, times the prior distribution,

P of t_i equals one, divided by the same thing,

but summed over both t_i equals one and t_i equals two.

So, a sum with respect to all the values of the latent variable.

The probability of x_i equals one given t_i equals one,

so the same thing as in the numerator, times the prior, plus the same thing,

but for t_i equals two.

So, P of x_i equals one given

t_i equals two, times the prior,

P of t_i equals two.

And we can compute this thing for our model, since we

know the current values of the parameters alpha, beta, and gamma.

This conditional distribution is just P1 of x_i equals one,

so it's alpha, times the prior, which is gamma,

the probability that t_i equals one, divided by the same thing,

alpha times gamma, plus the probability that this particular point came from

the second component of the mixture, which is zero because

the second component of the mixture never generates ones,

times its prior, one minus gamma.

This whole expression is just one.

And this totally makes sense, right?

Since we know that the second component of the mixture can never generate the number one,

it just means that if we see the number one in the data set,

it was certainly generated by the first component, that is, by t_i equals one.
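The Bayes-rule computation above can be sketched in Python (my own encoding, not from the lecture; the tables p1 and p2 hold only the component probabilities the transcript names):

```python
def posterior_t1(x, p1, p2, gamma):
    """Posterior P(t_i = 1 | x_i = x) by Bayes' rule for a two-component mixture."""
    numerator = p1[x] * gamma                        # P(x | t=1) * P(t=1)
    evidence = p1[x] * gamma + p2[x] * (1 - gamma)   # summed over t = 1 and t = 2
    return numerator / evidence

alpha, beta, gamma = 0.5, 0.5, 0.5
p1 = {1: alpha, 2: 1 - alpha}
p2 = {1: 0.0, 2: 1 - beta}   # the second component never generates ones

# A point x = 1 is certainly from the first component, since p2 gives it
# zero probability:
print(posterior_t1(1, p1, p2, gamma))  # -> 1.0
```

Note that the result is 1.0 for any alpha and any gamma in (0, 1), matching the argument above.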

And we can compute the same thing

for the other values of t_i and x_i.

For example, t_i equals one given x_i equals two.

Using the same reasoning,

it's the conditional distribution, which is one minus

alpha, times the prior distribution, which is gamma,

divided by the same thing,

one minus alpha times gamma, plus

the same expression assuming that

this data point came from the second component of the mixture.

So, one minus beta times one minus gamma,

where one minus gamma is the prior of the second component of the mixture.

And if you substitute here the numbers from our initialization,

this will be just 0.5 times 0.5, divided by 0.5 times

0.5 plus the same thing,

because of our particular initialization values, and this is just 0.5 in total.
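As a quick check of this arithmetic (a sketch under the same assumptions as before, with the transcript's initialization alpha = beta = gamma = 0.5):

```python
alpha, beta, gamma = 0.5, 0.5, 0.5   # initialization from the lecture

# P(t_i = 1 | x_i = 2) by the ratio derived above:
numerator = (1 - alpha) * gamma                            # 0.5 * 0.5 = 0.25
evidence = (1 - alpha) * gamma + (1 - beta) * (1 - gamma)  # 0.25 + 0.25 = 0.5
posterior = numerator / evidence
print(posterior)  # -> 0.5
```

With this symmetric initialization the evidence splits evenly between the two components, which is why the posterior comes out to exactly 0.5.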

So, we've found our posterior distribution of t_i given x_i,

and we can do the same thing for the other values of t_i and x_i.

This way, we can compute the E-step of

the expectation-maximization algorithm for this particular model.

In the next video, we will discuss the M-step,

that is, how to update the values of the parameters by using

these computed conditional distributions of the latent variable.