机器人如何实时确定他们的状态，并从带有噪声的传感器测量量获得周围环境的信息？在这个模块中，你将学习怎样让机器人把不确定性融入估计，并向动态和变化的世界进行学习。特殊专题包括用于定位和绘图的概率生成模型和贝叶斯滤波器。

Loading...

来自 University of Pennsylvania 的课程

机器人学：估计和学习

292 个评分

机器人如何实时确定他们的状态，并从带有噪声的传感器测量量获得周围环境的信息？在这个模块中，你将学习怎样让机器人把不确定性融入估计，并向动态和变化的世界进行学习。特殊专题包括用于定位和绘图的概率生成模型和贝叶斯滤波器。

从本节课中

Gaussian Model Learning

We will learn about the Gaussian distribution for parametric modeling in robotics. The Gaussian distribution is the most widely used continuous distribution and provides a useful way to estimate uncertainty and predict in the world. We will start by discussing the one-dimensional Gaussian distribution, and then move on to the multivariate Gaussian distribution. Finally, we will extend the concept to models that use Mixtures of Gaussians.

- Daniel LeeProfessor of Electrical and Systems Engineering

School of Engineering and Applied Science

In this lecture, you're going to learn how to compute and

estimate of the Gaussian mixture model parameters from observed data.

We will first try to follow what we did before

to get the maximum likelihood estimate for single Gaussians.

But as I will show, we are going to fail to obtain nice analytics solution.

Instead, I will introduce a new way to get a locally optimal solution,

using an iterative algorithm called Expectation Maximization or EM for short.

While a single Gaussian has only two parameters, mu and

sigma, GMM has multiple mu and sigma, plus weights, and

the number of Gaussian components, denoted here as K.

As I mentioned in the previous lecture, we will use uniform weights 1 over k and

focus on understanding how to estimate the mean and covariance matrix parameters.

Let's begin to find the maximum likelihood estimate

of GMM parameters, as we did for single Gaussians.

To briefly review, maximum likelihood estimation means we want to find

the parameters of the model that is most likely to produce the observed data.

As before, instead of maximizing the joint probability, we can assume

independence of all observations and maximize the product of each probability.

Next, we take the log and maximize the sum of the log of probabilities,

instead of the products of probabilities.

So far, we get the same results.

But when we apply the specific probability model of the GMM

into the equation, which is a sum of Gaussians, we have this.

It turns out that we cannot further simplify this formula analytically,

because there appears a summation of Gaussians inside the log function.

This implies we can estimate the parameters

only via iterative computations.

And any solution found might not be globally optimal but,

don't be too disappointed.

We are going to learn the EM Algorithm shortly, and

the algorithm gives a reasonable solution under certain conditions.

The way we are going to learn the EM Algorithm is first,

see how we compute the GMM parameters as a special case.

And then move to the general ideas of EM for interested advanced learners.

Having said that, let us first talk about EM for computing GMM parameters specifically.

We need two additional things, an initial guess of mu and sigma and

a new variable Z that we are going to call a latent variable.

It might sound strange that we need a guess to solve the parameters,

which we want to estimate.

However, in this class of complicated problems called non-convex optimization,

there exists many suboptimal solutions, which are called local minimum.

And the initial guess affects the solution found.

While the initial guess could be important to this problem,

it will not be the focus of this lecture.

So now let's turn to the new variable, Z.

The latent variable Z of the ith points with respect to the kth Gaussian

is defined as the relative ratio of the kth Gaussian density at that point.

Essentially, Z indicates the probability that

the ith point is generated from the kth Gaussian components.

Let's look at a 1D example with K equals 2.

We have this point Xi.

And for the moment, we have some prior of two Gaussians G1 and G2.

If the value of G1 of Xi is P1 and

the value of G2 of Xi is P2.

And the value of Zi1 is computed

as P1 over P1+P2, and

Zi2 is computed as P2 over P1+P2.

In this particular case, P1 is larger than P2.

So it is more probably that Xi is generated by G1 than G2.

Now given that the latent variable values for all data points and

all the Gaussian elements, we can redefine the mean and

covariance matrices weighted by Z.

If the probability that a point belongs to the kth

Gaussian is small, then the points contributes less for

computing the parameters of that particular kth Gaussian.

Putting everything we have discussed so far together, you can compute

the parameters and the latent variables iteratively until the values converge.

First, we initialize the parameters.

Then use the initial value of mu and sigma to compute Z, the latent variables.

Once Z is updated, now we fix them and update mu and sigma.

You will alternate between these two steps

until the change of parameters becomes very small.

We are not going to prove it here, but

it can be shown that this iteration converges to a local optimum.

I have introduced a use of the EM Algorithm for

parameter estimation of GMMs, which work through iterations.

We have seen that with an initial setting of the mean and

covariance and by introducing a latent variable,

we've surprisingly turned the problem tractable in a locally optimal sense.

In the next lecture, I will introduce the EM Algorithm more generally.