机器人如何实时确定他们的状态，并从带有噪声的传感器测量量获得周围环境的信息？在这个模块中，你将学习怎样让机器人把不确定性融入估计，并向动态和变化的世界进行学习。特殊专题包括用于定位和绘图的概率生成模型和贝叶斯滤波器。

Loading...

来自 University of Pennsylvania 的课程

机器人学：估计和学习

281 个评分

机器人如何实时确定他们的状态，并从带有噪声的传感器测量量获得周围环境的信息？在这个模块中，你将学习怎样让机器人把不确定性融入估计，并向动态和变化的世界进行学习。特殊专题包括用于定位和绘图的概率生成模型和贝叶斯滤波器。

从本节课中

Gaussian Model Learning

We will learn about the Gaussian distribution for parametric modeling in robotics. The Gaussian distribution is the most widely used continuous distribution and provides a useful way to estimate uncertainty and predict in the world. We will start by discussing the one-dimensional Gaussian distribution, and then move on to the multivariate Gaussian distribution. Finally, we will extend the concept to models that use Mixtures of Gaussians.

- Daniel LeeProfessor of Electrical and Systems Engineering

School of Engineering and Applied Science

In this lecture, we are going to talk about the EM Algorithm.

As compared to the previous lecture, the target audience here will be advanced

learners with some background in probability and convex optimization.

Let us consider the EM Algorithm as a maximization of a lower bound

of an object function.

I will review some background and

then describe the iterative steps of the algorithm.

Let's first learn an important inequality that is fundamental to the algorithm.

The main idea of the EM Algorithm can be explained using Jensen's inequality.

Consider a convex function, such as this 2D graph.

Let's take two points, x1 and x2, in the domain.

A middle point will be the mean of the two points.

Now, we are going to compare two values.

First, the function value of the mean of the two points

which relies on the function curve.

Second, a mean of the two function values of the points

which lies on the straight line that connects Fx1 and Fx2.

You may notice that the straight line lies above the function curve.

The mean of Fx1 and Fx2 is always larger than the function value of the mean.

We can generalize this idea.

Inequality holds for any weighted mean of the two points.

In fact, this applies to any positively weighted cases or multiple points,

even in higher dimensions, as long as the function is convex.

If the function is concave, we only need to flip the inequality.

The type of function we are going to handle is a log function,

which is always concave.

Thus, we can use Jensen's inequality as a lower bound.

Now, we'll introduce a latent variable which we can never truly know.

Interpretation of latent variable becomes clear in specific applications

such as the estimation of GMM parameters from the previous lecture.

What we are going to do with this latent variable

is take the marginal probability over the variable.

As before, we consider the log of the probability.

Using Jensen's inequality,

we can provide a lower bound of the log-likelihood as shown.

As we have seen, we cannot directly maximize a log of a sum of probabilities.

So instead, we will use a lower bound of the function which we can compute.

Finding a lower bound implies finding a distribution of z, the latent variable.

The lower bound turns out to be a function of parameter given some old

parameter value.

Now, let's see how this applies to.

We have an initial guess for the parameter, theta nought,

and we can obtain a lower bound g by finding some q using that guess.

As I mentioned, a low likelihood can be seen as a function of the parameters.

Suppose f and g are visualized as a function of theta on this graph.

G does not move exactly like F, but

they are locally similar in the neighborhood of theta nought.

Remember we want to find a maximum likelihood estimate of the parameter

theta.

Given the current lower bound G,

we can find a better parameter denoted as theta star.

This is still not a maximum of F but is improved from the previous estimate.

Since we now have a better guess of the parameter,

we can compute an improved lower bound to G, which is shown in the plot.

Again, the new G is not quite the same as F.

But it locally agrees with F around theta one.

But we can compute an even better theta based on the better G.

If we repeat this process, the parameter eventually converges to a local maximum.

To summarize, in the E-step, we compute

a lower bound of the log-likelihood using a given set of parameters.

Again, this implies that we obtain the distribution of the latent

variable based on the given parameter theta hat.

The M step, we obtain the maximizing parameters of the current lower bound.

This process is repeated until the updates become less than our threshold.

We have learned the EM Algorithm for

finding maximum likelihood solutions for complex problems.

The estimation of GMM parameters from the previous lecture

was an example of applications of this algorithm.

You will see another application of EM we talk about localization in week four.