0:47

And the black points are the feature points we used.

And then we superimposed the trajectory on a map of Philadelphia.

The only input we used was this panoramic camera, the panoramic video you see on the top left.

1:22

In biological perception, we talk about path integration.

This is what animals and humans do when they don't have reference points, when they don't have place cognition.

They just integrate their path, and they know approximately how far they went.

1:46

What is visual odometry?

Visual odometry is really the process of incrementally estimating your position and orientation with respect to an initial reference by tracking only visual features.

It sounds very similar to bundle adjustment.

The difference is that bundle adjustment can have very large baselines; it can be from different cameras, it can be random images from the web.

Visual odometry, on the other hand, is usually from a camera which you either hold or which is mounted on a robot, and because it is captured as a video, we can really exploit the continuity of the trajectory.

2:35

We also use the term visual SLAM.

Many people use the two interchangeably, but when we say visual SLAM, we put the focus not only on the trajectory but also on the feature map, the map of the visual features as they are triangulated in the world.

2:58

It is a wide field with many advances in the last 15 years; it has not yet made it into textbooks, but there is a very good reference tutorial by Davide Scaramuzza, and very recently, in December 2015, there was the ICCV workshop on the future of real-time SLAM, and I really urge you to visit the website of this workshop and look at all the slides.

3:25

The most successful application of visual odometry is probably on the planet Mars.

NASA has already sent three vehicles there.

Even the early ones, Spirit and Opportunity, had to solve the following problem.

Even though there was some remote control from Earth to move the vehicles, the delay in sending a command to Mars can be up to 20 minutes, so there is no way to really drive these vehicles with a joystick.

3:55

Now, how can a vehicle navigate on Mars?

There is no GPS there, so the only thing we can do is really apply visual odometry.

So we might send some waypoints where the robot has to go, but between two waypoints the robot has to solve the visual odometry problem.

Another big success of visual odometry is the vacuum cleaner called Dyson 360 Eye, which uses an implementation of Andrew Davison's visual SLAM.

It uses an omnidirectional system, a 360-degree eye, which captures a panoramic picture.

And then, using natural features in the environment, it can find its position and traverse a regular pattern while knowing at every point where it is with respect to the first frame.

4:50

Now let's go back again to our equations, to the multiple-view setting.

We had calibrated point projections (x_p, y_p) for frame f, the unknown poses (R, t), and the 3D points (X, Y, Z).

In visual odometry, given an estimate (R_k, t_k) of the current camera pose as well as the 3D points, and having also the correspondences to the calibrated point projections, we really need to update at every time step.

So at time step k we have the pose (R_k, t_k), and we want to update it to the pose at the next time step, (R_{k+1}, t_{k+1}).
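As a rough sketch of this update, the new global pose can be obtained by chaining the relative motion onto the previous pose; a minimal numpy illustration with a made-up constant 90-degree turn per step (the composition convention varies between implementations):

```python
import numpy as np

def compose_pose(R_k, t_k, R_rel, t_rel):
    """Chain the relative motion between frames k and k+1 onto the
    current global pose (a sketch; conventions vary)."""
    R_next = R_k @ R_rel
    t_next = t_k + R_k @ t_rel
    return R_next, t_next

# Hypothetical motion: rotate 90 degrees about z and step 1 unit, four times.
theta = np.pi / 2
R_rel = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0, 0, 1]])
t_rel = np.array([1.0, 0.0, 0.0])

R, t = np.eye(3), np.zeros(3)   # start at the origin
for _ in range(4):
    R, t = compose_pose(R, t, R_rel, t_rel)
```

Four such steps trace a square, so the integrated pose returns to the starting point; this is exactly the integration that accumulates drift when each relative estimate is slightly off.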

5:35

And when we say visual odometry, by default we refer to monocular visual odometry, using just one camera, and this means that when we don't use any other sensor, we still have an unknown global scale.

The update is done in two steps: one for the rotation and one for the translation.

5:56

First, when an incoming image at time k+1 arrives, we find features and we try to find the right correspondences.

These correspondences have many outliers, so we need to apply RANSAC in order to select the inliers, and usually we do it with what we call a minimal problem: in this case by sampling groups of five points and then applying the five-point algorithm.

6:27

After we find the inliers, we solve for the epipolar geometry, which means we find the essential matrix E.

And then we can obtain a rotation estimate, and the rotation estimate is really sufficient in order to update the rotation.
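The rotation can be read off the essential matrix by SVD; a hedged numpy sketch of the standard two-candidate decomposition, checked here on a synthetic pose (in practice the correct candidate among the four pose hypotheses is chosen by a cheirality test):

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def decompose_essential(E):
    """Recover the two candidate rotations and the translation direction
    (scale and sign unknown) from an essential matrix via SVD."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]   # translation direction only
    return R1, R2, t

# Synthetic check: build E = [t]x R from a known pose and decompose it.
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([1.0, 0.5, 0.2])
E = skew(t_true) @ R_true
R1, R2, t = decompose_essential(E)
```

One of the two candidate rotations matches the true one, and the recovered translation is parallel to the true translation, which is why the rotation can be updated directly while the translation still needs its scale.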

6:43

And we find also a translation as to be made but

Â is really not enough though because we don't know it's scale.

Â So we cannot really apply this last equation.

Â For the translation what we really need is an estimate of the 3D points.

Â So we need first a triangulation of the 3D points and then we

Â can update the translation by just using a PNP algorithm, a 2D to 3D algorithm.

Â In this point of a 2D to 3D we have the option

Â why they update together the rotation translation or only translation.
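The triangulation step can be sketched with the standard linear (DLT) method; a toy numpy example with an assumed two-camera setup and noise-free projections:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two calibrated views.
    P1, P2 are 3x4 projection matrices [R | t]; x1, x2 are normalized
    image points. A textbook sketch, not a production implementation."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize

# Synthetic check: project a known 3D point into two views and recover it.
X_true = np.array([0.2, -0.1, 4.0])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # 1 unit baseline
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]
X = triangulate(P1, P2, x1, x2)
```

Once 3D points like this are available, a PnP solver can place the next camera in the same (still scale-ambiguous) frame.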

7:19

So this is the main cycle of visual odometry: we always have essential matrices between pairs of frames.

We compute the relative rotation and translation between two successive images, and then we need to integrate.

Because we use these pairs of subsequent frames, depending on the baseline and on how many features we can track, this might become very vulnerable to what we call drift, the main problem of visual odometry.

To really address this drift, what we do is group a window of frames, like the last n frames, and apply a bundle adjustment.

8:15

And this will create excellent local map.

Â Using the same back projection error that we have used in battle adjustment.

Â Using pretty much these two equations in a non-linear linear square setup.

Â Now when we apply it as global filter over the whole sequence we have.
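A toy numpy sketch of such a stacked reprojection residual over a window (hypothetical poses and points, normalized coordinates); its sum of squares is what a windowed bundle adjustment minimizes:

```python
import numpy as np

def reprojection_residuals(poses, points, observations):
    """Stack the reprojection residuals for a window of frames.
    poses: list of (R, t); points: Nx3 array; observations: list of
    (frame_idx, point_idx, (x, y)) in normalized image coordinates."""
    res = []
    for f, j, (x, y) in observations:
        R, t = poses[f]
        Xc = R @ points[j] + t           # point in the camera frame
        res.append(x - Xc[0] / Xc[2])    # residual in x
        res.append(y - Xc[1] / Xc[2])    # residual in y
    return np.array(res)

# Synthetic check: perfect poses and points give zero residual.
points = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0]])
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([-0.5, 0.0, 0.0]))]
observations = []
for f, (R, t) in enumerate(poses):
    for j, X in enumerate(points):
        Xc = R @ X + t
        observations.append((f, j, (Xc[0] / Xc[2], Xc[1] / Xc[2])))
r = reprojection_residuals(poses, points, observations)
```

In a real system a non-linear least-squares solver (Gauss-Newton or Levenberg-Marquardt) would iterate on the poses and points of the window to drive these residuals down.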

8:49

And we use the exponential of the symmetrical of

Â angular velocity update the rotation.

Â And then we have an update for the translation using a velocity,

Â we assume that both velocities, the angular and the translation, are constant.
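Under the constant-velocity assumption, the prediction step can be sketched as follows, using Rodrigues' formula for the rotation exponential (the frame conventions here are an assumption):

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def rotation_exp(w, dt):
    """Rodrigues' formula: the matrix exponential of skew(w) * dt."""
    theta = np.linalg.norm(w) * dt
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / np.linalg.norm(w))
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def predict(R, t, w, v, dt):
    """Constant-velocity prediction of the next pose (a sketch)."""
    return R @ rotation_exp(w, dt), t + v * dt

R, t = np.eye(3), np.zeros(3)
w = np.array([0.0, 0.0, np.pi / 2])   # 90 deg/s about z (toy values)
v = np.array([1.0, 0.0, 0.0])
R, t = predict(R, t, w, v, 1.0)
```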

9:06

For any filter approach, like the Kalman filter,

Â we need to also update the covariances, which are really estimates of the error.

Â We really need a good propagation of the error.

Â 9:17

First to make sure that we have some idea about how uncertain we are.

Â You might have seen these big circles around the GPS position

Â when your GPS measurement is unknown.

Â Is the same what we're going to do with the visual geometry but

Â also first we really want to know how uncertain is our structure as we will see

Â in the next slide.

Â So if big sigma is our covariance and sigma k,

Â k minus one is the covariance between frame k minus frame k our updates

Â that then usually by pre and post-multiplying the Jacobian,

Â where Jacobian has exactly the same meaning as the parallel adjustment.

Â We can update the covariance and

Â we can really visualize within an ellipsoid for the 3D points, and

Â ellipsoid for the position and some other presentation for the rotation.
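The propagation step can be sketched as the usual sandwich product with the Jacobian (toy numbers; the added process-noise term Q is an assumption borrowed from the standard Kalman prediction):

```python
import numpy as np

def propagate_covariance(Sigma, J, Q):
    """Propagate uncertainty through a linearized model:
    Sigma_new = J Sigma J^T + Q, with J the Jacobian of the model."""
    return J @ Sigma @ J.T + Q

Sigma = np.diag([0.01, 0.01, 0.02])   # current covariance (toy values)
J = np.array([[1.0, 0.0, 0.5],        # hypothetical motion-model Jacobian
              [0.0, 1.0, -0.3],
              [0.0, 0.0, 1.0]])
Q = 1e-3 * np.eye(3)                  # process noise
Sigma_new = propagate_covariance(Sigma, J, Q)
```

The eigenvectors and eigenvalues of such a covariance are exactly what gets drawn as the uncertainty ellipsoids on the slides.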

10:54

We have a quite large uncertainty depth.

Â But when we move forward, if we are lucky, to really

Â still track the same point, we can have with this very large baseline,

Â we can have a very small uncertainty ellipsoid.

Â And in this case, this really the point to update our translation estimate.

Â The frames when these are called keyframes.

Â And they are very important in the visual odometry implementations.

Â 11:33

Outliers appear because of illumination changes, because of occlusions, or because we might be moving very, very fast.

And you see here an example of the trajectory we reconstructed without any inlier selection, which is the blue one, and after we really select good inliers, which is the red trajectory.

11:56

The outliers also cause drift, because they really introduce biases in the rotation and translation estimation, and for the first time, in 2004, David Nistér, who invented the five-point algorithm, also provided us with a solution to the inlier-selection problem in the two-view case.

Now, choosing quintuples, these groups of five points, inside RANSAC might be very expensive, so we need some alternative, and the alternative is a different scheme invented here at the University of Minnesota.

The idea is that if you know a direction, like the gravity from the IMU, or just a point at infinity, then you already know two degrees of freedom of the rotation, and the remaining problem has three degrees of freedom: one for the yaw angle and two for the translation direction.

So what you do is, every time before you solve, you align with this direction, for example the gravity, and then you solve a constrained problem in which only one angle in the rotation part is unknown, the way you see it here, and then you have the skew-symmetric matrix of the translation, which has only two unknowns, x and y.

And because you know only the direction of this (x, y), you just set it to (cos theta, sin theta), and you have to solve a system with four equations and four unknowns from three points.

This can be solved much faster than the five-point algorithm, and we can obtain much better RANSAC solutions without spending most of our time on inlier selection.
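The alignment step, rotating the measured gravity direction onto the world vertical so that only the yaw remains unknown in the rotation, might be sketched like this (a hypothetical IMU reading; the degenerate antiparallel case is ignored):

```python
import numpy as np

def align_to_gravity(g):
    """Rotation taking the measured gravity direction g to the world
    z-axis, fixing pitch and roll. Rodrigues-style sketch, assuming g
    is not antiparallel to z."""
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)          # rotation axis (unnormalized)
    c = g @ z                   # cosine of the rotation angle
    K = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1 + c)

g = np.array([0.1, -0.2, 9.7])  # hypothetical accelerometer gravity reading
R = align_to_gravity(g)
```

After pre-rotating both frames this way, the relative rotation between them is a pure yaw, which is what makes the three-point minimal solver possible.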

14:02

And you see with your eyes that they are the same point.

Then you really have to enforce in your system that this image has been seen before, and that any error in your estimated pose in this picture has to be corrected, so that you come back to the same position where you started, for example.

14:22

This is an essential element of every visual odometry algorithm, and it has two steps.

The first step is that you look in the vicinity of every pose to check whether you have been somewhere around there before, which means whether you are revisiting the same place.

And we do it at the feature level, for example with vocabulary trees.

And then we just apply geometric consistency, and possibly also a bundle adjustment, in order to correct all our poses, so that we are again at the correct pose and we don't create a phantom by hallucinating that we are in a different position.
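At the feature level, the place-recognition query boils down to comparing bag-of-visual-words histograms; a toy sketch with made-up word counts (real vocabulary trees add tf-idf weighting and inverted files):

```python
import numpy as np

def bow_similarity(hist_a, hist_b):
    """Cosine similarity between bag-of-visual-words histograms, the kind
    of score a vocabulary-tree place-recognition step thresholds."""
    a = hist_a / np.linalg.norm(hist_a)
    b = hist_b / np.linalg.norm(hist_b)
    return float(a @ b)

# Hypothetical word histograms: two views of the same place vs. a new place.
current   = np.array([5., 0., 3., 1., 0., 2.])
revisited = np.array([4., 1., 3., 0., 0., 2.])
elsewhere = np.array([0., 6., 0., 0., 5., 0.])
same = bow_similarity(current, revisited)
diff = bow_similarity(current, elsewhere)
```

A high-scoring candidate is then verified by the geometric-consistency step before the loop is actually closed.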

15:08

So, to repeat them here, these are actually the basic ingredients of every visual odometry algorithm.

We do bundle adjustment over a window to minimize the drift.

We do keyframe selection to really minimize the triangulation error.

We apply RANSAC with five points or three points in order to select the inliers.

And last, if we are revisiting places, we really have to adjust our position with what we call visual loop closing.

15:39

Now, newer systems use additional information.

One of the systems here, from the University of Minnesota, uses a combination of inertial and visual elements.

And you can see on the left the incoming images and on the right the trajectory.

This is using just a regular cell phone and the inertial measurement unit inside the cell phone.

16:11

An inertial measurement unit measures the acceleration and the angular velocity.

And the acceleration, which is really in meters per second squared, when integrated into velocity and then into position, allows us to estimate our pose in terms of meters, not in terms of an unknown global scale, as we have seen before.
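As a toy illustration of why this fixes the scale, double-integrating a constant acceleration of 0.5 m/s^2 (a made-up value, with gravity assumed already removed) recovers position directly in meters:

```python
# Double-integrating accelerometer readings yields metric position,
# fixing the scale that monocular vision alone cannot observe.
# A 1D Euler-integration sketch with constant acceleration.
dt = 0.01
a = 0.5                      # m/s^2, hypothetical constant acceleration
v, p = 0.0, 0.0
for _ in range(100):         # integrate over 1 second
    v += a * dt              # velocity: integral of acceleration
    p += v * dt              # position: integral of velocity

# Analytic value: p(1) = 0.5 * a * t^2 = 0.25 m, up to discretization error.
```

In real visual-inertial systems this integration is fused with the visual estimate in a filter, since raw double integration drifts quickly with noisy accelerometers.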

17:01

And in this case, you can see that we have the feature tracks on the right, and the reconstructed trajectory projected on the left, including, in blue, all the features which were reconstructed up to this point.

17:17

Another recent development aimed at visual odometry is the semi-direct approach, where in addition to features we use the whole image directly when we form the projection error for the motion.

This is a quite impressive video from a quadrotor, and we see on the right the reconstruction of the trajectory of the vehicle, and also the points reconstructed from the ground, and the points the way they are seen and tracked in the picture at the bottom left.

Probably the most recent and successful application is the realization of visual-inertial odometry in Project Tango, which started as a small cell phone from Google and is now a tablet.

It captures an omnidirectional image and inertial information along the trajectory of this tablet.

You see on the left the inertial measurements, and on the right you see the reconstructed trajectory; the features are not that many, here something like 70 or 80 or 100, and we can produce a quite accurate trajectory of the tablet.

18:39

Now, what is the future of visual SLAM?

In the future, in visual SLAM, in addition to features and this information, we can really include some semantic information.

For example, we recognize the doors, and we have some model for door recognition.

In this case here, taken inside the computer science building, we see the reconstruction of the trajectory of the camera using features, but also the doors, whose recognition you see as these bounding boxes, as well as the chairs in the environment.

19:17

Semantic information does not only help for semantic mapping, knowing where the doors and the chairs are.

It also allows us to solve visual loop closing very efficiently.

Visual odometry is an application that we are going to use everywhere in inertial navigation, wherever there is no GPS, and probably also in many virtual reality setups where we track the head of a user.