Today, we're going to talk about how to localize a robot given a projective transformation. We can see here pictures taken of a pattern that we have explicitly placed in our lab, so that we can easily find the points of this pattern without having to solve the correspondence problem. The small markers you see on the pattern are called tags. The idea is that we have these tags on the floor, the way you see it on the right of the slide. And then we have one or two or more views. For each view separately, we can compute the projective transformation from the ground plane, and then from this projective transformation we can say where the camera is and how the camera is oriented. Let's look again at how we can describe the coordinate systems. We have a world coordinate system, which will be on the ground. It has a Z axis, which is pointing upwards, and then X and Y on the plane, the way we have always learned it in geometry. And then we have a camera coordinate system with subscript c, like camera, where Zc is the optical axis, and Xc and Yc are parallel to the image plane. When everything is on the ground plane, we can assume that these points have Z coordinate equal to 0. What is the effect of this assumption? Let us look at the projective transformation, and specifically the transformation from the world coordinate system to the camera coordinate system, the way we have learned it in the camera model. We have homogeneous coordinates of world points, X, Y, Z, and W, and we have the transformation, which is the rotation, with columns r1, r2, and r3, and the translation T. We have the intrinsic parameters, like the focal length and the image center, in the matrix K, and then we have the pixels in u, v, and w. We use this tilde to make sure that the left-hand side is understood as a multiple of the right-hand side, and not exactly an equality. Now, if the Z coordinate in the world equals zero, we see that we can eliminate r3. Why is this?
Because when we take the product of the matrix r1, r2, r3, T with the vector X, Y, Z, W, each entry is just an inner product in which the r3 term is multiplied by Z. So if r3 is multiplied by 0, it disappears. And the result is that, instead of having a transformation from a four-dimensional vector, X, Y, Z, W, we have one from a three-dimensional vector. And remember, this is projective geometry, so these are homogeneous coordinates, and in reality it is really a plane: the ground plane, with coordinates X, Y, W. So the only things that remain are the two columns of the rotation matrix, r1 and r2, and the translation vector T. Remember that the meaning of r1 and r2 is really the X and the Y axes of the ground plane, the way they are expressed in the camera frame. Now, this is a three by three matrix. K is three by three. r1, r2, and T are three column vectors, so together they make a three by three matrix as well. If we summarize the product and call it H, this is a transformation from P2 to P2, from one projective plane to the other. Any three by three matrix that is invertible defines a projective transformation. Let's see if this transformation is invertible. To see if a three by three matrix is invertible, we have to check its determinant. If we take the determinant of r1, r2, and T, we see that it is the mixed product of T with r1 cross r2. Remember, r1 and r2 are two vectors in the ground plane, and their cross product, r1 cross r2, is really the vertical to the ground plane, the perpendicular to the ground plane. So if the translation is not perpendicular to this normal, the mixed product is nonzero and H will be invertible, for the simple reason that K, which is the only remaining matrix, has determinant f squared, which is the focal length squared. So let us stay with this condition, that T transposed times r1 cross r2 is not equal to zero. What does this really mean? It means that the camera is not actually inside the ground plane. So the camera is not exactly at street level, which would mean that all the points are projected onto just one line.
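As a small numerical check of the argument above, here is a minimal sketch in Python with NumPy (the lecture itself mentions Matlab). The intrinsics `f`, `u0`, `v0`, the example pose `R`, `t`, and the helper `rotation_x` are made-up values and names for illustration only. The sketch builds H = K [r1 r2 T], confirms that it agrees with the full camera model for a ground point with Z = 0, and confirms that the determinant of H is f squared times the mixed product.

```python
import numpy as np

def rotation_x(angle):
    """Rotation about the x axis; used only to make up an example pose."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

# Assumed intrinsics: focal length f and image center (u0, v0).
f, u0, v0 = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, u0],
              [0.0, f, v0],
              [0.0, 0.0, 1.0]])

# Assumed world-to-camera pose: R = [r1 r2 r3] and translation t.
R = rotation_x(np.deg2rad(160.0))
t = np.array([0.1, -0.2, 2.0])

# Homography for the ground plane Z = 0: the column r3 drops out.
H = K @ np.column_stack((R[:, 0], R[:, 1], t))

# A ground point in homogeneous coordinates (X, Y, Z = 0, W = 1).
X_world = np.array([0.3, -0.4, 0.0, 1.0])
full = K @ np.column_stack((R, t)) @ X_world   # full 3x4 camera model
planar = H @ X_world[[0, 1, 3]]                # 3x3 model using (X, Y, W)

# Mixed product t . (r1 x r2); H is invertible exactly when it is nonzero.
mixed = t @ np.cross(R[:, 0], R[:, 1])
det_H = np.linalg.det(H)

print(np.allclose(full, planar))
print(np.isclose(det_H, f**2 * mixed))
```

Both checks should print `True` for any pose in which the camera center is not inside the ground plane.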
So this is the meaning of the mixed product of the translation with r1 and r2 being nonzero. So given now that this transformation is a projective transformation, and it is invertible, let us see how we can use it in order to compute where the camera is. Suppose that we have four point correspondences; we have seen that we need at least four point correspondences to compute the projective transformation. Let us assume that we know the focal length and the image center, so we know the matrix K. We can multiply K inverse by H, and then we should obtain r1, r2, and T. Now, nothing guarantees that if we do this inversion, the resulting matrix will have the properties of r1, r2, and T. Why is that? Because r1 and r2 must be orthogonal unit vectors. So we cannot guarantee that if we take K inverse H and we find some matrix h1 primed, h2 primed, h3 primed, this will be like an r1, r2, T matrix. So we seek orthogonal r1 and r2 that are closest to the first two columns, h1 primed and h2 primed. The solution to this problem is given by the singular value decomposition, which is described in a different lecture. For now, just take the singular value decomposition as a black box, as a function that you could call in Matlab. We can find mathematically that the orthogonal matrix R which is closest to an arbitrary matrix, h1 primed, h2 primed, h1 primed cross h2 primed, is given by the following result. If we take the singular value decomposition of that matrix, which is nothing else than an orthogonal matrix U, a diagonal matrix S, and an orthogonal matrix V transposed, then we can forget S. We take only U, and we multiply it with V transposed. Now, a small detail. U and V transposed are orthogonal matrices, and when we multiply them the result will be an orthogonal matrix, but we don't know whether this matrix will have determinant one. Why do we want determinant one?
In order to have a right-handed coordinate system. To enforce that the result has determinant one, we can put this matrix in the middle: the diagonal matrix with entries 1, 1, and the determinant of U V transposed. This makes the whole R have determinant one. Now, to find the translation, the only thing we have to do is to take the last vector, h3 primed, and divide it by the magnitude of the first, h1 primed, or by the average of the magnitudes of the first and the second. In this way, we have found the rotation and the translation from a projective transformation. You can find H with any of the methods we have described before. Given H, we can then find where the camera is, like the quadrotor in this case, and how it is oriented.
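The steps above can be sketched in Python with NumPy, treating the singular value decomposition as a black box just as the lecture does. This is a minimal sketch under stated assumptions, not a robust implementation. The function name `pose_from_homography` and all numeric values are made up for illustration. The sign flip on K inverse H is an extra assumption not discussed in the lecture: H is only known up to scale, so the sketch picks the sign that puts the origin of the ground plane in front of the camera.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover (R, t) from H ~ K [r1 r2 t]. Hypothetical helper for illustration."""
    Hp = np.linalg.inv(K) @ H                  # columns h1', h2', h3'
    # Assumption added here: choose the sign so that t_z > 0.
    if Hp[2, 2] < 0:
        Hp = -Hp
    h1, h2, h3 = Hp[:, 0], Hp[:, 1], Hp[:, 2]
    # Closest orthogonal matrix to [h1', h2', h1' x h2']: drop S from U S V^T.
    M = np.column_stack((h1, h2, np.cross(h1, h2)))
    U, _, Vt = np.linalg.svd(M)
    # Put diag(1, 1, det(U V^T)) in the middle so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    # Translation: h3' divided by the magnitude of h1'.
    t = h3 / np.linalg.norm(h1)
    return R, t

# Usage with a made-up pose and made-up intrinsics.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(160.0)
c, s = np.cos(angle), np.sin(angle)
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, c, -s],
                   [0.0, s, c]])
t_true = np.array([0.1, -0.2, 2.0])

# An estimated H has arbitrary scale and sign; mimic that with a factor of -3.7.
H = -3.7 * K @ np.column_stack((R_true[:, 0], R_true[:, 1], t_true))

R_est, t_est = pose_from_homography(H, K)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With a noise-free H, as here, the recovered rotation and translation match the pose used to build it. With an H estimated from noisy correspondences, the SVD step is what projects the estimate back onto a valid rotation. R and t here map world coordinates to camera coordinates, so the camera position in the world frame is minus R transposed times t.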