One of the greatest achievements of digital signal processing is the ability to efficiently compress information prior to transmission, and this is why we can enjoy so much digital content on a variety of fixed and mobile devices. Here we will see how to cast a compression problem in terms of vector space operations. We will start by introducing the concept of a vector subspace, talk about least squares approximations, and then show how to approximate a function using Legendre polynomials rather than the Taylor series expansion.

The concept of a vector subspace is very simple: it is a subset of vectors that is closed under addition and scalar multiplication. The classic example is a plane in three-dimensional space. A plane is a subspace because if you combine vectors lying on the plane, you remain in the plane; you can never leave the plane for the third dimension. This is actually the theme of a very famous book called Flatland, published at the end of the 19th century, which I encourage you to check out if you don't know it already.

We can extend the concept of subspace to more complicated vector spaces, such as function vector spaces. Take, for instance, the set of symmetric functions over an interval. This is a subspace in the sense that whenever we take a linear combination of symmetric functions over this interval, we always end up with a symmetric function. Here, for example, you have cos(5t); we take a second function, cos(5 pi t); both are symmetric around 0, and if we sum them together the result will of course be symmetric as well.

Subspaces have their own bases. For instance, consider the plane in three-dimensional space: the three basis vectors in the canonical basis are e0, e1 and e2, and if we consider just this plane here, our basis for the plane will be e0 and e1.

And here we get to the problem of approximation. Suppose you have a full-fledged vector x in your vector space, in this case a vector in three-dimensional space, and you have a subspace S, for instance the plane here. The question is: how do I approximate x using only elements from this subspace? Intuitively, the answer is that I take the projection of this vector onto the subspace, and this will be my approximation. The power of the vector space paradigm for signal processing is that we can extend this geometric intuition to arbitrarily complex vector spaces.

So now we will define the concept of orthogonal projection for approximation in abstract terms. Consider an orthonormal basis for the subspace S; call its elements s(k). The orthogonal projection of the vector x onto this subspace is defined like so. It is like a partial basis expansion, where we take the coefficients as inner products between the original vector and the basis vectors for the subspace. The orthogonal projection, defined like so, is the best approximation of a given vector x within the given subspace S. It is the best because it has two fundamental properties. First of all, it has minimum-norm error. Remember that we defined the distance between two vectors as the norm of the difference between the vectors; so what we are saying here is that the orthogonal projection onto the subspace is the vector that minimizes the distance between the approximation and the original vector. The second fundamental property is that the error is orthogonal to the approximation. What that means is that the error and the set of basis vectors for the subspace are maximally different; in other words, they are uncorrelated.
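The projection formula itself is only referred to ("defined like so") and does not appear in the transcript. Written out in a way that is consistent with the description above (a reconstruction, not the slide itself), it reads:

```latex
% Orthogonal projection of x onto the subspace S with orthonormal basis {s^(k)}
% (reconstructed from the verbal description above, not copied from the slide)
\hat{x} \;=\; \sum_{k} \big\langle s^{(k)},\, x \big\rangle \, s^{(k)}
%
% The two properties discussed in the text:
%   minimum-norm error:          \| x - \hat{x} \| \le \| x - y \| \quad \text{for all } y \in S
%   orthogonality of the error:  \langle x - \hat{x},\, s^{(k)} \rangle = 0 \quad \text{for all } k
```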
The basis vectors for the subspace cannot capture any more of the information contained in the error.

We can try to illustrate this with a minimum-dimensionality example like so, where in the plane our subspace is a one-dimensional subspace S. Assume this vector here is unit norm, so it is its own orthonormal basis, and we want to approximate a vector x lying in the plane. The way we can find the minimum-distance approximation is to consider curves of equal distance emanating from x until we hit the approximation subspace. If we do that geometrically, we build circles like so, and when we hit S we can see that indeed we are at a minimum distance from x, and that the error, this red vector here, is indeed orthogonal to the subspace that we are approximating from.

Okay, now let's try to make things a little bit more interesting by considering function vector spaces. In particular, we will try to approximate elements of L2[-1, 1], the space of square-integrable functions over the interval [-1, 1], with elements of a very particular subspace called PN[-1, 1], which is the space of polynomials of degree at most N-1 over the interval. A generic element of the subspace has the following form, just a polynomial of degree N-1, and you can see that, like all polynomials, it is a linear combination of increasing powers of the variable t. So a self-evident, naive basis, if you want, for this space is the set of all powers of t up to degree N-1. However, the naive basis is not orthonormal: it is clear that we can express any polynomial as a linear combination of the vectors s_k, but these vectors are neither orthogonal nor unit norm.

To make things more concrete, let's choose a candidate vector for approximation: x = sin(t) over the interval [-1, 1]. We will try to approximate it with a second-degree polynomial, that is, with an element of the subspace P3[-1, 1]. Clearly, second-degree polynomials are square-integrable functions over the interval, so this is a legitimate subspace. The way we are going to perform this approximation is to first build an orthonormal basis for P3[-1, 1], because once we have an orthonormal basis, the best approximation is simply the orthogonal projection of sin(t) onto this basis; and to project x onto this basis, all we need to do is take the inner products between sin(t) and the elements of the basis. We will then compute the approximation error and compare it to what is probably the most common approximation of the sine function over an interval, the Taylor approximation. The Taylor approximation is well known and easy to compute, but we will show that it is not optimal in the mean square error sense.

Okay, so the first step is to build an orthonormal basis for PN[-1, 1]. We will use the Gram-Schmidt orthonormalization procedure, starting with the naive basis and arriving at an orthonormal set. The Gram-Schmidt algorithm is an iterative procedure that takes one vector at a time from the original set and incrementally produces an orthonormal set. Each step has two sub-steps: the first is to take the next vector in line, say s(k), and remove from it all components that are collinear, namely in the same direction as the vectors you have produced so far for the orthonormal set; then, once you have taken away all the components that are dependent on the previous vectors, you normalize the vector and obtain a new member of the orthonormal set. Let's look at it graphically for a very simple example.
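In code, the same two sub-steps look like this; a minimal NumPy sketch for two vectors in the plane, given as an illustration rather than the lecture's own code:

```python
# A minimal NumPy sketch of the two Gram-Schmidt sub-steps (an illustration,
# not the lecture's code), applied to two example vectors in the plane.
import numpy as np

def gram_schmidt(vectors):
    orthonormal = []
    for s in vectors:
        # sub-step 1: remove the components already captured by the
        # orthonormal vectors produced so far
        p = s - sum(np.dot(u, s) * u for u in orthonormal)
        # sub-step 2: normalize to unit norm
        orthonormal.append(p / np.linalg.norm(p))
    return orthonormal

# two linearly independent, but not orthonormal, vectors in the plane
s0 = np.array([2.0, 0.0])
s1 = np.array([1.0, 1.0])
u0, u1 = gram_schmidt([s0, s1])

print(u0, u1)          # [1. 0.] [0. 1.]
print(np.dot(u0, u1))  # 0.0 -- the resulting vectors are orthogonal
```

The graphical example that follows traces exactly these two sub-steps for s0 and s1.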
Suppose you have two vectors on the plane which do form a basis, because they point in different directions, but do not form an orthonormal basis. The first step takes the first vector. Since it is the first vector, there are no other vectors computed so far, so all you need to perform is the second sub-step of the procedure: just normalize the length to 1. The intermediate vector is therefore identical to the original vector, and the final vector you put in the bucket of orthonormal vectors is the rescaled version of s0, normalized to unit norm.

Then you take the second vector, and you can see that s1 and u0 are not orthogonal, so they have some directionality in common. The idea is to remove from s1 all components that could be captured by u0. To do so, you project s1 onto u0; this is the orthogonal projection of s1 onto u0, this orange vector, and then you remove it from s1. So you take away the part that could have been captured by the other vector. If you subtract this orthogonal projection from s1, you obtain p1, which is now, of course, orthogonal to u0. The only thing left to do is to normalize p1 to unit norm, and here you have converted s0 and s1 into two orthonormal vectors.

If we now apply the orthonormalization procedure to the naive basis for PN[-1, 1], the only things that we have to adapt in our computations are, first of all, the basis, which is 1, t, t squared, t cubed and so on, and the inner product, which between two vectors in L2[-1, 1] is defined as the integral from -1 to 1 of x(t) y(t) dt.

So with this in mind, let's start the orthonormalization procedure. The first vector in the bucket of non-orthonormal vectors will be this guy, 1, so s0 = 1. The intermediate vector, because there are no vectors computed so far in the bucket of orthonormal vectors, will be equal to s0. The norm squared is the integral of 1 between -1 and 1, which is 2, and therefore the first orthonormal vector in our basis will be the original vector divided by its norm, the square root of 2. So u0 = 1 over the square root of 2.

The second vector in the bucket of non-normalized vectors is t, of degree 1. We take the inner product between this guy and the vector we computed before; remember, that is 1 over the square root of 2. So the inner product between these two vectors is simply the integral between -1 and 1 of t divided by the square root of 2. Well, t is an antisymmetric function over the interval, so the integral between -1 and 1 is equal to 0. What that means is that u0 and s1 are already orthogonal, and p1 is therefore equal to s1. We now compute the norm squared; it is easy to show that it is equal to two thirds. Therefore the second orthonormal vector in our bucket will be u1 = the square root of 3/2, times t.

The third vector that we want to normalize is t squared. We take the inner product between u0 and t squared; this time the inner product is not 0, it is actually 2/(3 times the square root of 2), so there is a directionality in common. We then take the inner product between t squared and the u1 we just computed in the previous step; this is again the integral of an antisymmetric function over a symmetric interval, so it is 0. Finally, p2, the intermediate vector, is equal to s2 minus the component it has in common with u0, in this case minus one third. We normalize this by computing the norm, and we find that u2 is equal to the square root of 5/8, times (3 t squared minus 1).
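If you want to check the algebra, here is a small SymPy sketch (an illustration, not the lecture's code) that runs the same two sub-steps on the naive basis 1, t, t squared with the L2[-1, 1] inner product:

```python
# A small SymPy sketch (illustrative, not the lecture's code) that reproduces
# the Gram-Schmidt computation above with the L2[-1, 1] inner product.
import sympy as sp

t = sp.symbols('t')

def inner(f, g):
    # <f, g> = integral of f(t) g(t) over [-1, 1]
    return sp.integrate(f * g, (t, -1, 1))

naive = [sp.Integer(1), t, t**2]   # naive basis of P3[-1, 1]
orthonormal = []                   # will collect u0, u1, u2

for s_k in naive:
    # sub-step 1: remove the components already captured by previous vectors
    p_k = s_k - sum(inner(s_k, u) * u for u in orthonormal)
    # sub-step 2: normalize to unit norm
    orthonormal.append(sp.simplify(p_k / sp.sqrt(inner(p_k, p_k))))

for u in orthonormal:
    print(u)
# Expected (up to equivalent symbolic forms):
#   1/sqrt(2),  sqrt(3/2)*t,  sqrt(5/8)*(3*t**2 - 1)
```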
So as you can see, the procedure is a little bit laborious and not particularly interesting, but what we get in the end is a set of orthonormal polynomials for the interval [-1, 1]. These polynomials are known as the Legendre polynomials, and of course you can compute Legendre polynomials of arbitrary degree. Here we show the first few: degree zero is the first basis vector, then degree one and degree two, which is what we are going to use in the approximation example that follows; but if you are interested, here are also the third-degree, fourth-degree and fifth-degree Legendre polynomials, and so on and so forth.

So back to our approximation problem. We are trying to approximate sin(t) over the subspace P3[-1, 1]. We know that the best approximation is given by the orthogonal projection of sin(t) onto the set of Legendre polynomials up to degree 2, so we compute the expansion coefficients as the inner products between the Legendre polynomials and the function we want to approximate.

Let's begin. Alpha 0 is the inner product between the first Legendre polynomial and sin(t). The first Legendre polynomial is just a constant, so we are taking the integral between -1 and 1 of sin(t) times a constant, which is equal to 0 because sin(t) is antisymmetric. The next coefficient, alpha 1, is the integral between -1 and 1 of the square root of 3/2, times t, times sin(t), and it is approximately equal to 0.7377. The last coefficient is the inner product between the second-degree Legendre polynomial and sin(t). Now, we could compute this, but it is enough to see that we are computing the integral of the product of an antisymmetric function and a symmetric function; remember, the second-degree Legendre polynomial is a parabola like this. So the product is an antisymmetric function, and the integral over the symmetric interval is equal to 0.

So finally, the orthogonal projection of sin(t) onto P3[-1, 1] consists of only one term, alpha 1 times u1, and if you go back to the definition of u1, you can see that it is equal to 0.9035t. This is similar, but not quite identical, to the standard approximation of the sine around 0 given by the Taylor series, which is sin(t) approximately equal to t. So in both cases we are approximating the sine with a straight line, but the slope is a bit different.

Let's see how this affects the approximation result. If we plot the sine, the Taylor approximation and the orthogonal projection approximation, we don't really see a lot of difference, but we can concentrate on the error between the original function and the approximants. Here you can see what happens: in red we have the error of the Taylor series approximation, and we can see that the error is very small in an interval around 0 but tends to grow large as we move away from 0. This is because the Taylor approximation is a local approximation. The orthogonal projection, on the other hand, minimizes the global distance, the mean square error, between the approximation and the original vector. So, although the error is not as small around the origin, the energy of the error is lower when considered over the entire interval; it is a global optimization of the approximation.

We can compute the distance between the sine function and the approximants. In the case of the Legendre approximation of the sine, the norm of the approximation error is approximately 0.0337, whereas in the case of the Taylor series it is 0.0857.
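These figures can be checked numerically; the following is a minimal sketch (not the lecture's code) using SciPy quadrature for the L2[-1, 1] inner product:

```python
# A minimal numerical check of the approximation figures above (an illustrative
# sketch, not the lecture's code), using SciPy quadrature over [-1, 1].
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    # <f, g> = integral of f(t) g(t) over [-1, 1]
    return quad(lambda t: f(t) * g(t), -1.0, 1.0)[0]

# orthonormal Legendre basis of P3[-1, 1] derived above
u0 = lambda t: np.sqrt(1 / 2) + 0 * t
u1 = lambda t: np.sqrt(3 / 2) * t
u2 = lambda t: np.sqrt(5 / 8) * (3 * t**2 - 1)

x = np.sin
alphas = [inner(u, x) for u in (u0, u1, u2)]
print(alphas)                     # ~ [0.0, 0.7377, 0.0]

# projection onto P3[-1, 1]: alpha1 * u1, i.e. roughly 0.9035 * t
legendre = lambda t: sum(a * u(t) for a, u in zip(alphas, (u0, u1, u2)))
taylor = lambda t: t              # Taylor approximation sin(t) ~ t

def error_norm(approx):
    err = lambda t: x(t) - approx(t)
    return np.sqrt(inner(err, err))

print(error_norm(legendre))       # ~ 0.0337
print(error_norm(taylor))         # ~ 0.0857
```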
So we have a factor of approximately two and a half between the two error norms.