The most celebrated variable-length code was derived in 1951 by David Huffman, and it carries his name. It provides a straightforward methodology, based on the Morse principle, for designing a prefix code and therefore a uniquely decodable code. In this segment, we show the methodology for designing first-order as well as block Huffman codes. In the latter case, groups of symbols are combined to form new symbols, emitted by the so-called extended source, which has an increased alphabet size. This extension of the source can provide dramatic improvements in coding efficiency in some cases. In other cases, such as sources with few symbols and skewed symbol probabilities, the improvement is not immediate; that is, we need block codes of large sizes. These are some of the shortcomings of Huffman encoding which are addressed by arithmetic coding, to be discussed later. So let us now look into the fine details of this important material. The most celebrated and widely used prefix code is due to David Huffman. It carries his name and was derived in 1951. It's quite straightforward to explain, understand, and utilize the code, as we'll see right away. The first guiding principle of the code is that symbols that occur more often will have shorter code words. If you recall, this is exactly what we called the Morse principle earlier in the class. The second guiding rule is that the two symbols that occur least frequently will have code words of the same length. And this is something that will jump out right away as soon as we explain the procedure of the code. So the procedural steps are as follows. We sort the symbols according to their probabilities; then we take the two least probable symbols and combine them to form a composite symbol, whose probability is the sum of the respective symbol probabilities. With this step, we reduce the size of the alphabet by one. We then repeat these two steps for the remaining symbols.
Clearly, if the size of the alphabet is n, we're going to have n minus 1 stages of the procedure, where each stage involves these two steps. Finally, after we combine the last two symbols, we see the binary structure of the tree, and by assigning zeros and ones to the left and right branches as we did earlier, the code jumps out. So let's look at a couple of examples where we apply this procedure and see that indeed it's quite straightforward to produce a Huffman code. Here's a source that emits five symbols with the associated probabilities, and we want to build a Huffman code to represent the symbols of the source. According to the procedure we just described, I look at the probabilities of the symbols and consider the symbols with the smallest probabilities. The symbols with the smallest probability are s4 and s5. I combine the two to form a new symbol, the composite symbol s4s5, which has probability 0.2, the sum of the two probabilities. After stage one, I'm left with four symbols: s1, s2, s3, and s4s5. I look at the probabilities again and want to combine the two symbols with the smallest probabilities. There are three symbols with probability 0.2, so I have to make a choice. Let's assume I choose to combine s3 with s4s5. The symbol s3s4s5 is formed, and its probability is 0.4. So I'm left with three symbols: s1, s2, and this composite one. I combine the two with the smallest probabilities: clearly the one with probability 0.2 and one of the two that have probability 0.4. Let's assume I combine s1 and s2, so a new symbol s1s2 is formed, and its probability is 0.6. Now I'm left with just two symbols; I combine them, and the overall composite symbol has probability one. So we see that, following this process, I have here a binary tree.
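The two-step procedure above (merge the two least probable symbols, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code; the probabilities 0.4, 0.2, 0.2, 0.1, 0.1 are the ones implied by the worked example, and since ties can be broken differently than on the slide, the individual code words may differ while the average length stays the same.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability} by repeatedly
    merging the two least probable entries, as in the lecture."""
    # Heap entries are (probability, tie-breaker, tree); a tree is
    # either a symbol (leaf) or a pair of subtrees (internal node).
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # least probable symbol
        p2, _, t2 = heapq.heappop(heap)  # second least probable symbol
        # Composite symbol: probability is the sum of the two.
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
        count += 1
    # Walk the final tree, assigning one bit per branch
    # (one to the left branch, zero to the right, as in the lecture).
    code = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "1")
            assign(tree[1], prefix + "0")
        else:
            code[tree] = prefix or "0"
    assign(heap[0][2], "")
    return code

# Probabilities inferred from the worked five-symbol example.
probs = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}
code = huffman_code(probs)
print(code)
```

With an alphabet of five symbols there are exactly four merging stages, and the resulting code is a prefix code because every symbol sits at a leaf of the tree.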
This binary tree represents the code we've obtained. This is the root of the tree. Arbitrarily, let's assign a one to the left branch and a zero to the right branch at every level. So, looking for the code word of symbol s1, I start at the root and follow the path to s1: I go through a one and then a second one, so the code word for s1 is 11. For s2, starting at the root, I have a one followed by a zero, so 10. The code word for s3 is a zero followed by a one, so 01. For s4 it is 001, and finally for s5 it is 000. So this is the Huffman code for this particular source. It's clear that it is a prefix code, because the code words are the leaves of the tree: I never need to go through one code word to reach another. Due to this construction, it's a prefix code and therefore a uniquely decodable code. We also see that, due to the specific construction, the two symbols with the smallest probabilities end up with code words of the same length. Clearly, too, the most frequently occurring symbols, the ones with the highest probabilities, are the ones combined last; therefore they are closer to the root of the tree, and so they get shorter code words than the symbols with smaller probabilities, which were combined first. A natural and important question is whether the Huffman code is unique. If we think for a second, the answer is no, for a number of reasons. One reason is that at various stages we had to make choices: we wanted to combine the two symbols with the smallest probabilities, but we had three symbols tied at the same smallest probability.
So right there we had to make a choice of which two to combine first, and had we taken the other path, we would have ended up with a different Huffman code. Also, when I assigned zeros and ones to the binary tree, I arbitrarily assigned one to the left branch and zero to the right branch. If I reverse the assignment, replacing each one by a zero and each zero by a one, we get the complementary code, which is also a valid Huffman code. So, for these reasons, the Huffman code is not unique. However, in all cases it is a prefix code, and therefore uniquely decodable, and in all cases the average code word length is exactly the same. If you do the calculations, you can find that the entropy of this source is approximately 2.122 bits per symbol, while the average code word length of the resulting Huffman code is 2.2 bits per symbol. So the Huffman code is very close to the entropy, and therefore it's a very efficient code. We show here and discuss some properties of Huffman codes, without proofs. The first one is shown here. I've been using the rate R and the average code word length interchangeably; they mean exactly the same thing. This expression is a reiteration of the source coding theorem: the rate is bounded below by the entropy and above by the entropy plus one. This is actually an expression I typically work out and show when I teach a graduate course on the topic. If the maximum symbol probability is less than one half, then the upper bound becomes tighter: instead of one we add Pmax, which in this case is less than one half, so it's a tighter upper bound on the rate. If, on the other hand, the maximum symbol probability is greater than or equal to one half, then the upper bound changes as shown.
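The entropy and average code word length quoted above are easy to check numerically. The short sketch below assumes the probabilities implied by the worked example (0.4, 0.2, 0.2, 0.1, 0.1) and the code word lengths read off the tree (2, 2, 2, 3, 3):

```python
from math import log2

# Probabilities inferred from the worked example, and the code word
# lengths read off the tree (s1..s3 get 2 bits, s4 and s5 get 3 bits).
probs   = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]

entropy = -sum(p * log2(p) for p in probs)            # H(S)
avg_len = sum(p * l for p, l in zip(probs, lengths))  # R, bits/symbol

print(f"H = {entropy:.3f} bits/symbol")  # ~2.122
print(f"R = {avg_len:.3f} bits/symbol")  # 2.200
```

The gap of less than 0.08 bits per symbol is why the lecture calls this a very efficient code.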
The best possible result is achieved when the probabilities are powers of one half. In that case, the rate becomes exactly equal to the entropy of the source. Now, looking at these expressions, it's clear that if the alphabet size is large, then in general Pmax is small, so the upper bound is tight and Huffman codes perform quite well. If, on the other hand, the alphabet size is small, then there's a good chance that Pmax is large; the upper bound is not as tight anymore, and the Huffman code can be quite inefficient. Let's look here at some of the properties of the Huffman code when block coding is used. We mentioned block coding earlier; what happens in this case is that I consider N symbols of the original source together to form an extended symbol, and this extended symbol is emitted by the extended source S^N. The source is called extended since its alphabet has increased in size from n to n to the Nth power. As already mentioned, the entropy of this extended source is N times the entropy of the original source. Based on this, when I consider the extended source, the expression I mentioned a slide ago holds true: the rate of the extended source is bounded below by the entropy of the extended source and above by that entropy plus one. Now, if I consider a block of N original symbols, and each original symbol is represented at rate R, then, since there are N symbols in the block, R_N equals N times R. So if in this expression I substitute N times R for R_N, and N times H(S) for the entropy of the extended source, and then divide everything by N, I end up with this expression for the rate of the original source: it is bounded below by its entropy as always, but above by its entropy plus 1 over N. So clearly, as N increases, 1 over N vanishes.
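The substitution described above can be written out explicitly. Using the notation of the lecture, with R_N the rate of the extended source and H(S^N) = N H(S):

```latex
\begin{align*}
H(S^N) \;\le\; R_N \;&<\; H(S^N) + 1 \\
N\,H(S) \;\le\; N\,R \;&<\; N\,H(S) + 1
  && \text{(since } H(S^N) = N\,H(S),\; R_N = N\,R\text{)} \\
H(S) \;\le\; R \;&<\; H(S) + \tfrac{1}{N}
  && \text{(dividing through by } N\text{)}
\end{align*}
```

As N grows, the slack term 1/N shrinks toward zero, which is exactly why block coding can drive the rate arbitrarily close to the entropy.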
And therefore, using this method of the extended source, I am essentially guaranteed to obtain a Huffman code that approaches the entropy. As with everything in life and in engineering, there are tradeoffs. And what are the tradeoffs? When we do this, the size of the alphabet clearly increases. For example, if I started with a tiny source that only has three symbols but consider blocks of size eight, then we have 3 to the eighth, or 6,561, code words. Because of that, there is an increase in storage, since we have to store this code book when we do the coding; in computation, because I now have to search not just over three entries of the code book but over 6,561 entries; and there is also a delay, because I decode eight symbols at a time in this particular example, N in general, instead of one symbol at a time. So let's see some examples of what is achieved when the source is extended: cases for which the extension works with a small block size N, and cases for which it does not work right away. Let us look at the first example of the effect of extending the source on the Huffman code. Here is the example we'll work with: a source that emits three symbols, with the associated probabilities shown. Based on this information, we can follow the procedure we just described and design this Huffman code. Using this information about the source, we can compute its entropy, as well as the average code word length of the Huffman code. A quantity that is typically used is the redundancy of the code; in other words, how far away is the average code word length of the code from the entropy? For this particular example, subtracting one from the other, we obtain the redundancy, and it is typically normalized by expressing it as a percentage of the entropy. Here the redundancy is 47% of the entropy, so this is not a very efficient code.
So we'd like to extend the source and then redesign the Huffman code. For this particular example, clearly n equals 3, and we want to extend it by using two symbols at a time; therefore the new alphabet is going to have 3 squared, or 9, symbols. These are the symbols we obtain from all possible pairings of the three symbols of the original source. So here is the alphabet of the extended source, and here are the associated probabilities. To form these probabilities, I multiply the probabilities of the constituent symbols: s1s1 has probability 0.8 times 0.8, or 0.64; s1s2 has probability 0.8 times 0.02, and so on. Having this extended source, we follow the Huffman procedure again, and here is the resulting Huffman code. If we now compute the same statistics for this extended source, we see the average code word length of this code, in bits per new symbol. Since each new symbol contains two original symbols, I divide this by two, giving the average code word length in bits per original symbol. Comparing this to the entropy of the source, we see that we are only a small fraction of a bit per symbol away from the entropy, just 7% of the entropy. So this is a case that works by considering just two symbols together: we can bring the redundancy from 47% of the entropy down to 7% of the entropy. This is now an efficient code, and there's no need to further extend the source. Here's a second example of an extended source. Of course, I'm giving the story away with the title: this is a case that is not going to work very well. Again, here is the source we'll work with: it emits three symbols, here are the probabilities, and here's the Huffman code that we designed based on these statistics.
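This first extension example can be reproduced numerically. The sketch below assumes the probabilities 0.8, 0.02, 0.18 implied by the slide, and uses the fact that the average length of an optimal Huffman code equals the sum of the probabilities of all merged (internal) nodes, since each merge adds one bit to every symbol beneath it. The exact percentage printed for N = 2 may differ slightly from the slide's 7% figure because of rounding and tie-breaking, but it lands in the same single-digit range.

```python
import heapq
from math import log2
from itertools import product

def huffman_avg_length(probs):
    """Average code word length of an optimal Huffman code:
    the sum of the probabilities of all merged nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Probabilities assumed from the worked example: 0.8, 0.02, 0.18.
p = [0.8, 0.02, 0.18]
H = entropy(p)

R1 = huffman_avg_length(p)  # bits per original symbol, N = 1
# Extend to blocks of two: the nine pair probabilities p(a)*p(b).
R2 = huffman_avg_length([a * b for a, b in product(p, p)]) / 2

print(f"H = {H:.4f} bits/symbol")
print(f"N=1: R = {R1:.4f}, redundancy = {(R1 - H) / H:.0%}")  # ~47%
print(f"N=2: R = {R2:.4f}, redundancy = {(R2 - H) / H:.0%}")
```

Pairing just two symbols at a time collapses the redundancy from nearly half the entropy to a few percent of it.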
The difference between this source and the one we used in the previous slide is that the probability of s2 stayed the same, but the probability of s3 was reduced considerably, by 0.15, and that 0.15 was added to the probability of the first symbol. So this is a source with skewed symbol probabilities: one symbol has a very high probability and the others very small ones. If you recall from a couple of slides ago, this is exactly the case for which we argued that Huffman can be an inefficient code: a small alphabet (this is of course a toy example) and Pmax greater than one half. Recall that in the case of Pmax greater than or equal to 0.5, the average code word length, or the rate (I use the two terms interchangeably), is less than the entropy plus Pmax plus 0.086. With Pmax equal to 0.95, this added term is greater than one, so the upper bound becomes a rather loose one, and there is a lot of room for improvement. For this particular source and this particular code, if we pull out the statistics, we see the value of the entropy, the value of the average code word length, and therefore the redundancy; percentage-wise, we are 213% away from the entropy. So this is a very inefficient code, and we would like to improve its performance by extending the source. If we do exactly the same thing as in the previous example and consider N equals 2, then here is the alphabet of the new source, the associated probabilities, and the resulting Huffman code. Computing the statistics again, this is the average code word length for the extended source, in bits per new symbol; I divide by two, so the rate is so many bits per original symbol. The redundancy has dropped, but only to 72% of the entropy.
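The skewed-source example can be checked the same way. The probabilities 0.95, 0.02, 0.03 are an assumption reconstructed from the description (0.15 moved from s3 to s1); with them, the N = 1 redundancy comes out at the stated 213%. The N = 2 figure computed this way may not match the slide's 72% exactly, since it depends on rounding and on the particular code designed, but the qualitative conclusion is identical: the redundancy drops, yet remains far too high for the code to be called efficient.

```python
import heapq
from math import log2
from itertools import product

def huffman_avg_length(probs):
    # Average length of an optimal Huffman code: each merge of the two
    # smallest probabilities adds one bit to every symbol underneath it,
    # so the total is the sum of all merged-node probabilities.
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Assumed skewed probabilities: 0.15 moved from s3 to s1 relative to
# the previous example, giving 0.95, 0.02, 0.03 (Pmax = 0.95).
p = [0.95, 0.02, 0.03]
H = -sum(q * log2(q) for q in p)

R1 = huffman_avg_length(p)                                      # N = 1
R2 = huffman_avg_length([a * b for a, b in product(p, p)]) / 2  # N = 2

print(f"N=1 redundancy: {(R1 - H) / H:.0%}")  # about 213%
print(f"N=2 redundancy: {(R2 - H) / H:.0%}")  # much lower, but still high
```

Contrast this with the previous example, where the same N = 2 extension was already enough to make the code efficient.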
So again, we went from 213% to 72%; there's definitely a benefit in extending the source, but 72% still makes it a very inefficient code. For this particular example, you only reach an acceptable level of redundancy by considering N equals 8, and, as already mentioned, with a block size of eight the size of the alphabet becomes quite large, at 3 to the eighth, or 6,561, entries.