In this segment we discuss arithmetic coding, which addresses some of the shortcomings of Huffman coding. With arithmetic coding, a unique identifier, or tag, is generated for a particular sequence of symbols, without the need to generate codewords for all possible sequences of the same length, as is the case with Huffman coding. We show how this tag is generated by mapping each symbol onto the real line segment from zero to one, and zooming into the segment as more and more symbols are observed. Arithmetic coding is also well suited to small alphabets, as is the case with binary fax images, and to taking context into account, thereby locally changing the probabilities of the symbols. Because of this, it is the code chosen by the Joint Bi-level Image Experts Group, JBIG, a group of experts that established the standard for the progressive transmission of bi-level images. The basic principles of JBIG are also described in this segment. So, let us now have a closer look at this material. We saw that it is more efficient to generate codewords for groups, or sequences, of symbols, instead of generating codewords for individual symbols, and we called this block coding. In terms of Huffman block coding, we saw that if we want to encode a particular sequence of length m, then we have to find the codewords for all possible sequences of the same length. This clearly has the major disadvantage that the size of the alphabet grows tremendously: if I have an alphabet of size n and extend the source by m, then the new alphabet of the extended source has size n to the m. We would clearly like to address this, and the way to address it is through arithmetic coding. With arithmetic coding we can assign a codeword to a particular sequence without having to generate codewords for all sequences of the same length. The way this is done is by finding a unique identifier, or a tag, for the particular sequence we want to encode.
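To see how quickly the extended alphabet grows, here is a one-line check; the sizes n = 3 and m = 5 are chosen only for illustration:

```python
# Block (extended-source) Huffman coding must build a codeword for every
# length-m sequence: an alphabet of size n extended by m has n**m entries.
n, m = 3, 5                     # illustrative alphabet size and block length
extended_size = n ** m
print(extended_size)            # 243 codewords needed for length-5 blocks
```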
This tag corresponds to a binary fraction, which becomes the binary code for the sequence. So we would like to separate the generation of the tag from the generation of the binary code. We will discuss at a high level how the tag is generated, but we will not get into the details of how the code is generated; it is a rather involved matter. One possible set of tags for representing sequences of symbols are the numbers in the interval zero to one. Because there are infinitely many real numbers in this interval, it is possible to assign a tag to any sequence we are interested in. In order to do so, we need a function that maps sequences of symbols into the unit interval. Such a function is the cumulative distribution function. So we are given an alphabet here with m symbols, and here are the associated probabilities. We know that a random variable maps the outcomes of an experiment to the real line, so we will use a random variable X that maps each outcome of the experiment, the element of the alphabet, the symbol ai, into its index i. Given the probabilities of the source, we then have the probabilities of this random variable: the probability that X equals i is simply the probability of the symbol ai. And of course we can form the cumulative distribution function, Fx(i), the sum from k = 1 to i of the probabilities that the random variable equals k. With this assignment, the main idea behind the tag generation is to reduce the size of the interval in which the tag resides as more and more elements of the sequence are received. So we partition the unit interval into sub-intervals of this form: this is the line from 0 to 1, and symbol a1 resides in the interval from Fx(0) to Fx(1). More generally, ai belongs to the interval [Fx(i-1), Fx(i)), which is closed at the lower end and open at the upper end.
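This partitioning by the cumulative distribution function can be sketched in a few lines; the probabilities used are the ones of the example source discussed next:

```python
# Each symbol a_i is assigned the half-open interval [Fx(i-1), Fx(i)),
# where Fx is the cumulative distribution function of the source.
def symbol_intervals(probs):
    """Map each symbol to its sub-interval of [0, 1)."""
    intervals = {}
    cumulative = 0.0
    for symbol, p in probs.items():
        intervals[symbol] = (cumulative, cumulative + p)  # [Fx(i-1), Fx(i))
        cumulative += p
    return intervals

# Example source (same probabilities as the lecture's worked example):
probs = {"a1": 0.7, "a2": 0.1, "a3": 0.2}
intervals = symbol_intervals(probs)
# conceptually: a1 -> [0, 0.7), a2 -> [0.7, 0.8), a3 -> [0.8, 1.0)
```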
So the appearance of the first symbol in the sequence restricts the interval containing the tag to one of the sub-intervals. Let's assume that the first symbol is a2; then the tag will be in this interval here, and this sub-interval is partitioned in exactly the same proportions as the original interval. So each succeeding symbol causes the tag to be further restricted to a sub-interval, which is further partitioned in the same way. Let's look at an example where hopefully this idea will become even clearer. We are given a source with this alphabet, and here are the associated probabilities of the symbols. Based on what I just discussed on the previous slide, through the assignment of a random variable we can find the cumulative distribution function of that random variable, and we have that Fx(1) = P(a1) = 0.7, Fx(2) = P(a1) + P(a2) = 0.8, and Fx(3) = 1. With these values of the cumulative distribution function, we take the segment between 0 and 1 and divide it: here is Fx(0), here Fx(1), Fx(2), and Fx(3). So a1 resides in this interval, a2 in this interval, and a3 in this interval. Let us assume that the first symbol we observe is a1. This means that any number in the interval between 0 and 0.7 can serve as the tag to uniquely identify a1. I take the 0 to 0.7 sub-interval and subdivide it proportionally to the original probabilities: 70% of it is occupied by a1, 10% by a2, and 20% by a3. If the second symbol we observe is a2, then the tag resides in this interval. I can take this interval and subdivide it proportionally to the original probabilities. If the third symbol I observe is a3, then the tag resides in this interval. I can further subdivide it, and so on and so forth. So if the sequence we observe and want to encode is a1, a2, a3, then any real number in the interval 0.546 to 0.56 can serve as the tag to uniquely identify this particular sequence.
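The zooming procedure just described can be sketched in a few lines of Python; this is a sketch only, using plain floating point and ignoring the finite-precision issues discussed later, with the symbols and probabilities of this example:

```python
def encode_interval(sequence, probs, order):
    """Return the final [low, high) interval containing the tag."""
    # cumulative distribution: Fx(0) = 0, Fx(i) = Fx(i-1) + P(a_i)
    cdf = {}
    c = 0.0
    for s in order:
        cdf[s] = (c, c + probs[s])
        c += probs[s]

    low, high = 0.0, 1.0
    for s in sequence:
        width = high - low
        f_lo, f_hi = cdf[s]
        # zoom into the sub-interval assigned to symbol s
        low, high = low + width * f_lo, low + width * f_hi
    return low, high

probs = {"a1": 0.7, "a2": 0.1, "a3": 0.2}
low, high = encode_interval(["a1", "a2", "a3"], probs, ["a1", "a2", "a3"])
tag = (low + high) / 2      # the midpoint is typically chosen as the tag
# low is approximately 0.546 and high approximately 0.56, as in the lecture
```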
It is clear that we are looking for an identifier just for this particular sequence, and not for every three-symbol sequence, as was the case with Huffman coding. Typically, the midpoint of this interval is chosen as the tag. There are also recursive ways to compute the tag, as well as the boundaries of each of the sub-intervals, each time I take a sub-interval and further subdivide it, and so on and so forth. So this is the basic idea of how a tag is generated. Let us see with a simple example how to decipher the tag and find out what sequence was encoded. We consider again the same source as shown here, and we are given that the tag is equal to 0.55; we want to find the encoded sequence. In addition to the tag, we have of course been given the number of symbols in the sequence, and we are told that three symbols were encoded. The main idea in deciphering the tag is to mimic the encoder. So we take the information about the source, take the segment between 0 and 1, and subdivide it proportionally as we did before. Now clearly the tag resides in this interval. If you recall the steps we followed while encoding, the first observed symbol sets the stage, in some sense, for where the tag is going to be located. In other words, the tag cannot end up in a segment outside the initial segment, because in each step we are subdividing the initial segment. So the fact that the tag for this particular example belongs to this interval means that the first symbol that was encoded is the symbol a1. We mimic the encoder: take this interval and subdivide it proportionally. Now we observe that the tag resides in this interval, therefore the second symbol that was encoded was a2. We mimic the encoder again, subdivide this interval, and see that the tag belongs to this interval; therefore the third symbol that was encoded, and that we are now decoding, is a3.
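This decoding procedure, mimicking the encoder step by step, can be sketched as follows (same example source; plain floating point, so it is illustrative rather than a practical implementation):

```python
def decode_tag(tag, n_symbols, probs, order):
    """Mimic the encoder to decipher which sequence produced the tag."""
    cdf = []
    c = 0.0
    for s in order:
        cdf.append((s, c, c + probs[s]))
        c += probs[s]

    decoded = []
    low, high = 0.0, 1.0
    for _ in range(n_symbols):
        width = high - low
        for s, f_lo, f_hi in cdf:
            # which sub-interval of [low, high) does the tag fall into?
            if low + width * f_lo <= tag < low + width * f_hi:
                decoded.append(s)
                low, high = low + width * f_lo, low + width * f_hi
                break
    return decoded

probs = {"a1": 0.7, "a2": 0.1, "a3": 0.2}
result = decode_tag(0.55, 3, probs, ["a1", "a2", "a3"])
# recovers the sequence a1, a2, a3 from the tag 0.55
```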
So, in conclusion, the decoded sequence is a1 a2 a3. We would like to briefly compare the performance of arithmetic coding and Huffman coding. As we already saw, the Huffman rate satisfies H <= R < H + 1/N as a function of the encoding block size N. The corresponding relationship for arithmetic coding is H <= R < H + 2/N. By comparing these two expressions, it would seem that arithmetic coding is not as efficient as Huffman coding, since 2/N goes to zero more slowly than 1/N. However, as we discussed, Huffman coding suffers from the explosion of the size of the dictionary, and in practice it is arithmetic coding that achieves rates close to the entropy. Of course, we did not discuss how the tag is encoded. It should be clear that as the size of the message N increases, the precision of the tag increases: we need to represent a real number with many digits of precision. Scaling and incremental coding are used to address this issue. If the tag is in the 0 to 0.5 interval, for example, it is clear that it is restricted to stay there; therefore the 0 to 0.5 range can be rescaled back to the 0 to 1 range. By doing so we lose the most significant bit, which, however, was already sent to the decoder by signaling which half of the line we are using. This is called incremental coding. I want to demonstrate, with a simple example, the benefit of taking into account the context of the symbol being encoded when encoding an image, and then using multiple coders, each of them adapting to the statistics of its context. The words context and content are both used in image and video processing, but at times they are confused. In what I am describing here, the content of the image is taken into account, but we are adapting the coder to the context of the pixel, or symbol in general, being processed. So the processing I will describe next is also content-adaptive, but to be more precise, we are looking at the context of the pixel being encoded.
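The scaling idea can be illustrated with a short sketch: whenever the current interval falls entirely inside one half of the unit interval, its most significant bit is already determined, so that bit can be emitted and the interval rescaled back to full width. This ignores the case where the interval straddles the midpoint for a long time, which full implementations handle with an extra mechanism:

```python
# Emit the bits that are already fixed and rescale the interval to [0, 1).
def rescale(low, high):
    bits = []
    while True:
        if high <= 0.5:        # interval in lower half: next bit is 0
            bits.append(0)
            low, high = 2 * low, 2 * high
        elif low >= 0.5:       # interval in upper half: next bit is 1
            bits.append(1)
            low, high = 2 * (low - 0.5), 2 * (high - 0.5)
        else:                  # interval straddles 1/2: stop for now
            return bits, low, high

# The tag interval from the lecture's example:
bits, low, high = rescale(0.546, 0.56)
# emits 1, 0, 0, 0, 1 before the rescaled interval straddles 1/2
```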
This will actually become much clearer in the next few slides. We will also talk later in the course about other examples of content-adaptive processing. So we are looking at a document which has these probabilities of black and white pixels. The white pixels dominate, which is typically the case with a lot of documents. We know how to find the entropy of this source, and it is equal to this many bits per pixel. Based on what we have learned, if the model used here to find the entropy, that of a discrete memoryless source, is accurate in describing this document, then we know that we cannot find a coder whose performance is below the entropy. Now we divide the document into two regions. Region 1 is the larger one; it includes 80% of the pixels. Region 2 includes the remaining pixels. Region 1 is dominated by white pixels: 95% of them are white. If we find the entropy of region 1, it is equal to the number shown here. Region 2 is dominated by black pixels, and the corresponding entropy is shown here. Now the total entropy of the source is the weighted sum of the entropies of the two regions, where the weights are proportional to the number of pixels in each region. In other words, the total entropy is 0.8 H1 + 0.2 H2, and this is the resulting number. What we see is that by dividing the document into two regions, the entropy has been reduced to almost half of the original entropy, and therefore the use of multiple coders which adapt to the context of the symbol being coded is beneficial. This same idea has been used extensively, of course in a more refined form that I will explain later. I should note here that this is a case of a small alphabet with skewed symbol probabilities. We know that, for this case, Huffman coding has poor performance, while a source like this is very well suited to arithmetic coding.
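This computation can be made concrete with a short sketch. The 80/20 split and the 95% white figure for region 1 come from the lecture; the region-2 white fraction (0.30) is only an illustrative assumption, since the exact value is on the slide:

```python
from math import log2

def H(p):
    """Binary entropy in bits/pixel for white-pixel probability p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p1_white = 0.95     # region 1 (80% of pixels): 95% white, from the lecture
p2_white = 0.30     # region 2 (20% of pixels): assumed value, black dominates

# One coder for the whole document vs. one coder per region:
p_white = 0.8 * p1_white + 0.2 * p2_white    # overall fraction of white
H_single = H(p_white)
H_context = 0.8 * H(p1_white) + 0.2 * H(p2_white)   # 0.8 H1 + 0.2 H2
# H_context < H_single: splitting into regions lowers the achievable rate
```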
A case where context-adaptive arithmetic coding is used is the JBIG standard. JBIG stands for Joint Bi-level Image Experts Group; this is a group of experts that established the standard for the progressive transmission of bi-level images. JBIG is therefore a combination of progressive transmission and a lossless coding algorithm. Arithmetic coding is used for the lossless coding part, and arithmetic coding is well suited to the use of multiple coders, since they all use the same machinery; they simply use different sets of probabilities. So JBIG encodes the pattern of pixels in the neighborhood of the pixel being encoded; that is the context used to decide which set of probabilities to use in encoding a particular pixel. It uses these two neighborhoods, each of which has 10 pixels. So in encoding the pixel at location X, I look at these 10 pixels; here they extend over 3 lines. Or I use this other pattern, in which case I look at these ten pixels that extend over only 2 lines. Since there are 10 pixels, there are 2 to the 10, or 1,024, different contexts, or different coders. Actually, JBIG uses as many as 4,096 different coders when we deal with a high-resolution image. The pixel A here is a floating member of the neighborhood, and it can be placed anywhere in the image, to possibly uncover specific structures in the image. We also notice that the shape of the neighborhood preserves causality: the neighborhood includes only previously encoded pixels, and therefore these pixels are available to both encoder and decoder, so they can synchronize their operations. So let us look at a particular example with the first neighborhood. We want to encode this 0, and we look again at its 10 neighbors. We see that in this example 70% of the pixels are black, corresponding to zeros, and 30% of the pixels are white, corresponding to ones. So an arithmetic coder with these symbol probabilities is going to be utilized in encoding the value of this 0.
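The selection of one coder out of 1,024 can be sketched as packing the 10 neighbor bits into an index that picks a probability model; the neighbor offsets below are illustrative causal positions, not the exact JBIG template:

```python
# Ten previously coded neighbor bits -> an integer context in [0, 1023],
# which selects one of 2**10 = 1024 probability models (coders).
TEMPLATE = [(-2, -1), (-2, 0), (-2, 1),
            (-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2),
            (0, -2), (0, -1)]            # illustrative causal offsets only

def context_index(image, r, c, template=TEMPLATE):
    idx = 0
    for dr, dc in template:              # all offsets point at coded pixels
        idx = (idx << 1) | image[r + dr][c + dc]
    return idx

ones = [[1] * 5 for _ in range(5)]
print(context_index(ones, 2, 2))         # all neighbors 1 -> context 1023
```

Each context would keep its own running symbol counts, i.e., its own probability estimates, which is what makes the coders adaptive.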
Similarly, if this were the situation in the neighborhood, then in encoding the value of 1 here, in this particular case, 60% of the pixels are black and 40% white, so an arithmetic coder utilizing these probabilities would be used in encoding this particular pixel value. The idea behind progressive transmission is to represent an image at various resolutions. A low-resolution version of the image is first transmitted to the destination; if the recipient is interested in this image, he or she might request a higher-resolution version of the same image to be transmitted. It is the same idea as the thumbnails we see on the web: there is a tiny image, and if interested, we click on it and see a higher-resolution version of the same image. Now, one of the challenges when we deal with binary images is how to down-sample them. JBIG allows down-sampling by a factor of 2 x 2. So if we have 4 pixels here, we are going to replace them by 1 pixel. If I take the average of these four, the average is clearly 1/4. So one might think, okay, we could compare this to one half; since this is less than one half, the block should be replaced by a 0. But clearly we have a difficulty in this situation here, because the average of the pixels is exactly one half, and therefore it is not clear how to resolve whether the down-sampled version of this 2 x 2 neighborhood should be a 0 or a 1. To address this, JBIG considers a scheme as shown here. We are interested in finding the low-resolution value X; that is, we want to find how to replace these 4 high-resolution values by 1 value. In addition to the 4 high-resolution neighbors, we also utilize additional values at high resolution, the lower-case letters, as well as the previously computed values of the low-resolution image, the capital letters.
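The ambiguity of plain averaging is easy to see in code; this is a naive thresholding sketch of the problem, not what JBIG actually does:

```python
# Naive 2x2 down-sampling by averaging: a block averaging 1/4 clearly maps
# to 0, but a block averaging exactly 1/2 cannot be resolved by the
# threshold alone (hence JBIG's context-based expression instead).
def naive_downsample_block(block):        # block: 2x2 list of 0/1 pixels
    avg = sum(sum(row) for row in block) / 4
    if avg < 0.5:
        return 0
    if avg > 0.5:
        return 1
    return None                           # ambiguous: average is exactly 1/2

print(naive_downsample_block([[0, 0], [0, 1]]))   # average 1/4 -> 0
print(naive_downsample_block([[0, 1], [1, 0]]))   # average 1/2 -> None
```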
The standard then proposes the evaluation of the formula shown here, where again all these labeled pixels at low and high resolution are taken into account; if the value of the expression is greater than 4.5, X = 1, otherwise X = 0. It is always a question where such an expression comes from, and very often in standards there is a lot of thinking and experimentation that takes place before something is standardized. In the end, if one stares at it and tries to reason about it, it makes sense, and this is how it is implemented in the standard. In progressive transmission, the decoder has already received the low-resolution image. This image is of course also available at the encoder, and therefore the low-resolution values can be used as context when performing the coding of the high-resolution image. The advantage now is that the received low-resolution values are in the past with respect to time, but they include information about the pixel being encoded, as well as future values with respect to space of the high-resolution image; therefore the context can be more informative in this case. Here are the 4 context templates that are utilized by JBIG. X, again in red, is the pixel value being encoded. We see that past values of the high-resolution image are used, and the big circles are the low-resolution pixel values. A, again, is the floating-location pixel. We see, for example, that in computing this low-resolution value, the high-resolution value we are trying to encode was taken into account, as well as future values; and this is always the case for the low-resolution pixels. So in this template, it is this low-resolution value that has incorporated the value we are trying to encode from the high-resolution image. Since we have to include 2 bits to indicate which of the four context templates we are using, we have therefore used 12 bits in total, and there are 2 to the 12, or 4,096, different contexts.