0:00

In this segment we discuss arithmetic coding, which addresses some of the shortcomings of Huffman coding.

With arithmetic coding, a unique identifier, or tag, is generated for a particular sequence of symbols, without the need to generate codewords for all possible sequences of the same length, as is the case with Huffman encoding.

We show how this tag is generated by mapping the symbols onto sub-intervals of the real line segment from zero to one, and zooming into the segment as more and more symbols are observed.

Arithmetic coding is also suitable for dealing with small alphabets, as is the case with binary fax images, and for taking context into account, thereby locally changing the probabilities of the symbols.

0:51

Because of this, it is the code chosen by the Joint Bi-level Image Experts Group, JBIG, a group of experts that established the standard for the progressive transmission of bi-level images.

1:53

And this clearly has the major disadvantage that the size of the alphabet

grows tremendously.

So if I have an alphabet of size n and extend the source to blocks of m symbols, then the alphabet of the extended source has size n^m.
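To make the growth concrete, a quick sketch (the particular values of n and m below are illustrative, not from the lecture):

```python
# Size of the alphabet after extending a source of n symbols
# to blocks of m symbols: n**m codewords would be needed.
def extended_alphabet_size(n: int, m: int) -> int:
    return n ** m

# Even modest parameters explode quickly:
print(extended_alphabet_size(2, 10))   # binary source, blocks of 10 -> 1024
print(extended_alphabet_size(26, 4))   # 26 symbols, blocks of 4 -> 456976
```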

We would like clearly to address this,

and the way to address it is through arithmetic coding.

So with arithmetic coding we can assign a codeword to a particular

sequence without having to generate codes for all sequences of that same length.

3:08

One possible set of tags for representing sequences of symbols

are the numbers in the interval zero to one.

Because there are infinitely many real numbers in this interval,

it should be possible to assign a tag to any sequence we're interested in.

3:37

So we are given an alphabet here with m symbols, and here are the associated probabilities. We know that a random variable maps the outcomes of an experiment onto the real line, so we'll use the random variable X here to map

4:05

the symbols of the alphabet into their indices, X(a_i) = i. Now, given the probabilities of the source, we have the probability mass function of this random variable: the probability that the random variable equals i is simply the probability of the symbol a_i, that is, P(X = i) = P(a_i). And of course we can form the cumulative distribution function F_X(i), the sum from k = 1 to i of the probabilities that the random variable equals k.

With this assignment, the main idea behind the tag generation is to reduce the size of the interval in which the tag resides, as more and more elements of the sequence are received. So we partition the unit interval into sub-intervals of this form.

4:59

So this is the line from 0 to 1, and symbol a_1 resides in this interval, between the cumulative at 0 and the cumulative at 1. More specifically, a_i belongs to the interval [F_X(i-1), F_X(i)), which is closed at the lower end and open at the upper end.

So the appearance of the first symbol in the sequence

restricts the interval containing the tag to one of the sub-intervals.

So let's assume that the first symbol is a2, then the tag will be in this

interval here and this sub-interval

is partitioned in exactly the same proportions as the original interval.

5:49

So each succeeding symbol causes the tag to be further restricted to a sub-interval, which is then partitioned in the same way.

So let's look at an example where hopefully this idea will become even clearer. Let us now look at a simple example of tag generation.

6:12

Here are the associated probabilities of the symbols. Based on what I just discussed on the previous slide, through the assignment of a random variable we can find the cumulative distribution function of that random variable, and we have F_X(1) = P(a1) = 0.7, F_X(2) = P(a1) + P(a2) = 0.8, and F_X(3) = 1.

6:43

So with these values of the cumulative distribution function, we now take the segment between 0 and 1 and divide it: here is F_X(0), here F_X(1), F_X(2), and F_X(3). So a_1 resides in this interval, a_2 in this interval, and a_3 in this interval.

7:32

I take the 0 to 0.7 sub-interval and

subdivide it proportionally to the original probability.

So 70% of it is occupied by a_1, 10% by a_2, and 20% by a_3.

If the second symbol we observe is a2,

then the tag resides in this interval.

I can take this interval and subdivide it proportionally to the original probabilities.

If the third symbol I observe is a3, then the tag resides in this interval.

I can further subdivide these, and so on and so forth.

So suppose the sequence we observe and want to encode is a1, a2, a3.

8:17

Then any real number in the interval 0.546 to

0.56 can serve as the tag to uniquely identify this particular sequence.

It is clear that we are looking for an identifier just for this particular sequence, and not for every three-symbol sequence, as would be the case with Huffman encoding. Typically, the middle point of this interval is chosen as the tag.

There are also recursive ways to compute the tag, as well as the boundaries of each of the sub-intervals, each time I take a sub-interval and further subdivide it, and so on and so forth. So this is the basic idea of how a tag is generated.
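As a minimal floating-point sketch of this tag generation, using the probabilities from the example, P(a1) = 0.7, P(a2) = 0.1, P(a3) = 0.2 (a practical coder uses scaled integer arithmetic instead):

```python
def encode_tag(sequence, probs):
    """Narrow [low, high) as each symbol arrives; any number
    in the final interval identifies the whole sequence."""
    # Cumulative distribution: cdf[i] = F_X(i), with cdf[0] = 0.
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p)
    low, high = 0.0, 1.0
    for s in sequence:              # s is the 1-based symbol index
        width = high - low
        high = low + width * cdf[s]
        low = low + width * cdf[s - 1]
    return low, high

low, high = encode_tag([1, 2, 3], [0.7, 0.1, 0.2])
tag = (low + high) / 2              # midpoint chosen as the tag
```

For the sequence a1 a2 a3 this returns the interval [0.546, 0.56), matching the numbers on the slide, and the midpoint serves as the tag.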

Let us see with a simple example how to decipher the tag and

find out what is the sequence that was encoded.

So we consider again the same source, as shown here, and we're given that the tag is equal to 0.55; we want to find what the encoded sequence is.

We have, of course, in addition to the tag,

been given the information of the number of symbols in the sequence.

9:59

So we take this information about this source, we take

the segment between 0 and 1 and subdivide it proportionately as we did before.

Now clearly the tag resides in this interval.

If you recall the steps we followed while encoding, the first observed symbol sets the stage, in some sense, for where the tag is going to be locked. In other words, the tag cannot be locked to a segment outside the initial segment, because at each step we are subdividing the initial segment. So the fact that the tag for this particular example belongs to this interval means that the first symbol that was encoded is the symbol a1.

So we mimic the encoder.

Take this interval.

Subdivide it proportionally.

And now we observe that the tag resides in this interval.

10:59

Therefore the second symbol that was encoded was a2. We mimic the encoder, subdivide this, and we see that the tag belongs to this interval; therefore the third symbol that was encoded, and that we're decoding now, is a3. So, in conclusion, the decoded sequence is a1 a2 a3.
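The decoding procedure just described, mimicking the encoder step by step, can be sketched the same way (same example probabilities; as noted, the number of symbols must also be given to the decoder):

```python
def decode_tag(tag, probs, n_symbols):
    """Mimic the encoder: find which sub-interval holds the tag,
    emit that symbol, then zoom into that sub-interval."""
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p)
    low, high = 0.0, 1.0
    decoded = []
    for _ in range(n_symbols):
        width = high - low
        for i in range(1, len(cdf)):
            # Sub-interval for symbol i: [low + w*cdf[i-1], low + w*cdf[i])
            if low + width * cdf[i - 1] <= tag < low + width * cdf[i]:
                decoded.append(i)
                high = low + width * cdf[i]
                low = low + width * cdf[i - 1]
                break
    return decoded

print(decode_tag(0.55, [0.7, 0.1, 0.2], 3))   # [1, 2, 3]
```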

We would like to briefly compare the performance of arithmetic coding and Huffman coding. As we already saw, this is the relationship the Huffman rate satisfies as a function of the encoding block size, N.

11:40

The corresponding relationship for arithmetic coding is given here. By comparing these two expressions, it seems that arithmetic coding is not as efficient as Huffman coding, since the 2/N here goes to zero more slowly than 1/N. However, as we discussed, Huffman coding suffers from the explosion of the size of the dictionary, and in practice it is arithmetic coding that achieves rates close to the entropy.

Of course, we did not discuss how the tag itself is encoded. It should be clear that as the size of the message N increases, the required precision of the tag increases: we need to represent a real number with a great many digits. Scaling and incremental encoding are used to address this issue.

If the tag is in the 0 to 0.5 interval, for example, it's clear that it's restricted to stay there; therefore the 0 to 0.5 range can be rescaled back to the 0 to 1 range. By doing so we lose the most significant bit, which however was already sent to the decoder by signaling which half of the line we are using. This is called incremental encoding.
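The scaling idea can be sketched as follows (my own simplified illustration, not the exact procedure of any standard: whenever the interval falls entirely into one half of [0, 1), the known bit is emitted and the half is stretched back to the unit interval; the underflow case where the interval straddles the midpoint is deliberately not handled here):

```python
def encode_with_scaling(sequence, probs):
    """Emit bits incrementally: once [low, high) sits entirely in
    [0, 0.5) or [0.5, 1), its most significant bit is known, so we
    send it and rescale that half back to the unit interval."""
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p)
    low, high, bits = 0.0, 1.0, []
    for s in sequence:
        width = high - low
        high = low + width * cdf[s]
        low = low + width * cdf[s - 1]
        while high <= 0.5 or low >= 0.5:
            if high <= 0.5:                  # entirely in the lower half
                bits.append(0)
                low, high = 2 * low, 2 * high
            else:                            # entirely in the upper half
                bits.append(1)
                low, high = 2 * low - 1, 2 * high - 1
        # NOTE: a real coder also rescales when the interval keeps
        # straddling 0.5 and shrinking (the underflow condition).
    return bits
```

For the example sequence a1 a2 a3, no bits can be sent after a1 or a2 (the intervals [0, 0.7) and [0.49, 0.56) both straddle 0.5), but after a3 the interval [0.546, 0.56) releases several bits at once.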

I want to demonstrate, with a simple example, the benefit of taking the context of the symbol being encoded into account when encoding an image, using multiple coders, each of them adapting to the statistics of its context.

13:29

With what I'm describing here, the content of the image is taken into account, but we're adapting the coder to the context of the pixel, or symbol in general, being processed.

13:40

So the processing I'll describe next is also content adaptive but, to be more precise, we are looking again at the context of the pixel being encoded.

This actually will become much clearer in the next few slides.

We'll also talk later in the course about other examples of content

adaptive processing.

So we're looking at a document which has these probabilities of black and white pixels. The white pixels dominate, which is typically the case with a lot of documents. We know how to find the entropy of this source, and it's equal to this: so many bits per pixel.

So based on what we've learned, if the discrete memoryless source model that was used here to find the entropy is accurate in describing this document, then we know that we cannot find a coder whose performance is below the entropy.

Now we divide the document into two regions.

14:39

Region 1 is the larger one; it includes 80% of the pixels, and region 2 includes the remaining pixels. Region 1 is dominated by white pixels: 95% of them are white. So if we find the entropy of region 1, it is equal to the number shown here. Region 2 is dominated by black pixels, and the corresponding entropy is shown here.

Now the total entropy of the source is the weighted sum of the entropies of the two regions, where the weights are proportional to the number of pixels in each region. So in other words the total entropy is 0.8 H1 + 0.2 H2, and this is the resulting number. What we see is that by dividing the document into two regions, the entropy has been reduced to almost half of the original entropy.
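As a numeric check of this weighted-entropy argument: the 80/20 split and region 1's 95%-white figure are from the lecture, but region 2's exact black/white split was not stated, so the 70%-black value below is an assumed illustrative number.

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Entropy in bits/pixel of a binary source with P(white) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

h1 = binary_entropy(0.95)        # region 1: 95% white (from the lecture)
h2 = binary_entropy(0.30)        # region 2: 70% black (assumed value)
weighted = 0.8 * h1 + 0.2 * h2   # 80/20 pixel split between the regions

# Entropy of the whole document modeled as a single memoryless source:
p_white = 0.8 * 0.95 + 0.2 * 0.30
h_total = binary_entropy(p_white)
```

With these numbers, weighted comes out well below h_total, showing the reduction; the exact ratio depends on the assumed split for region 2.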

And therefore the use of multiple coders, which adapt to the context of the symbol being coded, is beneficial, and this same idea has been used extensively, of course in a more refined form that I will explain later, when such coders are used.

I should explain here that this is a case of a small alphabet and skewed symbol probabilities. We know that for this case Huffman coding has poor or problematic performance, while a source like this is very suitable for arithmetic coding.

A case where context-adaptive arithmetic coding is used is the JBIG standard. JBIG stands for Joint Bi-level Image Experts Group; this is a group of experts that established the standard for the progressive transmission of bi-level images.

JBIG is therefore a combination of a progressive transmission scheme and a lossless coding algorithm.

16:45

Arithmetic coding is used for the lossless coding part, and it is suitable for the use of multiple coders, since they all use the same machinery; they simply use different sets of probabilities. So JBIG encodes the pattern of pixels in the neighborhood of the pixel being encoded; that is the context used to decide which set of probabilities to use in encoding a particular pixel.

So it uses these two neighborhood templates, each of which has 10 pixels. In encoding the pixel at location X, I look at these 10 labels; here they extend over 3 lines. Or I use this pattern, where for encoding the pixel I look at these ten labels that extend over only 2 lines.

Since there are 10 labels, there are 2^10 = 1024 different contexts, or different coders.
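The bookkeeping for selecting one of those 1024 coders can be sketched as follows (a generic packing of the neighbor bits; JBIG's actual bit ordering within the template may differ):

```python
def context_index(neighbors):
    """Pack the 10 previously decoded neighbor bits into an integer
    in [0, 1023]; each distinct pattern selects its own probability
    model, i.e. its own arithmetic coder."""
    assert len(neighbors) == 10
    index = 0
    for bit in neighbors:
        index = (index << 1) | bit
    return index

# 1024 possible contexts in total:
print(context_index([1] * 10))   # 1023, the largest index
```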

Actually, JBIG uses as many as 4,096 different coders.

18:10

We also notice that the shape of the neighborhood allows causal computability; that is, the neighborhood includes only previously encoded pixels, and therefore these pixels are available to both the encoder and the decoder, so they can synchronize their operations.

So if we look at a particular example of this first neighborhood: we want to encode a 0, and we look again at these 10 neighbors. We see that in this example 70% of the pixels are black, corresponding to zeros, and 30% of the pixels are white, corresponding to ones. So an arithmetic coder with these symbol probabilities is going to be utilized in encoding the value of this 0.

Similarly, if this were the situation of the neighborhood when encoding the value of this 1: in this particular case, 60% of the pixels are black and 40% white, so an arithmetic coder utilizing these probabilities would be used in encoding this particular pixel value.

The idea behind progressive transmission is to represent an image at various resolutions. So a low-resolution version of the image is first transmitted to the destination; if the recipient is interested in this image, they might request a higher-resolution version of the same image to be transmitted.

19:55

It's the same idea as the thumbnails we see on the web: a tiny image which, if interested, we click on, and we see a higher-resolution version of the same image.

Now one of the challenges when we deal with binary images is how to down-sample them. JBIG allows down-sampling by a factor of 2 x 2, so if we have 4 pixels here, we're going to replace them by 1 pixel. So if I take the average of these four, the average is clearly 1/4. One might think, okay, we could compare this to one half; since it is less than one half, it should be replaced by a 0.

But clearly we have a difficulty when

20:40

we have this situation, because the average of the pixels is exactly one half, and therefore it's not clear how to resolve whether the down-sampled version of this 2x2 neighborhood should be a 0 or a 1.
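The difficulty can be seen in a naive 2x2 averaging downsampler (my own sketch of the straw-man rule being discussed, not JBIG's actual scheme):

```python
def naive_downsample(block):
    """Replace a 2x2 block of binary pixels (given as a flat list of
    four 0/1 values) by one pixel, thresholding the average at 1/2."""
    avg = sum(block) / 4
    return (1 if avg > 0.5 else 0), avg

# One 1 among four pixels: average 1/4 < 1/2, clearly maps to 0.
print(naive_downsample([1, 0, 0, 0]))
# Two of each: the average is exactly 1/2 -- the ambiguous case the
# lecture points out; neither 0 nor 1 is obviously the right answer.
print(naive_downsample([1, 1, 0, 0]))
```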

To address this, JBIG adopts a scheme as shown here.

So we're interested in finding the low-resolution value of X,

21:06

and instead of using just the 4 high-resolution neighbors (we want to find how to replace these 4 high-resolution values by 1 value), we also utilize additional values at the higher resolution, the lowercase letters, as well as the previously computed values of the low-resolution image, the capital letters.

21:36

Then the standard proposes the evaluation of this formula here, where again all these labeled pixels at low and high resolution are taken into account, and if the value of this expression is greater than 4.5, x = 1; otherwise x = 0.

I guess it's always a question where an expression like this comes from, and very often in standards there is a lot of thinking and experimentation that takes place before something is standardized. So in the end, if one stares at it and tries to reason about it, it makes sense, and this is how it's implemented in the standard.

22:27

In progressive transmission, the decoder has already received the low-resolution image. Of course, this image is also available at the encoder, and therefore the low-resolution values can be used as context when performing the encoding of the high-resolution image.

The advantage now is that the received low-resolution values are in the past with respect to time, but they include information about the pixel that is being encoded, as well as future values with respect to space in the high-resolution image. Therefore the context can be more informative in this case.

Here are the 4 context templates that are utilized by JBIG. X, again in red, is the pixel value being encoded. We see that past values of the high-resolution image are used, and the big O's are the low-resolution pixel values. A, again, is the floating location of the pixel.

23:28

So we see, for example, that in computing this O value of the low resolution, the value we are trying to encode at high resolution was taken into account, as well as future values. And this is always the case for the low-resolution pixels. So in this template, it's this low-resolution value that has included the value that we are trying to encode of the high-resolution image.

Since we have to include 2 bits to indicate which of the four context templates we are using, we have therefore used 12 bits, and there are 2^12 = 4,096 different contexts.