0:00

We're now ready to describe each one of the blocks that build an image

compression algorithm, and in particular, JPEG.

And as we said, we are going to start with the symbol encoder.

After we have basically done everything, we have done the transform and the

quantization that we are going to explain in detail, we want to further compress.

So, quantization will provide us some compression, and we want to further

compress. But we need to save whatever we have

produced here. We need to save it with no error.

But we're still going to see how we can compress.

So, why can we achieve that? The basic idea is very simple.

And, of course, you cannot teach image processing without showing, at

least once, this famous image that is called Lena.

It used to be used a lot in image processing.

But now, because of the abundance of images, many other images are used.

But I want to show it at least once during these nine weeks.

And we might see the same image again a few more times.

So, the basic idea is that some pixels, some gray values in the image appear more

than others. And let's see that.

Here on the right, we have what's called a histogram.

The histogram basically counts how many times a particular gray value appears in

the image. For example, the pixel value, let's

say, around 47, appears this amount of times.

So, we go around the image, we count how many times we see the number 47, and this

is the amount of times. Then I can go and see, for example, 200,

and I see that it appears about 700 times in the image.

So, we see that there are some pixel values that appear a lot, and there are some,

let's say zero or one, that don't appear at all.
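As a side note, this counting is straightforward to do yourself; here is a minimal sketch using NumPy, with a tiny made-up array standing in for a real image (the values 47 and 200 echo the ones mentioned above):

```python
import numpy as np

def gray_histogram(image):
    # Count how many times each gray value 0..255 appears in the image.
    counts, _ = np.histogram(image, bins=256, range=(0, 256))
    return counts

# A tiny 2x3 stand-in image, not a real photograph.
image = np.array([[47, 47, 200],
                  [200, 200, 47]], dtype=np.uint8)

hist = gray_histogram(image)
print(hist[47], hist[200])  # 3 3
```

The histogram of a real 8-bit image is computed the same way; only the array changes.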

And the basic idea that we are going to exploit is that we're going to try to

give to pixel values that appear a lot very short codes, and to pixel

values that don't appear so much, we're going to try to give

longer codes. Let's see how we do that.

Let's see an example of the use of this. So here, we have an image, and this is

actually the image that we saw in the previous video, the geometric image.

This is a particular image that has only four different gray values,

and so pixels with gray value 87

appear in 1/4 of the image, so 25%, or a probability of 0.25.

And we also have 128 appearing almost half of the time in the image, and we

also have 186, and 255 appears very, very few times; every other gray value just doesn't

appear in the image at all. So, the standard way to represent this

image is we have an array and for every pixel, we have a binary code with eight

bits. So, we are going to basically represent

87 with its binary representation. We are going to represent 128 with

its own binary representation, and so on. So, at the end, we are basically spending

eight bits per pixel if we don't compress.

But let's assume that somehow I managed to basically construct a code that is

shown here, in a way that 87 is represented by the

binary code 01, 128 is represented by the binary code

one, 186 by the binary code 000 and 255 by the

binary code 001. That means that, for example, if I want

to store or send the pixels 87, 87,

255, 128.

Instead of writing all of these, this again, this for 255, and this for 128,

which will take me, basically, 4 times 8, 32 bits, I'm going to write it in the

following way. I'm going to say, okay,

87 is 01. 87 again, 01.

255, 001.

128, 1. So, basically it took me two, four, seven,

eight bits instead of 32 to represent the same stream, the same four gray values.

Now, the decoder has this table, so the decoder will see 01, we'll go into that

pixel position and we'll put an 87. We'll see 01, we'll put again an 87.

We'll see 001, we say, oh, I need to put a 255, and so on.
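As a sketch, the encoder and decoder just described can be written with a small dictionary; the code table below is the one from the example, and the decoder emits a pixel as soon as the bits it has accumulated match a codeword (this works precisely because, as discussed later, no codeword is the beginning of another):

```python
# The variable-length code from the example.
code = {87: "01", 128: "1", 186: "000", 255: "001"}

def encode(pixels):
    # Concatenate the codeword of each pixel value.
    return "".join(code[p] for p in pixels)

def decode(bits):
    # Accumulate bits until they match a codeword, then emit that pixel.
    inverse = {c: p for p, c in code.items()}
    pixels, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            pixels.append(inverse[current])
            current = ""
    return pixels

bits = encode([87, 87, 255, 128])
print(bits)          # 01010011 -> 8 bits instead of 4 * 8 = 32
print(decode(bits))  # [87, 87, 255, 128]
```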

And so, we have achieved compression. Let's see exactly how much compression we

can achieve. So basically, we do a very simple

computation. We say, okay,

5:10

with probability 0.25, I'm going to have the length of 01,

a length of two. That's my code, 01.

With probability 0.47, I'm going to have a length of one, so plus 0.47 times one.

With probability 0.25, so for one quarter of the image, I'm going to have length

three. And with very small probability, I'm also

going to have length three. And this is equal to

1.81. Okay.

So, instead of eight bits per pixel, I now have 1.81 on average over my whole image.

Basically, I have saved lots of bits. Okay, I have gone from eight to basically

1.81. And that's a compression of

over four times, and that's what we have done.
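The computation above is just a weighted sum of code lengths; as a minimal sketch, with the probabilities and lengths from this example:

```python
# (probability, code length in bits) for the values 87, 128, 186, 255.
symbols = [(0.25, 2), (0.47, 1), (0.25, 3), (0.03, 3)]

average_bits = sum(p * length for p, length in symbols)
print(round(average_bits, 2))      # 1.81 bits per pixel on average
print(round(8 / average_bits, 2))  # 4.42: over four times smaller
```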

Basically, what we did is we have exploited the fact that some of the pixel

values appear more often than others, and then I can

give them shorter codes. So, how do we do that? We do that using

something which is called Huffman coding. And I'm going to explain next how to do

it. So, let's just look once again at this

particular example. The best way to explain Huffman coding is just to use an

example, and I'm going to do that next. So, let's assume that we have six

different symbols.

And I order them; the first thing we have to do is order them according

to their probability. The highest probability first and the

lowest probability at the end. And if some probabilities are equal, it

doesn't matter which one we put first. So, for example, this symbol a2 might be

the pixel value 100. The symbol a6 might be the pixel value

77, and so on. So,

basically, I'm going to list the symbols here and the probabilities here.

And from the bottom, I start to add. I take the two lowest.

And basically, I add their probabilities. So, I add 0.04 plus 0.06 and that's

probability 0.1. And then, I do the list again.

So, none of these has changed, the only one that changed is the last one.

See what's happening here. What's happening here is with probability

0.4, I can get a2, with probability 0.3, I can get a6, with probability 0.1, I can

get a1, with probability 0.1, I can get symbol

a4, and with probability 0.1, I can get either a3 or a5.

I am going to have to do something to distinguish which one it is: is it a3 or

a5? And how we distinguish them is going to come in the next slide.

Now, we do it, we do this again.

We take the two lowest and then we add them.

And now, because it became 0.2, remember, we have to reorder all the time.

So, let me just describe that again. With 0.4, we can get a2.

With 0.3, we can get a6. Now, with 0.2, we can get either one of

these two, which is a4, a3 or a5.

I don't know which one, but I know that with probability 0.2 it's one of them.

And then, we have 0.1 again. We do that once more,

actually, two more times. We take the two lowest.

We get this, and then we, once again, add the two lowest, and we get this.

And every time we add the two lowest and we move into the next

column, we reorder everything. When we end up with only two, we stop.

And we're going to see why we stop. Now, it's time to give the code.

So, basically, what I have done here is redraw the graph that we had

in the previous slide. And so, this is the first column, this is

the second column, and the third column, the fourth column, and the fifth column.

And now, I'm going to go backwards, assigning codes.

So, basically to 0.6, I assigned zero. And to 0.4, I assigned one.

Before we go backwards, let me tell you what's happening.

When the decoder is going to see a zero, it's going to know that it's from the

group that led to this 0.6. When it sees a one, it's going to know

that it's from the group that led to this 0.4.

Then we're going to have to keep adding bits to say which one in that group.

And that's why we keep going backwards. So, we basically propagate back the zero.

And actually, the 0.6 came from 0.3 plus 0.3.

So, we add a zero and a one. So now, of course, we propagated the 0.4 here

and basically assigned it to this one that came to this column, without adding any

bits. So, it keeps propagating without adding

any bits. So now, all the symbols that lead to this

group, and we're going to see it was actually only one symbol,

have this code. And all the symbols that propagated to this probability

have codes that start with 01.

So now, we keep propagating. We propagate the one to the one, the 00

to the 00. 01, we have to split because this 0.3

probability came from the addition of these two.

And then, we have 010, 011. And then we keep going.

Then, we propagate the one to the one, the 00 to the 00; this 0.1, which has the

code 011, propagates here. And this one actually has to split.

Every time we split because when we were computing it, we were adding.

So, when we're going back, we were splitting.

Every time we are splitting, we have to add zero and one.

And so, we keep going, we keep going, and we get this result that says that the

symbol a2 is going to have the code 1. The symbol a6 is going to have the code 00

and so on. So, this is the code,

it's a variable length code. Not every symbol is represented by a code

of the same length. It actually has a different length and

the length depends on the probability. The higher the probability, the shorter

the code. And that's very easy to see because the

low probabilities were added, added, added, added and then, when we're going

back we have to split, split, split, split.

And remember, every time we split, we have to add a symbol, and we have to add

a bit for that symbol. So, basically the code becomes longer and

longer. And we can actually go and compute, once

again, the average length of these. So, to repeat the process, we basically

order the symbols according to the probability,

that's what we did here. We start adding low probabilities.

So, we add these two, we got this, we reorder and we keep going.

It's a recursive process so if you know how to go from one column to the next,

you know how to go all the way. And then, we come back.
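The whole recursive procedure fits in a few lines; here is a sketch using Python's heapq with the six probabilities from this example. Note that ties between equal probabilities can be broken either way, so the individual bit patterns may differ from the ones derived above, but any such code is equally optimal: the average length still comes out to 2.2 bits.

```python
import heapq
import itertools

def huffman(probs):
    """Build a Huffman code: repeatedly pop the two lowest-probability
    groups, prefix 0 to one group's codes and 1 to the other's, merge."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
codes = huffman(probs)
average = sum(probs[s] * len(codes[s]) for s in probs)
print(len(codes["a2"]), round(average, 2))  # 1 2.2
```

As in the lecture, the most probable symbol, a2, ends up with a one-bit code.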

There are a couple of things that are very important to observe here.

The code that you get in this way is what's called prefix-free.

And that's very, very important. Look at this column.

When the decoder sees a one, it knows that it has to go here,

to a2. When it sees a zero, it basically says,

okay, I have to wait for the next bit. But there is no code here, which is a

full prefix of any other code. And that's very important, otherwise, the

decoder won't be able to reconstruct. Let's give an example of something which

is not a prefix code. Let's say that the symbol a1 is

represented by the code one, the symbol a2 is represented by the code zero, and

the symbol a3 is represented by the code 01,

okay? So, when the decoder sees 01,

basically there are two possibilities. It's either a3,

let's say, pixel value 100, or it is a2 followed by a1.

Let's say, pixel zero and pixel 255. So, there is a lot of ambiguity in the

reconstruction. We don't know which one it is.

And this is because this code is not what's called prefix-free.

The code zero is a prefix of the code 01. That doesn't happen here.

Let's just clear these annotations. Take any of these codes,

for example the 00: you don't find it at the beginning of any other code here.

So, when the decoder sees a one, it knows what to do.

It puts the pixel value of a2. When it sees a zero,

it says, oh, I'll have to wait, because there are many codes that start with

zero. If it then sees another zero, it'll

say, oh, I'm done, I basically have a6. If, let's say, the decoder

receives zero and then one, it says, oh, I have to wait, because there are numerous

codes that start with 01. But then, it receives a one,

and says, oh, I'm done. I don't have to keep waiting.

Because there's nothing. There's no ambiguity here.

And that's very important about a Huffman code.

Look how we basically split, and split, and split.

And by construction, this is a prefix-free code.
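This property is easy to check mechanically; a small sketch (the first table is a prefix-free code in the spirit of the one built above, the second is the ambiguous three-symbol code just discussed):

```python
def is_prefix_free(codes):
    # No codeword may be the beginning of another codeword.
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

huffman_like = {"a2": "1", "a6": "00", "a1": "011", "a4": "0100"}
ambiguous = {"a1": "1", "a2": "0", "a3": "01"}  # "0" is a prefix of "01"

print(is_prefix_free(huffman_like))  # True
print(is_prefix_free(ambiguous))     # False
```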

Now, how do we know that this is an optimal code?

How do we know that, with this set of probabilities,

the procedure I just explained achieves

the lowest possible length, the shortest possible length of a code on

average? So, that's a theorem we have to prove,

we're not going to do that in this class, but there is a theorem that shows that

Huffman coding basically achieves the shortest possible length.

Now, you might wonder what length Huffman coding can achieve, or basically any

encoder of this style. So, the basic idea is if I give you a set

of probabilities, how much can I compress?

Before I go and construct a Huffman code, I want to know: is it worth

the effort of constructing the code? Will I compress a lot?

And the way to do that is basically to compute what's called the

entropy. And the entropy,

which is written as H, H for entropy,

is the sum over all the symbols, with a minus in front that I'm going to explain

in a second, of the probability of every symbol times log base 2 of the probability of

every symbol: H = -Σ p(aᵢ) log₂ p(aᵢ). So, it's basically minus 0.4 times log

base two of 0.4, minus 0.3 times log base two of 0.3, and so on.

So, this is actually, at the end, a positive number because all probabilities

are below one and therefore, the logs of the probabilities are negative numbers,

and then with the minus sign in front, we get a positive number.

And Shannon's first theorem basically says that,

asymptotically, we can achieve this code length on average.

So, this is basically what's achievable with these types of codes, like Huffman

code. We basically can compute this number and

say, oh, that's going to be my average code length.

My expected code length. And it's actually not so hard to

understand where this formula comes from. We used a similar computation with the

previous example. Here, the probability is basically the

probability of the symbol appearing. And this is actually the length of the

code that is going to be used for that symbol.

So basically, what this is saying is that the ideal length of the code for that

symbol is minus the log of its probability. And why this only holds in expectation is because

you cannot have a code which is 3.2 bits long. You have to send entire bits.

Either you send three bits or you send four bits.

And that's why this is on average the expectation,

what you expect to be the code length. But this basically gives you an idea of

what encoders like Huffman coding can achieve.
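As a sketch, the entropy of the six-symbol example can be computed directly; compare it with the 2.2-bit average length of the Huffman code built for the same probabilities, which comes out slightly above this lower bound:

```python
import math

probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]

# H = -sum p * log2(p): the best achievable average bits per symbol.
H = -sum(p * math.log2(p) for p in probs)
print(round(H, 2))  # 2.14
```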

So, now that we know how to encode the bits that we are producing, we are ready

to go into the next blocks of image compression.