We can do better than that.

We still use a DMS model, but now we go

and measure the probabilities of appearance of each individual letter.

And in this case, the estimate of

the entropy becomes smaller, 4.14 bits per letter.
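As a rough sketch of this first-order measurement, one can count letter frequencies in a text and plug them into the DMS entropy formula. The sample string below is a hypothetical stand-in; reproducing the 4.14-bit figure would require a large English corpus.

```python
import math
from collections import Counter

def dms_entropy(probs):
    """Entropy of a discrete memoryless source, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Measure letter probabilities from a sample text (hypothetical sample;
# a real estimate would use a large English corpus).
sample = "the quick brown fox jumps over the lazy dog"
letters = [c for c in sample if c.isalpha()]
counts = Counter(letters)
total = len(letters)
probs = [n / total for n in counts.values()]

H = dms_entropy(probs)
print(f"{H:.2f} bits per letter")
```

On a real corpus the measured probabilities are far from uniform, which is why the estimate drops below the log2(26) ≈ 4.7 bits of the equiprobable model.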

We still know, however, that this is not a

very accurate model, because there are dependencies amongst letters.

There are combinations of letters that are more probable than

others and combinations that do not exist in the English language.

So if one accounts for these dependencies by considering blocks of up to eight letters, then this is the estimate of the entropy of the source that one can obtain.

So this is the most accurate, or one of the most accurate, estimates of the entropy of the English language.
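The idea of capturing dependencies by looking at blocks of letters can be sketched as an n-gram (block) entropy estimate: compute the entropy of n-letter blocks and divide by n. The tiny sample string here is hypothetical; a meaningful estimate for n up to eight needs a very large corpus.

```python
import math
from collections import Counter

def block_entropy_rate(text, n):
    """Entropy of n-letter blocks divided by n, in bits per letter.
    With a large corpus and growing n this approaches the entropy
    rate of the source; on a tiny sample it is only a rough sketch."""
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    Hn = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return Hn / n

# Hypothetical tiny sample (spaces stripped to keep a pure letter alphabet).
sample = "the theory of the entropy of the english language".replace(" ", "")
for n in (1, 2, 3):
    print(f"n={n}: {block_entropy_rate(sample, n):.2f} bits/letter")
```

Combinations that never occur contribute nothing to the block counts, which is exactly how the forbidden and rare letter combinations pull the estimate down.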

And therefore, this is the number one should try to come

close to in designing efficient codes for encoding the English language.

In general, the more we know about the structure of the

data, the better our estimate of the entropy is going to be.

We're going to demonstrate this with two simple toy examples.

So assume we have this stream here of symbols.

So this source has six symbols in its alphabet.

In this stream we go and count the appearances of each symbol. The symbol two, for example, appears two times out of ten.

So the probability of two is 0.2, and so is the probability of four, and so on and so forth.

We use the formula of the entropy for a DMS source

and we find that in this particular case this is the entropy.

We need 2.44 bits per symbol.

By the way, since we have six symbols here, if I were to use a fixed-length code then I would need three bits per symbol, because two to the third is eight.

So with three bits I can express eight different codewords.

Here I have only six symbols, but using a variable-length code, in principle, I could get closer to the entropy of 2.44 bits per symbol.
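This calculation can be sketched directly. The stream below is hypothetical; the lecture only states that the probabilities of two and of four are both 0.2, so the remaining counts here are an illustration.

```python
import math
from collections import Counter

# Hypothetical stream of ten symbols over a six-symbol alphabet
# (chosen so that P(2) = P(4) = 0.2, as in the lecture).
stream = [1, 2, 3, 1, 4, 5, 2, 1, 6, 4]

counts = Counter(stream)
probs = [c / len(stream) for c in counts.values()]

# DMS entropy: H = -sum_i p_i * log2(p_i)
H = -sum(p * math.log2(p) for p in probs)

# A fixed-length code must cover all six symbols: ceil(log2 6) = 3 bits.
fixed_bits = math.ceil(math.log2(len(counts)))

print(f"entropy:    {H:.2f} bits/symbol")
print(f"fixed code: {fixed_bits} bits/symbol")
```

With these assumed counts the entropy comes out near the 2.44 bits per symbol quoted in the lecture, comfortably below the 3 bits of the fixed-length code.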

Now, if we stare at this particular stream for a little while, we can possibly see some structure in it, and to express that structure we propose a prediction model.
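A minimal sketch of what a prediction model buys us, using a hypothetical structured stream (not the lecture's): if each symbol is almost always the previous one plus one, predicting that and encoding only the residual leaves a sequence whose entropy is far lower than that of the raw symbols.

```python
import math
from collections import Counter

def entropy(seq):
    """Empirical DMS entropy of a sequence, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical structured stream: the symbols simply cycle 1..6.
stream = [1, 2, 3, 4, 5, 6] * 5

# Prediction model: predict "previous symbol + 1" (wrapping 6 -> 1) and
# keep only the residual, which is 0 whenever the prediction is right.
residuals = []
for prev, cur in zip(stream, stream[1:]):
    pred = prev + 1 if prev < 6 else 1
    residuals.append(cur - pred)

print(f"symbol entropy:   {entropy(stream):.2f} bits")
print(f"residual entropy: {entropy(residuals):.2f} bits")
```

On this perfectly cyclic stream the predictor is always right, so the residual entropy collapses to zero; a symbol-by-symbol DMS model of the same stream would report log2(6) ≈ 2.58 bits and miss the structure entirely.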