
This is illustrated in the figure, in which the first codeword, tilde{X}(1)

is sent through the channel, to obtain the received sequence Y.


Now for w from 1 up to M, define the

event, E_w, equals the event, that tilde{X}(w)

and Y are jointly delta typical, that is the codeword

for message w, and received sequence Y are jointly delta typical.


Now observe that if E_1 occurs but E_w does not occur, for

all w from 2 up to M, then no decoding error would occur,

because, according to our decoding rule, the received

sequence, Y, would be decoded to message 1.
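As a rough sketch of how this decoding rule and the events E_w could be checked in code (the typicality test here compares the empirical joint distribution against p(x, y), one common form of strong typicality; the function names and the entrywise test are my assumptions, not from the lecture):

```python
import numpy as np

def jointly_typical(x, y, p_xy, delta):
    """Declare (x, y) jointly delta-typical when their empirical joint
    distribution is entrywise within delta of p_xy."""
    n = len(x)
    emp = np.zeros_like(p_xy, dtype=float)
    for a, b in zip(x, y):
        emp[a, b] += 1.0 / n
    return bool(np.all(np.abs(emp - p_xy) <= delta))

def decode(codebook, y, p_xy, delta):
    """Decode y to the unique message w whose codeword is jointly typical
    with y; if no codeword or more than one qualifies, declare an error (None)."""
    hits = [w for w, xw in enumerate(codebook, start=1)
            if jointly_typical(xw, y, p_xy, delta)]
    return hits[0] if len(hits) == 1 else None
```

Under this rule, the event "E_1 occurs and no other E_w occurs" is exactly the event that the decoder outputs message 1, which is the observation the argument builds on.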

Therefore, the probability of E_1 intersect E_2 complement, intersect E_3

complement,

intersect all the way to E_M complement, given that W is equal to 1, is upper

bounded by the probability that the error event does

not occur, given that W is equal to 1.


Now consider the probability of the error event, given that W is equal to 1.

This is equal to 1 minus the probability that the error

event does not occur, given that W is equal to 1.

Using the above inequality, we obtain the upper bound

1 minus the probability of E_1 intersect E_2 complement,

intersect E_3 complement, intersect all the way to E_M complement,

given that W is equal to 1.


This is simply equal to the probability of the complement, of

E_1 intersect E_2 complement, intersect E_3 complement,

intersect all the way to E_M complement, given that W is equal to 1.

Now by means of de Morgan's Law, we can write this

event as E_1 complement union E_2, union E_3,

union all the way to E_M.


Then by the union bound, the probability of the error event, given that

W is equal to 1, is upper bounded by the probability of

E_1 complement, given W equals 1, plus summation w equals

2 up to M, the probability of E_w, given W is equal to 1.
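Written out symbolically, the chain of steps from the last few paragraphs (writing Err for the error event) is:

```latex
\begin{align*}
\Pr\{\mathrm{Err} \mid W = 1\}
  &= 1 - \Pr\{\mathrm{Err}^c \mid W = 1\} \\
  &\le 1 - \Pr\{E_1 \cap E_2^c \cap \cdots \cap E_M^c \mid W = 1\} \\
  &= \Pr\{E_1^c \cup E_2 \cup \cdots \cup E_M \mid W = 1\}
     && \text{(de Morgan)} \\
  &\le \Pr\{E_1^c \mid W = 1\} + \sum_{w=2}^{M} \Pr\{E_w \mid W = 1\}
     && \text{(union bound)}
\end{align*}
```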


By the strong joint AEP, the probability of E_1 complement,

given W is equal to 1, that is the probability that tilde{X}(1), the

first codeword, and the received sequence Y, are not jointly delta typical

given that W is equal to 1, is less than some small quantity nu

that goes to 0,

as delta goes to 0.


This is illustrated in the figure.


Now conditioning on W equals 1, that is, the first codeword

has been sent, for w equals 2 up to M, tilde{X}(w)

and the received sequence Y, are n i.i.d. copies

of the pair of generic random variables, X prime, Y

prime, where X prime has the same distribution as

X, and Y prime has the same distribution as Y.


This is because all the codewords are generated in exactly the same way,

and so they have the same distribution, namely p(x) to the power n.


Since tilde{X}(w) is generated independently of the transmitted codeword tilde{X}(1), from which Y is obtained, X prime and Y prime are independent.


The idea here is that the codeword tilde{X}(1) is independent

of all of the codewords tilde{X}(2), up to tilde{X}(M).

Because the received sequence Y is obtained through the transmission of the codeword

tilde{X}(1), intuitively, tilde{X}(2) up to tilde{X}(M) are independent of Y.


Now for w equals 2 up to M, the probability of

E_w, given W equals 1, is equal to the probability that

the codeword tilde{X}(w), and the received sequence

Y, are jointly delta typical given that W is equal to 1.


By Lemma 7.17, this is upper bounded by 2 to the power minus n, times the mutual

information between X and Y, minus tau, where tau goes to 0, as delta goes to 0.


Note that our choice of M satisfies 1 over n times log M, is less than

the mutual information between X and Y, minus epsilon over 4.

And this implies that M, the total number of messages, is

less than 2 to the power n times I(X;Y), minus epsilon over 4.


Therefore, the probability of the error event, which is equal

to the probability of the error event given that W is equal

to 1, is upper bounded by the probability of E_1 complement,

given W equals 1, which is less than nu, plus the summation from

w equals 2 up to M of the probability of E_w given W equals 1.

Each term of the summation is at most 2 to the power minus n, times

I(X;Y) minus tau, and M is less than 2 to the power n times I(X;Y),

minus epsilon over 4, so the summation is less than 2 to the power

minus n, times epsilon over 4, minus tau.

And so, we have nu plus 2 to the power minus n, times epsilon over 4, minus tau.
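Putting the summation step in symbols:

```latex
\begin{align*}
\sum_{w=2}^{M} \Pr\{E_w \mid W = 1\}
  &\le (M - 1)\, 2^{-n(I(X;Y) - \tau)} \\
  &< 2^{n(I(X;Y) - \epsilon/4)} \cdot 2^{-n(I(X;Y) - \tau)}
   = 2^{-n(\epsilon/4 - \tau)},
\end{align*}
so that
\[
\Pr\{\mathrm{Err} \mid W = 1\} < \nu + 2^{-n(\epsilon/4 - \tau)}.
\]
```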


Now let us take a close look at this term, 2

to the power minus n, times epsilon over 4, minus tau.


Recall that epsilon is fixed.

Since tau tends to 0, as delta tends to 0, we can choose delta to be

sufficiently small, so that epsilon over 4, minus tau, is greater than 0.


If so, 2 to the power minus n, times epsilon over 4,

minus tau, would tend to 0, as n tends to infinity.


Now take delta sufficiently small so that nu is less than epsilon over 3,

and let n be sufficiently large, so that 2 to the power minus

n, times epsilon over 4 minus tau, is less than epsilon over 6.
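As a quick numeric illustration (epsilon = 0.1 and tau = 0.01 are assumed example values, not from the lecture), one can compute how large n must be:

```python
# With epsilon = 0.1 and tau = 0.01, the exponent epsilon/4 - tau = 0.015 is
# positive, so 2^{-n(epsilon/4 - tau)} shrinks as n grows; find the smallest n
# that pushes it below epsilon/6.

epsilon, tau = 0.1, 0.01
exponent = epsilon / 4 - tau        # positive, since tau is small for small delta
assert exponent > 0

n = 1
while 2 ** (-n * exponent) >= epsilon / 6:
    n += 1
print(n, 2 ** (-n * exponent))      # required block length, and the bound itself
```

So for these example values, a block length of a few hundred already suffices for this term.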


Then we obtain the probability of the error event, less than epsilon

over 3, plus epsilon over 6, which is equal to epsilon over 2.

Thus, we have shown, that with a suitable choice

of parameters, for the random coding scheme, the probability

of the error event, is less than epsilon over

2, for any arbitrarily small epsilon, when n is sufficiently large.


The idea of the analysis is the following.

First of all, let the block length n be large.


Then by the joint AEP, the probability that tilde{X}(1), that is the codeword

being sent, is jointly typical with the received sequence Y, would tend to 1.


At the same time, the probability that any

other codeword is jointly typical with the received sequence, Y,

is approximately equal to 2 to the power minus

n times the mutual information between X and Y.
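For a concrete sense of scale, here is a binary symmetric channel example (assumed for illustration, not from the lecture):

```python
import math

# For a binary symmetric channel with crossover probability 0.1 and uniform
# input, I(X;Y) = 1 - H(0.1), about 0.531 bits per channel use, so a fixed
# wrong codeword is jointly typical with Y with probability roughly 2^{-n * 0.531}.

def h2(p):
    """Binary entropy function, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

mi = 1 - h2(0.1)          # mutual information of BSC(0.1) with uniform input
print(mi)                 # roughly 0.531
print(2 ** (-100 * mi))   # the probability scale at block length n = 100
```

Even at n = 100, each individual wrong codeword is jointly typical with Y with probability on the order of 10^-16.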


If the size of the codebook, that is M, grows at

a rate strictly less than the mutual information between X and Y,

then the probability that some codeword tilde{X}(w), with w not equal

to 1, is jointly typical with the received sequence Y, can be made arbitrarily small.


Then the probability of the error event, that is W hat

not equal to W, can be made arbitrarily small.


We have already shown that for the random coding scheme,

the probability of the error event can be made arbitrarily small.

We now show the existence of a

deterministic coding scheme, that can do the same.


According to the random coding scheme, the probability of the error

event, can be written as the summation over all codebook C,

the probability of the codebook C being chosen, times

the probability of the error event for that codebook C.

In other words, the probability of the error event is

a weighted average of the probability of the error event

over all the codebooks.
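In symbols, this averaging argument reads:

```latex
\[
\Pr\{\mathrm{Err}\}
  = \sum_{\mathcal{C}} \Pr\{\mathcal{C}\}\,\Pr\{\mathrm{Err} \mid \mathcal{C}\}
  < \frac{\epsilon}{2},
\]
```

and since a weighted average can be no smaller than its smallest term, at least one codebook must achieve a probability of error no larger than this average.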


Then there exists at least one codebook C

star, such that the probability of error P_e,

is less than or equal to the average probability of error

over all the possible codebooks,

which we have shown is less than epsilon over 2.

And by construction, this codebook has rate

1 over n, times log M, greater than I(X;Y),

minus epsilon over 2.

By letting p(x) be the input distribution, that achieves the channel capacity,

the rate of this code can be arbitrarily close to the channel capacity.


To complete the proof of the achievability of

the channel coding theorem, we are now going to

show that from this deterministic code, with P_e, the

average probability of error, less than epsilon over 2,

we can construct another deterministic code whose maximal conditional

probability of error, lambda_max, is less than epsilon.

Without loss of generality,

assume that the conditional probabilities of error are in ascending order.

That is lambda_1, less than or equal to lambda_2,

less than or equal to all the way to lambda_M.

From P_e less than epsilon over 2, we have 1 over M, times summation

w equals 1 up to M, lambda_w, less than epsilon over 2.

This implies that the summation of all the lambda_w's

is less than M over 2, times epsilon.


Since M is even, M over 2 is an integer.

Then the second half of the summation, namely the summation from

w equals M over 2 plus 1 up to M of lambda_w, is upper bounded

by the full summation from w equals 1 up to M of lambda_w,

and hence is also less than M over 2, times epsilon.


Thus we have shown that the second half of

the summation, is less than M over 2, times epsilon.


From this, we have 1 over M over 2, times

the second half of the summation, is less than epsilon.

As there are exactly M over 2 terms in the second half of

the summation, this means that the average of all the lambda_w's

for w from M over 2 plus 1 to M, is less than epsilon.


Since we assume that the lambda_w are in ascending order,

the smallest value in the second half of the summation,

namely lambda_{M over 2 plus 1}, is less than epsilon,

because the smallest term cannot exceed the average.


And this implies that lambda_{M over 2}, which is less than or

equal to lambda_{M over 2 plus 1}, is also less than epsilon.


The conclusion is that if the average probability of

error, P_e is less than epsilon over 2,

then lambda_1, lambda_2 up to lambda_{M over 2} are all less than epsilon.

Thus, by discarding the worst half of the codewords in the codebook C star, namely,

those codewords that correspond to lambda_{M over 2 plus 1}, up to lambda_M,

we obtain

a new codebook with lambda_max, less than epsilon.
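A small numeric sketch of this expurgation step (the lambda values below are illustrative, not from the lecture):

```python
# If the average of the lambda_w is below epsilon/2, then after sorting in
# ascending order, every lambda_w in the better half must be below epsilon,
# even though some discarded codewords may be much worse.

epsilon = 0.1
lam = sorted([0.01] * 90 + [0.3] * 10)   # M = 100 conditional error probs, ascending
M = len(lam)
assert sum(lam) / M < epsilon / 2        # average probability of error < epsilon/2

best_half = lam[: M // 2]                # keep the best M/2 codewords
worst_half = lam[M // 2 :]               # discard the worst M/2 codewords
print(max(best_half))                    # lambda_max of the expurgated code
```

Here the original code has codewords with conditional error probability 0.3, well above epsilon, yet its average is below epsilon/2, so the kept half has lambda_max below epsilon.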


After discarding the worst half of C star, the rate of the code

becomes 1 over n, times log M over 2, because the size of

the message set, instead of M, is now equal to M over 2.

Now this is equal to 1 over n, times log M, minus 1 over n,