Chapter nine, the Blahut-Arimoto algorithms.

We first introduce a notion in information theory called the single-letter characterization. For a discrete memoryless channel p(y|x), the capacity C, which is equal to the maximum of the mutual information I(X;Y) between the input X and the output Y of a single use of the channel over all input distributions r(x), gives the maximum asymptotically achievable rate for reliable communication as the block length n tends to infinity. This characterization of C in the form of an optimization problem is called a single-letter characterization because it involves only p(y|x), the transition matrix of one use of the channel, and not the block length n.

Similarly, the rate-distortion function R(D) for an i.i.d. information source {X_k}, which is equal to the minimum of the mutual information I(X; X hat) over all transition matrices Q(x hat|x) such that the expected distortion between X and X hat is less than or equal to D, is a single-letter characterization, because it involves only the distribution of the generic source random variable X, and not the block length n.

When the alphabets are finite, the capacity C and the rate-distortion function R(D) are given as solutions of finite-dimensional optimization problems. However, these quantities cannot be expressed in closed form except in very special cases. Even computing them is not straightforward, because the associated optimization problems are nonlinear, so we have to resort to numerical methods. The Blahut-Arimoto algorithms are iterative algorithms devised for this purpose. In this chapter, we will first discuss a general alternating optimization algorithm. Then we will specialize this algorithm to the Blahut-Arimoto algorithms for computing the channel capacity C and the rate-distortion function R(D). Finally, we will discuss the convergence of the alternating optimization algorithm.

In Section 9.1, we consider a general alternating optimization algorithm. Consider a function f of u_1 and u_2, where u_1 and u_2 are multi-dimensional real vectors, and take the supremum of f over all u_1 in some set A_1 and all u_2 in some set A_2, where A_i is a convex subset of the n_i-dimensional Euclidean space R^{n_i} for i equal to 1 and 2. The function f, from A_1 times A_2 to R, is bounded from above, is continuous, and has continuous partial derivatives on A_1 times A_2. Also, for every fixed u_2 in A_2, there exists a unique u_1 in A_1, denoted by c_1(u_2), that maximizes the value of f; that is, f(c_1(u_2), u_2) is equal to the maximum of f(u_1', u_2) over all u_1' in A_1. Likewise, for every fixed u_1 in A_1, there exists a unique u_2 in A_2, denoted by c_2(u_1), that maximizes the value of f; that is, f(u_1, c_2(u_1)) is equal to the maximum of f(u_1, u_2') over all u_2' in A_2.

Let u be the pair (u_1, u_2) and let A be the Cartesian product of A_1 and A_2. Then the double supremum can be written as the supremum of f(u) over all u in A. In other words, the supremum of f is taken over a subset of R^{n_1 + n_2} that is equal to the Cartesian product of two convex subsets of R^{n_1} and R^{n_2}. Denote the value of this supremum by f*.
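For reference, here are the three optimization problems just described, written out in LaTeX. The notation follows the lecture, except that the distortion measure is denoted by d, a choice of symbol not fixed in the audio:

```latex
\begin{align*}
  C    &= \max_{r(x)} I(X;Y), \\
  R(D) &= \min_{Q(\hat{x} \mid x)\,:\; \mathbb{E}\, d(X,\hat{X}) \le D} I(X;\hat{X}), \\
  f^*  &= \sup_{\mathbf{u}_1 \in A_1}\, \sup_{\mathbf{u}_2 \in A_2} f(\mathbf{u}_1, \mathbf{u}_2)
        = \sup_{\mathbf{u} \in A} f(\mathbf{u}),
  \qquad A = A_1 \times A_2 \subseteq \mathbb{R}^{n_1 + n_2}.
\end{align*}
```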
We now describe an alternating optimization algorithm. Let u^k be the pair (u_1^k, u_2^k) for k greater than or equal to 0, defined as follows. We start with some arbitrarily chosen vector u_1^0 in A_1, and then let u_2^0 be c_2(u_1^0), that is, the u_2 that maximizes f when u_1 is fixed at u_1^0. For k greater than or equal to 1, u^k is defined by the components u_1^k and u_2^k, where u_1^k is equal to c_1(u_2^{k-1}), the unique u_1 that maximizes f for u_2^{k-1}, and u_2^k is equal to c_2(u_1^k), the unique u_2 that maximizes f for u_1^k. Denote f(u^k) by f^k. Then evidently f^k is greater than or equal to f^{k-1}, because neither of the two updates can decrease the value of f.

We now give a schematic illustration of the algorithm. We start with f(u_1^0, u_2^0), and this value is denoted by f^0. First, we update u_1^0 to u_1^1; then, we update u_2^0 to u_2^1. After these two updates, we have completed one cycle and have updated f^0 to f^1. Then we update u_1^1 to u_1^2, and then u_2^1 to u_2^2. We have then completed another cycle and have updated f^1 to f^2.

Since the sequence f^k is non-decreasing and f is bounded from above, the sequence must converge. We will show that f^k tends to f* if f is concave. By replacing f with -f, the double supremum becomes the double infimum of f over u_1 in A_1 and u_2 in A_2, and the same alternating optimization algorithm can be applied to compute this infimum. In the next section, the alternating optimization algorithm will be specialized for computing the capacity C and the rate-distortion function R(D).

Here is a graphical illustration of the alternating optimization algorithm. The x direction represents the u_1 direction, and the y direction represents the u_2 direction, where both of them are multi-dimensional. The oval shapes represent the contour lines of a mountain. Starting from some arbitrary point on the mountain, we try to climb to the top by moving either east-west or north-south. First, we move east-west to reach the highest possible point in that direction. Then, we move north-south to reach the highest point in that direction. We keep moving east-west, then north-south, and so on, and hopefully we will approach the highest point of the mountain. As we are going to show, as long as the mountain is concave, we will be able to approach the top in this fashion.
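To make the iteration concrete, here is a minimal Python sketch of the alternating optimization loop. The objective f and its closed-form coordinate maximizers c1 and c2 below are a hypothetical concave quadratic example, chosen only so that the updates are easy to write down; they are not from the lecture, where the actual maximizers for C and R(D) are derived in the next section.

```python
# Minimal sketch of the alternating optimization algorithm.
# The objective f and the maximizers c1, c2 are a made-up concave
# quadratic example (not from the lecture); its unique maximizer
# is (u1, u2) = (4/3, 5/3) with f* = -1/3.

def f(u1, u2):
    return -(u1 - 1) ** 2 - (u2 - 2) ** 2 - (u1 - u2) ** 2

def c1(u2):
    # argmax of f over u1 with u2 fixed, from setting df/du1 = 0.
    return (1 + u2) / 2

def c2(u1):
    # argmax of f over u2 with u1 fixed, from setting df/du2 = 0.
    return (2 + u1) / 2

def alternating_maximization(u1, num_cycles=50):
    u2 = c2(u1)                      # u_2^0 = c_2(u_1^0)
    for k in range(1, num_cycles + 1):
        u1 = c1(u2)                  # u_1^k = c_1(u_2^{k-1})
        u2 = c2(u1)                  # u_2^k = c_2(u_1^k)
    return u1, u2

u1, u2 = alternating_maximization(u1=0.0)
print(u1, u2, f(u1, u2))             # approaches (4/3, 5/3) and f* = -1/3
```

Running this drives (u1, u2) to (4/3, 5/3), the top of the "mountain" for this particular f. The monotonicity f^k >= f^{k-1} is built into the loop: each update maximizes f exactly over one block of coordinates while the other is held fixed, so the objective can never decrease within a cycle.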