Course three is really a story of something old and something new. Today, we begin our journey and reinforce our learning with function approximation. In a way, this is totally new, because we will for the first time discuss objective functions, gradient descent, parameterized functions, and generalization and discrimination. But you're totally ready for it. The ideas of exploration and exploitation, value functions, policies, and many of the other elements you've already learned about transfer in a natural way to the function approximation setting.

The beginning of this course represents a significant change in perspective. We need to shift our focus to learning parameterized functions. We will no longer assume we can store the values for all states in a table. In fact, we can't even guarantee we will see the same state more than once. Instead, we'll learn functions parameterized by a set of weights, like a neural network, to approximate the values. We won't be able to get this approximation perfect. That means in some states, we'll have to be okay with the value function being inaccurate. Despite this, the move to parameterized value functions is overall a positive one, because of generalization. In fact, this better reflects how we learn: you generalize your predictions about how much work you'll have to do in this class based on your experience taking other classes.

The concepts will become a bit more technical. We will reason about the distribution of states our agents encounter and how that impacts the accuracy of the value function. We'll compute the gradients of objective functions and derive new learning rules. At this point in the specialization, you've learned about many things, and now we're going to learn about a bunch of new things. It's easy to get lost. Worse, you might lose sight of which algorithms are most relevant for the task at hand. To help with this, we've developed a course map.
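To make the idea of a parameterized value function concrete, here is a minimal sketch (not from the course itself): a linear approximator whose weights are adjusted by gradient descent on a squared error between a target value and the current estimate. The feature vector and target here are purely illustrative.

```python
import numpy as np

def v_hat(x, w):
    """Approximate value of a state with feature vector x: v_hat = w . x."""
    return np.dot(w, x)

def gradient_step(w, x, target, alpha=0.1):
    """One gradient-descent step on the objective 0.5 * (target - v_hat)^2.
    For a linear approximator, the gradient of v_hat with respect to w
    is just the feature vector x."""
    error = target - v_hat(x, w)
    return w + alpha * error * x

w = np.zeros(3)
x = np.array([1.0, 0.0, 1.0])   # hypothetical state features
for _ in range(50):
    w = gradient_step(w, x, target=2.0)
print(round(v_hat(x, w), 2))    # prints 2.0
```

The point of the sketch is that we never store a value per state; states with similar features automatically share (generalize) their value estimates through the shared weights.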
The idea is that you should be able to start at the top of the tree and work your way down to the algorithm that best fits your problem. Remember, this is a course map: it does not include all the algorithms in RL, just the ones we have talked about. The map is designed to summarize the algorithms in this course. It is not necessarily the best way to categorize the broader set of algorithms in RL, so use this as a mental model of the course itself.

Let's remind ourselves what we covered in courses one and two using the map. We started by assuming the agent could perfectly represent the values in a table. We did this to focus on the fundamental concepts of reinforcement learning without getting bogged down by approximation. The first RL methods we discussed used a model of the world that was given, not learned. We used dynamic programming methods to iteratively compute value functions and policies from the model, without ever interacting with the world. The branches of this map show all three DP algorithms that we discussed.

In the next course, we covered sample-based methods. We first talked about Monte Carlo methods, which must wait until the end of an episode to make updates. We learned about two Monte Carlo methods for prediction and two for control. After that, we introduced you to temporal difference learning. This family of algorithms allows the agent to make updates to the value function and policy on each step of the episode. Here, we learned about some of the most widely used algorithms in reinforcement learning, including Q-learning, SARSA, and Expected SARSA. We finished off course two by looping back to model-based planning methods. We studied the Dyna architecture, in which the agent learns a model while interacting with the world.

For course three, we only need the right side of this map. We will first introduce parameterized functions and how we can use them to approximate value functions.
We will talk about generalization and discrimination, as well as particular function approximators, including coarse coding and neural networks. As before, we'll start with prediction, and derive new Monte Carlo and TD algorithms using ideas from supervised learning and gradient descent. Then we'll talk about control algorithms with function approximation, including Expected SARSA and Q-learning. After that, we will totally blow your mind with a new way of formulating continuing control problems, called average reward. We'll finish off course three with parameterized policies. In a nutshell, we can parameterize the policy just like we parameterize the value function. We will discuss how to do this in the average reward setting. By the end of this course, you will know much of what you need to scale RL to real problems. You'll then get to use all this newfound understanding to implement a complete RL system in course four. So let's get to it.
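As a preview of how TD prediction combines with gradient descent, here is a hedged sketch of semi-gradient TD(0) with linear features. The environment, feature scheme, and all names are illustrative assumptions, not the course's own code: a tiny deterministic three-state chain that ends in a terminal state with reward 1, using one-hot features (which makes the approximator equivalent to a table, so the true values are easy to check).

```python
import numpy as np

def features(s, n_states=3):
    """One-hot feature vector for state s (an illustrative choice)."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

def td0_prediction(episodes=200, alpha=0.1, gamma=1.0):
    """Semi-gradient TD(0) on a hypothetical chain: states 0 -> 1 -> 2 -> terminal,
    with reward 1.0 on the final transition and 0.0 elsewhere."""
    w = np.zeros(3)
    for _ in range(episodes):
        s = 0
        while s < 3:                 # state 3 is terminal
            s_next = s + 1
            r = 1.0 if s_next == 3 else 0.0
            v_next = 0.0 if s_next == 3 else np.dot(w, features(s_next))
            # Semi-gradient TD(0): the target bootstraps from the current w,
            # but we only differentiate through the estimate for state s.
            delta = r + gamma * v_next - np.dot(w, features(s))
            w += alpha * delta * features(s)
            s = s_next
    return w
```

With this deterministic chain and gamma of 1, the true value of every state is 1.0, and the learned weights approach that after a few hundred episodes.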