Many robotic control applications, including the lunar lander application that you will work on in the practice lab, have continuous state spaces. Let's take a look at what that means and how to generalize the concepts we've talked about to these continuous state spaces. The simplified Mars rover example we used had a discrete set of states, which meant that the simplified Mars rover could only be in one of six possible positions. But most robots can be in more than one of six, or any discrete number of, positions. Instead, they can be in any of a very large number of continuous-valued positions. For example, suppose the Mars rover could be anywhere on a line, so that its position was indicated by a number ranging from 0 to 6 kilometers, where any number in between is valid. That would be an example of a continuous state space, because the position would be represented by a number such as 2.7 kilometers along, or 4.8 kilometers, or any other number between zero and six.

Let's look at another example. For this example, I'm going to use the application of controlling a car or a truck. Here's a toy car, or rather a toy truck. This one belongs to my daughter. If you're building a self-driving car or self-driving truck and you want to control it to drive smoothly, then the state of this truck might include a few numbers such as its x position, its y position, and maybe its orientation: which way is it facing? Assuming the truck stays on the ground, you probably don't need to worry about how high up it is. So the state will include x, y, and its angle theta, as well as maybe its speed in the x-direction, its speed in the y-direction, and how quickly it's turning. Is it turning at 1 degree per second, or at 30 degrees per second, or is it turning really quickly at 90 degrees per second?
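To make the contrast between the two Mars rover variants concrete, here is a minimal sketch. The function names are hypothetical, chosen only for illustration: the discrete rover's state must be one of six values, while the continuous rover's state can be any real number from 0 to 6 kilometers.

```python
# Discrete (simplified) Mars rover: exactly six possible states.
DISCRETE_STATES = (1, 2, 3, 4, 5, 6)

# Continuous variant: the rover can be anywhere on a 0 to 6 km line.
MIN_KM, MAX_KM = 0.0, 6.0


def is_valid_discrete_state(s) -> bool:
    """Hypothetical helper: True only if s is one of the six integer positions."""
    return isinstance(s, int) and s in DISCRETE_STATES


def is_valid_continuous_state(x_km: float) -> bool:
    """Hypothetical helper: True for any real-valued position within [0, 6] km."""
    return MIN_KM <= x_km <= MAX_KM


# 2.7 km is not a discrete state, but it is a perfectly valid continuous one.
print(is_valid_discrete_state(3), is_valid_discrete_state(2.7))
print(is_valid_continuous_state(2.7), is_valid_continuous_state(4.8))
```

The only thing that changes between the two settings is the set of values the state may take: a finite list versus an entire interval of real numbers.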
For a truck or a car, the state might include not just one number, like how many kilometers along this line it is, but six numbers: its x position, its y position, its orientation, which I'm going to denote using the Greek letter θ (theta), as well as its velocity in the x-direction, which I will denote x dot, meaning how quickly the x-coordinate is changing; y dot, how quickly the y-coordinate is changing; and finally theta dot, how quickly the angle of the car is changing. Whereas for the simplified Mars rover example the state was just one of six possible numbers (1, 2, 3, 4, 5, or 6), for the car the state comprises this vector of six numbers, and any of these numbers can take on any value within its valid range. For example, θ should range between 0 and 360 degrees.

Let's look at another example. What if you're building a reinforcement learning algorithm to control an autonomous helicopter? How would you characterize the position of a helicopter? To illustrate, I have with me here a small toy helicopter. The position of the helicopter would include its x position, such as how far north or south the helicopter is; its y position, maybe how far along the east-west axis the helicopter is; and also z, the height of the helicopter above ground. But other than the position, the helicopter also has an orientation, and conventionally, one way to capture its orientation is with three additional numbers. One of them captures the roll of the helicopter: is it rolling to the left or the right? Then the pitch: is it pitching forward or pitching back? And finally the yaw, which is the compass orientation it's facing: is it facing north, east, south, or west? To summarize, the state of the helicopter includes its position in, say, the north-south direction, its position in the east-west direction, its height above ground, and also the roll, the pitch, and the yaw of the helicopter.
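Going back to the car for a moment, its six-number state can be sketched as a small Python dataclass. This is an illustrative assumption, not a prescribed representation: the class and field names are hypothetical, angles are in degrees, and θ is assumed to be kept in the range [0, 360).

```python
from dataclasses import dataclass, astuple


@dataclass
class CarState:
    """Hypothetical state of a car or truck on flat ground (no height needed)."""
    x: float          # x position
    y: float          # y position
    theta: float      # orientation (heading), in degrees
    x_dot: float      # how quickly x is changing
    y_dot: float      # how quickly y is changing
    theta_dot: float  # turning rate, in degrees per second

    def as_vector(self) -> tuple:
        """Return the state as a vector of six numbers."""
        return astuple(self)

    def is_valid(self) -> bool:
        """Assumed validity rule: the heading must lie in [0, 360) degrees."""
        return 0.0 <= self.theta < 360.0


def wrap_degrees(angle: float) -> float:
    """Map any angle into [0, 360) so that theta stays in its valid range."""
    return angle % 360.0


# A truck heading at 90 degrees, moving in x and turning at 30 degrees per second.
state = CarState(x=1.5, y=-0.3, theta=90.0, x_dot=2.0, y_dot=0.0, theta_dot=30.0)
print(state.as_vector())
```

Unlike the discrete rover, every field here is a real number, so the set of possible states is continuous in all six dimensions.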
To write this down, the state therefore includes the position x, y, z, and then the roll, pitch, and yaw, denoted with the Greek letters φ (phi), θ (theta), and ω (omega). But to control the helicopter, we also need to know its speed in the x-direction, the y-direction, and the z-direction, as well as its rate of turning, also called the angular velocity: how fast the roll is changing, how fast the pitch is changing, and how fast the yaw is changing. This is actually the state used to control autonomous helicopters. It is a list of 12 numbers that is input to a policy, and the job of the policy is to look at these 12 numbers and decide what's an appropriate action to take in the helicopter. So in a continuous state reinforcement learning problem, or a continuous state Markov decision process (continuous MDP), the state of the problem isn't just one of a small number of possible discrete values, like a number from 1 to 6. Instead, it's a vector of numbers, any of which could take any of a large number of values. In the practice lab for this week, you get to implement for yourself a reinforcement learning algorithm applied to a simulated lunar lander application: landing something on the moon in simulation. Let's take a look in the next video at what that application entails, since it will be another continuous state application.
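The idea that a policy takes the 12-number helicopter state as input and returns an action can be sketched as follows. This is only an illustration of the input and output shapes: the linear policy, the helper name `make_linear_policy`, and the choice of four action components are hypothetical assumptions, not how real helicopter controllers are built.

```python
from typing import Callable, List, Sequence

# The 12 state variables of the helicopter, in the order described above.
STATE_NAMES = [
    "x", "y", "z",                        # position: north-south, east-west, height
    "phi", "theta", "omega",              # orientation: roll, pitch, yaw
    "x_dot", "y_dot", "z_dot",            # linear velocities
    "phi_dot", "theta_dot", "omega_dot",  # angular velocities
]
STATE_DIM = len(STATE_NAMES)  # 12

# A policy maps a 12-number state vector to an action (a list of numbers).
Policy = Callable[[Sequence[float]], List[float]]


def make_linear_policy(weights: List[List[float]]) -> Policy:
    """Hypothetical linear policy: each action component is a weighted sum of the state.

    Each row of `weights` has STATE_DIM entries and produces one action component.
    """
    for row in weights:
        if len(row) != STATE_DIM:
            raise ValueError(f"each weight row must have {STATE_DIM} entries")

    def policy(state: Sequence[float]) -> List[float]:
        if len(state) != STATE_DIM:
            raise ValueError(f"expected {STATE_DIM} numbers, got {len(state)}")
        return [sum(w * s for w, s in zip(row, state)) for row in weights]

    return policy


# Example: four hypothetical control outputs; only the first reacts to height z.
W = [[0.0] * STATE_DIM for _ in range(4)]
W[0][STATE_NAMES.index("z")] = -1.0
policy = make_linear_policy(W)

state = [0.0] * STATE_DIM
state[STATE_NAMES.index("z")] = 5.0  # hovering 5 units above ground
print(policy(state))
```

Whatever form the policy takes, the key point is the interface: a vector of 12 continuous numbers goes in, and an action comes out.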