In this video, we will explore the flexibility of the MDP formalism with a few examples. By the end of this video, you will gain experience formalizing decision-making problems as MDPs, and appreciate the flexibility of the MDP formalism.

Consider a recycling robot which collects empty soda cans in an office environment. It can detect soda cans, pick them up with its gripper, and drop them off in a recycling bin. The robot runs on a rechargeable battery. Its objective is to collect as many cans as possible. Let's formulate this problem as an MDP. We will start with the states, actions, and rewards.

Let's assume that the sensors can only distinguish two charge levels, low and high. These charge levels represent the robot's state. In each state, the robot has three choices: it can search for cans for a fixed amount of time, it can remain stationary and wait for someone to bring it a can, or it can go to the charging station to recharge its battery. We only allow recharging from the low state because recharging is pointless when the energy level is high.

Now, let's consider the transition dynamics. First, let's draw the states as open circles. Searching for cans when the energy level is high might reduce the energy level to low. That is, the search action in the state high might not change the state, say with probability alpha, or the energy level might drop to low with probability one minus alpha. In both cases, the robot's search yields a reward of r_search. For instance, r_search could be plus 10, indicating that the robot found 10 cans.

The robot can also wait. Waiting for cans does not drain the battery, so the state does not change. In both states, the wait action yields a reward of r_wait. For example, r_wait could be plus one.

Searching when the energy level is low might deplete the battery, in which case the robot would need to be rescued. Let's write this probability as one minus beta. If the robot is rescued, then its battery is restored.
However, needing rescue yields a negative reward of r_rescued. For example, r_rescued could be minus 20, because we are annoyed with the robot. Alternatively, the battery might not run out. This occurs with probability beta, and the robot receives a reward of r_search. Taking the recharge action restores the battery to the level high and yields a reward of zero. That's it, we have completely specified the MDP for the recycling robot problem.

We have discussed one example where an MDP is used to precisely specify a problem. But you might wonder, how general is this framework? The MDP formalism can be used in many different applications, in many different ways. States can be low-level sensory readings, for example, the pixel values of a video frame. They can also be high-level, such as object descriptions. Similarly, actions can be low-level, such as the wheel speeds of a robot. Actions can also be high-level, such as "go to the charging station." Time-steps can be very small or very large. For example, they can be one millisecond or one month.

Let's look at one more application. Suppose we want to use reinforcement learning to control a robot arm in a pick-and-place task. The goal of the robot is to pick up objects and place them in a particular location. There are many ways we can formalize this task. Here's one possibility. The state could be the readings of the joint angles and velocities. The actions could be the voltages applied to each motor. The reward could be plus 100 for successfully placing each object. But we also want the robot to use as little energy as possible, so let's include a small negative reward corresponding to the energy used. That was not so hard.

To recap, the MDP framework can be used to formalize a wide variety of sequential decision-making problems.
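The recycling robot's one-step dynamics described above can be written out explicitly as a table of p(s', r | s, a). Here is a minimal sketch in Python; the numeric values alpha = 0.7 and beta = 0.6 are illustrative choices not given in the video, while the rewards match the examples (plus 10, plus 1, minus 20).

```python
# Illustrative parameters: alpha and beta are assumed values for this sketch.
ALPHA, BETA = 0.7, 0.6
R_SEARCH, R_WAIT, R_RESCUED = 10, 1, -20

# Each (state, action) pair maps to a list of (probability, next_state, reward)
# outcomes, matching the transition dynamics described in the video.
dynamics = {
    ("high", "search"):  [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
    ("high", "wait"):    [(1.0, "high", R_WAIT)],
    ("low", "search"):   [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUED)],
    ("low", "wait"):     [(1.0, "low", R_WAIT)],
    ("low", "recharge"): [(1.0, "high", 0)],   # recharging always succeeds, reward zero
}

# Sanity check: outgoing probabilities sum to one for every state-action pair.
for (state, action), outcomes in dynamics.items():
    assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9
```

Note that recharge is only defined from the low state, mirroring the restriction in the problem setup, and that rescue (probability one minus beta) restores the battery to high while incurring the negative reward.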
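The pick-and-place reward described above, plus 100 per placed object minus a small energy penalty, could be sketched as follows. The quadratic voltage penalty and the coefficient `energy_cost_coeff` are hypothetical modeling choices for illustration, not something specified in the video.

```python
def reward(placed_object, motor_voltages, energy_cost_coeff=0.01):
    """One-step reward for the pick-and-place task: +100 for a successful
    placement, minus a small penalty proportional to energy used.
    The squared-voltage energy estimate is an illustrative assumption."""
    place_bonus = 100.0 if placed_object else 0.0
    energy_penalty = energy_cost_coeff * sum(v * v for v in motor_voltages)
    return place_bonus - energy_penalty
```

For example, a step that places an object with the motors idle yields exactly 100, while a step that applies 10 volts to one motor without placing anything yields a small negative reward.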