Reinforcement Learning is a subfield of Machine Learning, but it is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the importance and challenges of learning agents that make decisions is vital today, as more and more companies are interested in interactive agents and intelligent decision-making.


## About this Course

### What you will learn

Formalize problems as Markov Decision Processes

Understand basic exploration methods and the exploration / exploitation tradeoff

Understand value functions, as a general-purpose tool for optimal decision-making

Know how to implement dynamic programming as an efficient solution approach to an industrial control problem

### Skills you will gain

#### 100% online

#### Course 1 of 4

#### Flexible deadlines

#### Intermediate level

Probabilities & Expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), implementing algorithms from pseudocode.

#### Approx. 18 hours to complete

#### English

### Offered by

#### University of Alberta

UAlberta is considered among the world’s leading public research- and teaching-intensive universities. As one of Canada’s top universities, we’re known for excellence across the humanities, sciences, creative arts, business, engineering and health sciences.

#### Alberta Machine Intelligence Institute

The Alberta Machine Intelligence Institute (Amii) is home to some of the world’s top talent in machine intelligence. We’re an Alberta-based

## Syllabus - What you will learn from this course

**1 hour to complete**

## Welcome to the Course!

Welcome to: Fundamentals of Reinforcement Learning, the first course in a four-part specialization on Reinforcement Learning brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, get a flavour of what the course has in store for you, and be given an in-depth roadmap to help make your journey through this specialization as smooth as possible.

**4 videos**

**2 readings**

**7 hours to complete**

## The K-Armed Bandit Problem

For the first week of this course, you will learn how to understand the exploration-exploitation trade-off in sequential decision-making, implement incremental algorithms for estimating action-values, and compare the strengths and weaknesses of different algorithms for exploration. For this week’s graded assessment, you will implement and test an epsilon-greedy agent.
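To make the ideas above concrete, here is a minimal sketch of an epsilon-greedy agent that estimates action-values incrementally. It is not the course's graded assignment: the class name `EpsilonGreedyAgent`, the helper `run_bandit`, and the unit-variance Gaussian arms are all assumptions chosen for illustration.

```python
import random


class EpsilonGreedyAgent:
    """Epsilon-greedy agent with sample-average action-value estimates."""

    def __init__(self, num_arms, epsilon, seed=None):
        self.epsilon = epsilon
        self.q = [0.0] * num_arms       # estimated value of each arm
        self.counts = [0] * num_arms    # number of times each arm was pulled
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties randomly among the greedy arms.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, action, reward):
        # Incremental mean: Q <- Q + (1 / N) * (R - Q)
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run the agent on hypothetical Gaussian arms (unit variance)."""
    agent = EpsilonGreedyAgent(len(true_means), epsilon, seed)
    noise = random.Random(seed + 1)
    for _ in range(steps):
        action = agent.select_action()
        agent.update(action, noise.gauss(true_means[action], 1.0))
    return agent
```

Calling `run_bandit([0.2, 0.8, 0.5])` returns the trained agent, and its `q` and `counts` lists can be inspected afterwards. A larger `epsilon` spends more pulls on exploration, while `epsilon=0` makes the agent purely greedy.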

**8 videos**

**3 readings**

**1 practice exercise**

**3 hours to complete**

## Markov Decision Processes

When you’re presented with a problem in industry, the first and most important step is to translate that problem into a Markov Decision Process (MDP). The quality of your solution depends heavily on how well you do this translation. This week, you will learn the definition of MDPs, you will understand goal-directed behavior and how this can be obtained from maximizing scalar rewards, and you will also understand the difference between episodic and continuing tasks. For this week’s graded assessment, you will create three example tasks of your own that fit into the MDP framework.
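As a rough illustration of what translating a problem into an MDP can look like, the sketch below writes a small task down as plain data and computes a discounted return. The battery-powered robot, its states, actions, probabilities, and rewards are all hypothetical, and the helper names `is_valid_mdp` and `discounted_return` are assumptions rather than anything defined by the course.

```python
# A tiny hypothetical MDP: a robot with a "high" or "low" battery.
# dynamics[state][action] -> list of (probability, next_state, reward)
dynamics = {
    "high": {
        "search": [(0.7, "high", 1.0), (0.3, "low", 1.0)],
        "wait": [(1.0, "high", 0.2)],
    },
    "low": {
        "search": [(0.6, "low", 1.0), (0.4, "high", -3.0)],
        "recharge": [(1.0, "high", 0.0)],
    },
}


def is_valid_mdp(dynamics, tol=1e-9):
    """Check that outcome probabilities sum to 1 for every (state, action)."""
    return all(
        abs(sum(p for p, _, _ in outcomes) - 1.0) <= tol
        for actions in dynamics.values()
        for outcomes in actions.values()
    )


def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * R_{t+2} + ... for a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For an episodic task the reward sequence ends with the episode, so `gamma=1.0` gives a plain sum. In a continuing task the sequence never ends, and a discount `gamma < 1` is what keeps the return finite when rewards are bounded.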

**7 videos**

**2 readings**

**1 practice exercise**

**3 hours to complete**

## Value Functions & Bellman Equations

Once the problem is formulated as an MDP, finding the optimal policy is more efficient when using value functions. This week, you will learn the definition of policies and value functions, as well as Bellman equations, which are the key tool that all of our algorithms will use.
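The relationship between a policy, its value function, and the Bellman equation can be sketched in a few lines. The function below evaluates the right-hand side of the Bellman expectation equation for one state. The two-state task, the "always stay" policy, and the name `bellman_backup` are assumptions made up for this example.

```python
def bellman_backup(state, values, policy, dynamics, gamma):
    """Right-hand side of the Bellman expectation equation for one state.

    v(s) = sum_a pi(a|s) * sum_{s', r} p(s', r | s, a) * (r + gamma * v(s'))
    """
    total = 0.0
    for action, action_prob in policy[state].items():
        for prob, next_state, reward in dynamics[state][action]:
            total += action_prob * prob * (reward + gamma * values[next_state])
    return total


# Hypothetical two-state continuing task, used only for illustration.
# dynamics[state][action] -> list of (probability, next_state, reward)
dynamics = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
policy = {"A": {"stay": 1.0}, "B": {"stay": 1.0}}  # always stay
gamma = 0.5

# Under "always stay": v(A) = 1 / (1 - gamma) = 2 and v(B) = 2 / (1 - gamma) = 4.
values = {"A": 2.0, "B": 4.0}
```

Because `values` is the true value function of this policy, applying `bellman_backup` to either state returns the value already stored for it. That fixed-point property is what the Bellman equation expresses.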

**9 videos**

**3 readings**

**2 practice exercises**

**7 hours to complete**

## Dynamic Programming

This week, you will learn how to compute value functions and optimal policies, assuming you have the MDP model. You will implement dynamic programming to compute value functions and optimal policies and understand the utility of dynamic programming for industrial applications and problems. Further, you will learn about Generalized Policy Iteration as a common template for constructing algorithms that maximize reward. For this week’s graded assessment, you will implement an efficient dynamic programming agent in a simulated industrial control problem.
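As one possible illustration of computing value functions and optimal policies from a known model, the sketch below uses value iteration, a dynamic programming method that fits the Generalized Policy Iteration template. It is not the course's industrial control assignment: the two-state model, the stopping threshold `theta`, and the function name `value_iteration` are assumptions.

```python
def value_iteration(dynamics, gamma, theta=1e-10):
    """Return (values, greedy_policy) for a fully known MDP model.

    dynamics[state][action] -> list of (probability, next_state, reward)
    """
    values = {s: 0.0 for s in dynamics}

    def q_value(s, a):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in dynamics[s][a])

    while True:
        delta = 0.0
        for s in dynamics:
            best = max(q_value(s, a) for a in dynamics[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place sweep over the states
        if delta < theta:
            break

    # Act greedily with respect to the converged values.
    policy = {s: max(dynamics[s], key=lambda a: q_value(s, a)) for s in dynamics}
    return values, policy


# Hypothetical two-state model, used only for illustration.
example_dynamics = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 3.0)], "go": [(1.0, "A", 0.0)]},
}
optimal_values, optimal_policy = value_iteration(example_dynamics, gamma=0.5)
```

In this toy model the greedy policy moves from `A` to `B` and then stays in `B`, since staying in `B` yields the larger reward on every step.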

**10 videos**

**3 readings**

**1 practice exercise**

### Reviews

#### 4.8

##### Top reviews from FUNDAMENTALS OF REINFORCEMENT LEARNING

I understood all the necessary concepts of RL. I've been working on RL for some time now, but thanks to this course, now I have more basic knowledge about RL and can't wait to watch other courses

Concepts are a bit hard, but it is nice if you understand it well, especially Bellman and dynamic programming. Sometimes, visualizing the problem is hard, so you need to be thoroughly prepared.

An excellent introduction to the subject of Reinforcement Learning, accompanied by a very clear text book. The python assignments in Jupyter notebooks are both informative and helpful.

One of the best courses I finished on Coursera, I really like the structure of the course. Textbook is also provided which really helps. Looking forward to next course in the series.

Excellent and well done course on some of the basics of RL. Good mix of lectures, reading, quizzes and programming assignments. Also a good balance between pure theory and examples.

Very well designed, it is clear that a lot of thought was put into the course. Also, I really liked the clarity regarding the learning objectives and the emphasis on understanding.

A great introduction to RL. Credit goes to the instructors, Mr. and Mrs. White, for keeping it as simple as possible. Understanding the math behind RL is the key for the RL adventure.

The ideal course to go with the book Reinforcement Learning: An Introduction. The quizzes and coding workshops are pitched just right in my opinion, neither too easy nor too hard.

Great course, I think theory is really well explained and book is great, but including more practice exercises is needed for this course to strengthen the learning of concepts.

This is a nice course. I think for people with no background in RL the pace might be a little fast. A few more examples could help them understand the concepts more easily.

The course was nice, easy to understand and intuitive. The professors taught the concepts very well and I really liked that they included guest lectures by other professors.

Very practical and learning-oriented. Providing the textbook in PDF is a big plus. I think there should be more programming exercises. Great course anyway. Worth taking it.

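## About the Reinforcement Learning Specialization

## Frequently Asked Questions

When will I have access to the course videos and assignments?

Once you enroll for a certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, where you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

Is financial aid available?

More questions? Visit the Learner Help Center.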