# Reinforcement Learning Specialization

Master the concepts of reinforcement learning. Implement a complete RL solution and understand how to apply AI tools to solve real-world problems.

### What you will learn

Build a Reinforcement Learning system for sequential decision making.

Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).

Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.

Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised learning, and unsupervised learning.

## About this Specialization

## Applied learning project

Through programming assignments and quizzes, students will:

Build a Reinforcement Learning system that knows how to make automated decisions.

Understand how RL relates to and fits under the broader umbrella of machine learning, deep learning, supervised and unsupervised learning.

Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradient, Dyna, and more).

Understand how to formalize your task as an RL problem, and how to begin implementing a solution.

Recommended background: probabilities and expectations, basic linear algebra, basic calculus, Python 3 (at least 1 year of experience), and implementing algorithms from pseudocode.

### This Specialization includes 4 courses

### Fundamentals of Reinforcement Learning

Reinforcement Learning is a subfield of Machine Learning, but it is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the challenges of building learning agents that make decisions is vital today, as more and more companies become interested in interactive agents and intelligent decision-making.
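
As a rough illustration of an agent taking actions and interacting with a world, here is a minimal Python sketch of that loop. The `Corridor` environment, its reward, and both policies are invented for this example and are not taken from the course materials.

```python
import random


class Corridor:
    """Hypothetical 5-cell corridor: start in the middle, reward at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = length // 2

    def reset(self):
        """Put the agent back in the middle cell and return that state."""
        self.position = self.length // 2
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position += action
        done = self.position in (0, self.length - 1)
        reward = 1.0 if self.position == self.length - 1 else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """Run one episode of the agent-environment loop; return (total_reward, steps)."""
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(state)                    # the agent chooses an action
        state, reward, done = env.step(action)    # the world responds
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


def make_random_policy(seed=0):
    """Build a policy that picks left or right uniformly at random."""
    rng = random.Random(seed)
    return lambda state: rng.choice((-1, 1))


if __name__ == "__main__":
    print(run_episode(Corridor(), always_right))          # (1.0, 2)
    print(run_episode(Corridor(), make_random_policy()))
```

Nothing here learns yet: the policy is just a function from states to actions, which is the piece a learning algorithm would improve over time.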

### Sample-based Learning Methods

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal-difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) and temporal-difference updates to radically accelerate learning.
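
To give a feel for learning from experience without a model of the dynamics, the following is a minimal sketch of tabular Q-learning in Python. The corridor task, the reward, and every hyperparameter value are assumptions chosen for illustration, not part of the course assignments.

```python
import random

ACTIONS = (-1, 1)      # move left or right
N_STATES = 5           # states 0..4; states 0 and 4 are terminal
START = 2


def step(state, action):
    """Hypothetical corridor dynamics: reward 1.0 only on reaching the right end."""
    next_state = state + action
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from sampled transitions only; return the Q-table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action,
            # or from 0 if the next state is terminal.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the greedy action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

The agent only ever calls `step` to sample transitions; it never inspects the transition rule itself, which is the sense in which the method needs no prior knowledge of the environment’s dynamics.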

### Prediction and Control with Function Approximation

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), allowing you to build agents that carefully balance generalization and discrimination in order to maximize reward. We will begin this journey by investigating how our policy evaluation, or prediction, methods like Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL, and representation learning via neural networks and backpropagation. We conclude this course with a deep dive into policy gradient methods, a way to learn policies directly without learning a value function. In this course you will solve two continuous-state control tasks and investigate the benefits of policy gradient methods in a continuous-action environment.
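
As one possible picture of prediction with function approximation, the sketch below runs semi-gradient TD(0) with a linear value estimate on a small random walk. The task, the two hand-built features, and the step size are assumptions made for this example; the course itself may use different problems and feature constructions.

```python
import random

N = 19                    # non-terminal states 1..N; states 0 and N + 1 are terminal
START = (N + 1) // 2


def features(state):
    """Hypothetical hand-built features: a bias term and a centred position."""
    return (1.0, 2.0 * state / (N + 1) - 1.0)


def value(w, state):
    """Linear value estimate; terminal states are defined to have value 0."""
    if state in (0, N + 1):
        return 0.0
    return sum(wi * xi for wi, xi in zip(w, features(state)))


def semi_gradient_td0(episodes=4000, alpha=0.002, gamma=1.0, seed=0):
    """Evaluate the uniform random policy; return the learned weight vector."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        state = START
        while state not in (0, N + 1):
            next_state = state + rng.choice((-1, 1))   # uniform random policy
            # Reward is +1 at the right end, -1 at the left end, 0 elsewhere.
            if next_state == N + 1:
                reward = 1.0
            elif next_state == 0:
                reward = -1.0
            else:
                reward = 0.0
            # The TD error bootstraps from the current estimate of the next state.
            delta = reward + gamma * value(w, next_state) - value(w, state)
            # For a linear estimate, the gradient w.r.t. w is the feature vector.
            x = features(state)
            for i in range(len(w)):
                w[i] += alpha * delta * x[i]
            state = next_state
    return w


if __name__ == "__main__":
    weights = semi_gradient_td0()
    print(weights, value(weights, 15))
```

In this toy problem the true values happen to be exactly linear in the second feature, so the weights should drift toward roughly (0, 1). With a constant step size they keep fluctuating around that point rather than settling exactly.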

### A Complete Reinforcement Learning System (Capstone)

In this final course, you will put together your knowledge from Courses 1, 2, and 3 to implement a complete RL solution to a problem. This capstone will let you see how each component (problem formulation, algorithm selection, parameter selection, and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment to simulate your problem and a control agent with neural network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) appropriately formalize the problem as an MDP, (b) select appropriate algorithms, (c) identify which choices in your implementation will have large impacts on performance, and (d) validate the expected behaviour of your algorithms. This capstone is valuable for anyone who is planning on using RL to solve real problems.

### Offered by

#### University of Alberta

UAlberta is considered among the world’s leading public research- and teaching-intensive universities. As one of Canada’s top universities, we’re known for excellence across the humanities, sciences, creative arts, business, engineering and health sciences.

#### Alberta Machine Intelligence Institute

The Alberta Machine Intelligence Institute (Amii) is home to some of the world’s top talent in machine intelligence. We’re an Alberta-based

## Frequently asked questions

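Will I earn university credit for completing the Specialization?

This Specialization does not carry university credit, but some universities may choose to accept Specialization certificates for credit. Check with your institution to learn more. Online degrees and Mastertrack™ certificates on Coursera provide the opportunity to earn university credit.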

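What is the refund policy?

If you subscribe, you get a 7-day free trial during which you can cancel at no penalty. After that, we do not give refunds, but you can cancel your subscription at any time. See our full refund policy.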

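Can I just enroll in a single course?

Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate, or you can audit it to view the course materials for free. When you subscribe to a course that is part of a Specialization, you are automatically subscribed to the full Specialization. Visit your learner dashboard to track your progress.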

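Is financial aid available?

Yes, Coursera provides financial aid to learners who cannot afford the fee. Apply for it by clicking the "Financial Aid" link beneath the "Enroll" button on the left. Follow the on-screen prompts to complete the application; you will be notified if it is approved. You will need to complete this step for each course in the Specialization, including the Capstone Project.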

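Can I take the course for free?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. If you only want to read and view the course content, you can audit the course for free. If you cannot afford the fee, you can apply for financial aid.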

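Is this course really 100% online? Do I need to attend any classes in person?

This course is completely online, so there is no need to show up to a classroom in person. You can access the lectures, readings, and assignments anytime and anywhere via the web or your mobile device.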

What background knowledge is necessary?

We recommend that learners have at least one year of undergraduate computer science or 2-3 years of professional experience in software development. Experience and comfort with programming in Python are required, and learners must be comfortable converting algorithms and pseudocode into Python. Learners should also have a basic understanding of concepts from statistics (distributions, sampling, expected values), linear algebra (vectors and matrices), and calculus (computing derivatives).

Do I need to take the courses in a specific order?

Yes, it is recommended that courses are taken sequentially.

Will I earn university credit for completing the Specialization?

Learners that complete the specialization will earn a Coursera specialization certificate signed by the professors of record, not a University of Alberta credit.

What will I be able to do upon completing the Specialization?

By the end of this Specialization, you will be able to:

- Build a Reinforcement Learning system for sequential decision making.
- Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
- Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
- Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised learning, and unsupervised learning.

More questions? Visit the Learner Help Center.