Section: Bias-Variance for Deep Reinforcement Learning: How To Build a Bot for Atari with OpenAI Gym | Python Machine Learning Projects

Section outline

Reinforcement learning is a subfield within control theory, which concerns controlling systems that change over time and broadly includes applications such as self-driving cars, robotics, and bots for games. Throughout this guide, you will use reinforcement learning to build a bot for Atari video games. This bot is not given access to internal information about the game. Instead, it’s only given access to the game’s rendered display and the reward for that display, meaning that it can only see what a human player would see.

In machine learning, a bot is formally known as an agent. In the case of this tutorial, an agent is a “player” in the system that acts according to a decision-making function, called a policy. The primary goal is to develop strong agents by arming them with strong policies. In other words, our aim is to develop intelligent bots by arming them with strong decision-making capabilities.

You will begin this tutorial by training a basic reinforcement learning agent that takes random actions when playing Space Invaders, the classic Atari arcade game, which will serve as your baseline for comparison. Following this, you will explore several other techniques (including Q-learning, deep Q-learning, and least squares) while building agents that play Space Invaders and Frozen Lake, a simple game environment included in Gym (https://gym.openai.com/), a reinforcement learning toolkit released by OpenAI (https://openai.com/). By following this tutorial, you will gain an understanding of the fundamental concepts that govern one’s choice of model complexity in machine learning.

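
To make the agent, policy, and reward vocabulary concrete, here is a minimal sketch of a random baseline agent. It does not use Gym: `ToyEnv`, `random_policy`, `run_episode`, and `average_reward` are hypothetical names invented for this illustration, and the `reset`/`step` methods only loosely mirror the shape of Gym's interface (Gym's real `step` returns additional values). The agent sees only an observation and a reward, never the environment's internals.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a game: walk along a line to reach a goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.actions = [0, 1]  # 0 = move left, 1 = move right

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation is all the agent sees

    def step(self, action):
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def random_policy(observation, actions, rng):
    """Baseline policy: ignore the observation and pick any action."""
    return rng.choice(actions)


def run_episode(env, policy, rng):
    """Play one episode and return the total reward the policy collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation, env.actions, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def average_reward(episodes=100, seed=0):
    """Average the random baseline's reward over several episodes."""
    rng = random.Random(seed)
    env = ToyEnv()
    rewards = [run_episode(env, random_policy, rng) for _ in range(episodes)]
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    print(f"Average reward of the random baseline: {average_reward():.2f}")
```

Any stronger policy can be passed to `run_episode` in place of `random_policy` and compared against this baseline, which is the same comparison the tutorial sets up for Space Invaders.
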
- Prerequisites
- Step 1: Creating the Project and Installing Dependencies
- Step 2: Creating a Baseline Random Agent with Gym
- Step 3: Creating a Simple Q-learning Agent for Frozen Lake
- Step 4: Building a Deep Q-learning Agent for Frozen Lake
- Step 5: Building a Least Squares Agent for Frozen Lake
- Step 6: Creating a Deep Q-learning Agent for Space Invaders
- Conclusion