Python Machine Learning Projects
Chapter Outline
-
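As machine learning is increasingly used to find patterns, conduct analysis, and make decisions without final human input, it is equally important that we not only provide resources to advance algorithms and methods, but also invest in bringing more stakeholders into the fold. This book of Python projects in machine learning aims to do just that: to equip the developers of today and tomorrow with tools they can use to better understand, evaluate, and shape machine learning, helping to ensure that it serves us all.
If you don’t already have a Python programming environment, this book will help you set one up, and its “An Introduction to Machine Learning” chapter will then give you a conceptual understanding of machine learning. Three Python machine learning projects follow. They will help you create a machine learning classifier, build a neural network to recognize handwritten digits, and gain background in deep reinforcement learning by building a bot for Atari.
These chapters were originally published as articles on DigitalOcean Community, written by members of the international software developer community. If you are interested in contributing to this knowledge base, consider proposing a tutorial to the Write for DOnations program at do.co/w4do. DigitalOcean pays authors and provides a matching donation to tech-focused nonprofits.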
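Other Books in This Series
If you are learning Python or looking for reference material, you can download our free Python eBook, How To Code in Python 3, available via do.co/python-book.
For other programming languages and DevOps engineering articles, our knowledge base of over 2,100 tutorials is available as a Creative Commons-licensed resource via do.co/tutorials.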
-
Machine learning is a subfield of artificial intelligence (AI). The goal of
machine learning generally is to understand the structure of data and fit
that data into models that can be understood and utilized by people.
Although machine learning is a field within computer science, it differs
from traditional computational approaches. In traditional computing,
algorithms are sets of explicitly programmed instructions used by
computers to calculate or solve problems. Machine learning algorithms
instead allow computers to train on data inputs and use statistical
analysis to output values that fall within a specific range.
Because of this, machine learning helps computers build models from
sample data in order to automate decision-making processes based on
data inputs.
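To make the contrast concrete, here is a minimal Python sketch. The spam-flagging task, the exclamation-mark feature, and the toy data are all hypothetical: the first function hard-codes a threshold chosen by a programmer, while the second picks its threshold from labeled sample data by minimizing the number of training errors. Real models are far richer, but the division of labor is the same.

```python
# Hypothetical task: flag a message as spam from one numeric feature,
# an invented "number of exclamation marks" count.

def rule_based_is_spam(exclamation_count: int) -> bool:
    # Traditional computing: the threshold is explicitly programmed.
    return exclamation_count > 3


def learn_threshold(samples):
    """Pick the threshold that misclassifies the fewest labeled samples."""
    candidates = sorted({count for count, _ in samples})
    best_threshold, best_errors = candidates[0], len(samples) + 1
    for candidate in candidates:
        # Predict spam when the count exceeds the candidate threshold.
        errors = sum((count > candidate) != label for count, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Toy labeled data: (exclamation_count, is_spam). Invented for illustration.
training_data = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
threshold = learn_threshold(training_data)


def learned_is_spam(exclamation_count: int) -> bool:
    # Machine learning in miniature: the threshold came from the data.
    return exclamation_count > threshold


print(threshold)
print(learned_is_spam(3), rule_based_is_spam(3))
```

With this data the learned threshold is 2, so a message with three exclamation marks is flagged by the learned rule but not by the hand-written one.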
Any technology user today has benefited from machine learning.
Facial recognition technology allows social media platforms to help users
tag and share photos of friends. Optical character recognition (OCR)
technology converts images of text into editable, machine-readable text.
Recommendation engines, powered by machine learning, suggest what
movies or television shows to watch next based on user preferences.
Self-driving cars that rely on machine learning to navigate may soon be
available to consumers.
Machine learning is a continuously developing field. Because of this,
there are some considerations to keep in mind as you work with machine
learning methodologies, or analyze the impact of machine learning
processes.
In this tutorial, we’ll look into the common machine learning methods
of supervised and unsupervised learning, and common algorithmic
approaches in machine learning, including the k-nearest neighbor
algorithm, decision tree learning, and deep learning. We’ll explore which
programming languages are most used in machine learning, providing
you with some of the positive and negative attributes of each.
Additionally, we’ll discuss biases that are perpetuated by machine
learning algorithms, and consider what can be kept in mind to prevent
these biases when building algorithms.
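As a sketch of one of these approaches: the k-nearest neighbor algorithm, a supervised method, classifies a new point by a majority vote among the k labeled training points closest to it. The minimal Python version below assumes Euclidean distance and uses invented two-dimensional data; it illustrates the idea rather than any particular library's implementation.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy data: two loose clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))  # a
print(knn_predict(points, labels, (6.8, 6.2)))  # b
```

The choice of k trades off noise sensitivity (small k) against blurring class boundaries (large k).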
How To Build a Machine Learning Classifier in Python with Scikit-learn
In this tutorial, you’ll implement a simple machine learning algorithm in
Python using Scikit-learn, a machine learning tool for Python. Using a
database of breast cancer tumor information, you’ll use a Naive Bayes
(NB) classifier that predicts whether a tumor is malignant or benign.
By the end of this tutorial, you’ll know how to build your very own
machine learning model in Python.
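Scikit-learn provides Naive Bayes through classes such as `GaussianNB`. To show what such a classifier computes, here is a from-scratch sketch in plain Python. It assumes each feature is normally distributed within each class (the Gaussian variant of Naive Bayes) and that features are independent given the class, then predicts the class with the highest log-posterior score. The tumor measurements are invented toy numbers, not values from the breast cancer dataset.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels):
    """Estimate per-class priors and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small epsilon keeps variances positive (avoids division by zero).
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        prior = len(rows) / len(features)
        model[label] = (prior, means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log prior plus summed log-likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the normal density N(x; m, v). The "naive" independence
            # assumption means per-feature log-likelihoods are simply summed.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: (mean radius, mean texture) with labels.
X_train = [(12.0, 15.0), (11.5, 14.0), (13.0, 16.5), (19.0, 24.0), (20.5, 22.0), (18.0, 25.5)]
y_train = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

model = fit_gaussian_nb(X_train, y_train)
print(predict_gaussian_nb(model, (12.5, 15.5)))  # benign
print(predict_gaussian_nb(model, (19.5, 23.0)))  # malignant
```

With Scikit-learn, the corresponding steps are `GaussianNB().fit(X_train, y_train)` followed by `.predict(...)` on new rows.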
How To Build a Neural Network to Recognize Handwritten Digits with TensorFlow
Neural networks are used as a method of deep learning, one of the many
subfields of artificial intelligence. They were first proposed around 70
years ago as an attempt at simulating the way the human brain works,
though in a much more simplified form. Individual ‘neurons’ are
connected in layers, with weights assigned to determine how the neuron
responds when signals are propagated through the network. Previously,
neural networks were limited in the number of neurons they were able to
simulate, and therefore the complexity of learning they could achieve.
But in recent years, due to advancements in hardware development, we
have been able to build very deep networks, and train them on enormous
datasets to achieve breakthroughs in machine intelligence.
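A minimal sketch of that layered structure in plain Python: each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function, and each layer's outputs become the next layer's inputs. The layer sizes, the weight values, and the choice of the sigmoid activation are arbitrary assumptions for illustration; in a real network the weights are learned during training.

```python
import math


def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One fully connected layer: weights[j][i] connects input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Arbitrary example weights: 3 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.5, -1.0]]
output_biases = [0.2]

signal = [1.0, 0.5, -1.0]
hidden = dense_layer(signal, hidden_weights, hidden_biases)  # propagate through layer 1
output = dense_layer(hidden, output_weights, output_biases)  # propagate through layer 2
print(hidden, output)
```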
These breakthroughs have allowed machines to match and exceed the
capabilities of humans at performing certain tasks. One such task is
object recognition. Though machines have historically been unable to
match human vision, recent advances in deep learning have made it
possible to build neural networks which can recognize objects, faces, text,
and even emotions.
In this tutorial, you will implement a small subsection of object
recognition: digit recognition. Using TensorFlow
(https://www.tensorflow.org/), an open-source Python library developed
by the Google Brain labs for deep learning research, you will take
hand-drawn images of the numbers 0-9 and build and train a neural
network to recognize and predict the correct label for the digit displayed.
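A digit classifier's final layer typically produces one score per class, 0 through 9. A common way to turn those scores into a predicted label, sketched below in plain Python rather than TensorFlow, is to apply the softmax function to obtain probabilities and then pick the most probable class. The score values are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max score improves numerical stability without changing the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Return the digit (index 0-9) with the highest probability."""
    probabilities = softmax(scores)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Invented output scores for one image, one per digit 0-9.
scores = [0.1, -1.2, 0.3, 4.0, -0.5, 1.1, 0.0, -2.0, 0.8, 0.2]
print(predict_digit(scores))  # 3
```

In TensorFlow, the same two steps are available as `tf.nn.softmax` and `tf.argmax`.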
While you won’t need prior experience in practical deep learning or
TensorFlow to follow along with this tutorial, we’ll assume some
familiarity with machine learning terms and concepts such as training
and testing, features and labels, optimization, and evaluation.
Bias-Variance for Deep Reinforcement Learning: How To Build a Bot for Atari with OpenAI Gym
Reinforcement learning is a subfield within control theory, which
concerns controlling systems that change over time and broadly includes
applications such as self-driving cars, robotics, and bots for games.
Throughout this guide, you will use reinforcement learning to build a bot
for Atari video games. This bot is not given access to internal information
about the game. Instead, it’s only given access to the game’s rendered
display and the reward for that display, meaning that it can only see
what a human player would see.
In machine learning, a bot is formally known as an agent. In the case of
this tutorial, an agent is a “player” in the system that acts according to a
decision-making function, called a policy. The primary goal is to develop
strong agents by arming them with strong policies. In other words, our
aim is to develop intelligent bots by arming them with strong decision-
making capabilities.
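In code, a policy can be as simple as a function from an observation to an action. The sketch below assumes a hypothetical toy corridor environment whose `reset`/`step` interface is loosely modeled on Gym's but is not Gym itself. It runs a random policy, the kind of baseline a stronger agent is compared against, and records the reward each episode collects.

```python
import random


class ToyCorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, reward 1.0 for reaching the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation is just the agent's cell

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def random_policy(observation):
    # Baseline policy: ignores the observation and picks an action at random.
    return random.choice([0, 1])


def run_episode(env, policy):
    """Play one episode and return the total reward collected."""
    observation, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


random.seed(0)
env = ToyCorridorEnv()
rewards = [run_episode(env, random_policy) for _ in range(100)]
average_reward = sum(rewards) / len(rewards)
print(average_reward)  # average reward of the random baseline
```

A policy that always moves right reaches the goal every time, which is the kind of gap over the random baseline that a learned policy should show.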
You will begin this tutorial by training a basic reinforcement learning
agent that takes random actions when playing Space Invaders, the classic
Atari arcade game, which will serve as your baseline for comparison.
Following this, you will explore several other techniques — including Q-
learning, deep Q-learning, and least squares — while building agents
that play Space Invaders and Frozen Lake, a simple game environment
included in Gym (https://gym.openai.com/), a reinforcement learning
toolkit released by OpenAI (https://openai.com/). By following this
tutorial, you will gain an understanding of the fundamental concepts
that govern one’s choice of model complexity in machine learning.
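Of the techniques listed, tabular Q-learning is the simplest to sketch. It keeps a table `Q[state][action]` of estimated future reward and, after each step, moves an entry toward the observed reward plus the discounted value of the best next action. The minimal version below assumes a hypothetical five-cell corridor (a stand-in for a small grid world such as Frozen Lake, not the Gym environment) and arbitrary hyperparameters: learning rate `alpha`, discount `gamma`, and exploration rate `epsilon`.

```python
import random

N_STATES, GOAL = 5, 4                   # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = [0, 1]                        # 0 = move left, 1 = move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate


def step(state, action):
    """Deterministic corridor dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def greedy(q_row):
    # Break ties randomly so an all-zero table does not always pick action 0.
    best = max(q_row)
    return random.choice([a for a in ACTIONS if q_row[a] == best])


random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the table, sometimes explore at random.
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(Q[state])
        next_state, reward, done = step(state, action)
        # Q-learning update; a terminal state contributes no future value.
        target = reward if done else reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

policy = [greedy(Q[s]) for s in range(GOAL)]
print(policy)  # learned greedy action for each non-goal cell
```

A lookup table works here because there are only ten state-action pairs; deep Q-learning replaces the table with a neural network when states are raw game frames, which is where choosing model complexity starts to matter.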