Python Machine Learning Projects
Machine Learning Methods
In machine learning, tasks are generally grouped into broad categories.
These categories are based on how the system learns or on how feedback
on that learning is given to the system being developed.
Two of the most widely adopted machine learning methods are supervised
learning, which trains algorithms on example input and output data
labeled by humans, and unsupervised learning, which gives the algorithm
no labeled data so that it must find structure within its input data on
its own. Let’s explore these methods in more detail.
Supervised Learning
In supervised learning, the computer is provided with example inputs
that are labeled with their desired outputs. The goal is for the
algorithm to “learn” by comparing its actual outputs with the “taught”
outputs, finding the errors, and modifying the model accordingly.
Supervised learning therefore uses the patterns found in labeled data
to predict label values for additional unlabeled data.
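The compare-and-correct loop described above can be sketched with a
perceptron, one of the simplest supervised learners. This is only an
illustration under stated assumptions: the choice of a perceptron, the
function names, the learning rate, and the tiny logical-AND dataset are
all hypothetical and do not come from the text.

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Fit weights and a bias to labeled examples with the perceptron rule.

    examples: list of (features, label) pairs, where label is 0 or 1.
    """
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Actual output of the current model.
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            # Error: the taught output minus the actual output.
            error = label - predicted
            # Adjust the model; when the prediction was right the error
            # is 0 and nothing changes.
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights, bias


def predict(weights, bias, features):
    """Apply the trained model to one feature vector."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hypothetical labeled data: the logical AND of two binary inputs.
training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(training_data)
predictions = [predict(weights, bias, x) for x, _ in training_data]
print(predictions)
```

Each pass compares the prediction with the label and nudges the weights
only when the two disagree, which is the "find errors and modify the
model" step in miniature.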
For example, with supervised learning, an algorithm may be fed data
with images of sharks labeled as fish and images of oceans labeled as
water. By being trained on this data, the supervised learning algorithm
should be able to later identify unlabeled shark images as fish and
unlabeled ocean images as water.
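As a rough sketch of the shark and ocean example, the code below labels
new images by copying the label of the most similar labeled image (a
1-nearest-neighbor classifier). It assumes each image has already been
reduced to two numbers; those features, their values, and the function
name are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feature vectors: (share of blue pixels, edge density).
labeled_images = [
    ((0.55, 0.80), "fish"),
    ((0.60, 0.75), "fish"),
    ((0.50, 0.90), "fish"),
    ((0.95, 0.10), "water"),
    ((0.90, 0.15), "water"),
    ((0.98, 0.05), "water"),
]


def classify(features, labeled_examples):
    """Return the label of the closest labeled example."""
    nearest = min(labeled_examples,
                  key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# Unlabeled images, described with the same two hypothetical features.
unlabeled_images = [(0.58, 0.85), (0.93, 0.08)]
predicted_labels = [classify(image, labeled_images)
                    for image in unlabeled_images]
print(predicted_labels)  # ['fish', 'water']
```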
A common use case of supervised learning is to use historical data to
predict statistically likely future events. For instance, it may use
historical stock market information to anticipate upcoming fluctuations,
or it may be employed to filter out spam emails. Likewise, tagged photos
of dogs can be used as training data to classify untagged photos of dogs.
Unsupervised Learning
In unsupervised learning, data is unlabeled, so the learning algorithm is
left to find commonalities among its input data. Because unlabeled data
is more abundant than labeled data, machine learning methods that
facilitate unsupervised learning are particularly valuable.
The goal of unsupervised learning may be as straightforward as
discovering hidden patterns within a dataset, but the goal may also be
feature learning, in which the machine automatically discovers the
representations needed to classify raw data.
Unsupervised learning is commonly used for transactional data. You
may have a large dataset of customers and their purchases, but as a
human you will likely not be able to see which attributes customer
profiles and purchase types have in common. Fed this data, an
unsupervised learning algorithm may determine that women of a certain
age range who buy unscented soaps are likely to be pregnant, so a
marketing campaign for pregnancy and baby products can be targeted to
this audience in order to increase their number of purchases.
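One way to picture grouping customers without labels is a basic k-means
clustering loop. The sketch below is illustrative only: k-means is one
of several possible algorithms, and the customer figures, the choice of
two clusters, and the function name are assumptions, not details from
the text.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters with a basic k-means loop."""
    # Assumption: the first k points serve as starting centroids so the
    # result is deterministic; real implementations usually randomize.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return assignments, centroids


# Hypothetical customers: (age in years, purchases per month).
customers = [(22, 2), (58, 11), (25, 3), (61, 12), (24, 1), (55, 10)]
assignments, centroids = k_means(customers, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No customer carries a label here; the two groups emerge only from how
close the records are to one another. Interpreting what a group means,
such as a likely life event, is still left to a person.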
Without being told a “correct” answer, unsupervised learning methods
can look at data that is more expansive, more complex, and seemingly
unrelated in order to organize it in potentially meaningful ways.
Unsupervised learning is often used for anomaly detection, such as
flagging fraudulent credit card purchases, and for recommender systems
that suggest which products to buy next. In unsupervised learning,
untagged photos of dogs can be used as input data for the algorithm to
find likenesses and group similar dog photos together.
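A minimal sketch of anomaly detection on purchase amounts follows. It
flags values that sit far from the median, using a modified z-score
based on the median absolute deviation. The technique, the 3.5
threshold, the function name, and the amounts are assumptions made for
illustration; real fraud detection uses many more signals.

```python
import statistics


def find_anomalies(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that a few extreme
    # values cannot inflate the way they inflate a standard deviation.
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch simply reports nothing.
        return []
    return [a for a in amounts
            if 0.6745 * abs(a - median) / mad > threshold]


# Hypothetical credit card purchases, with one unusually large charge.
purchases = [12.5, 9.9, 15.0, 11.2, 13.4, 10.8, 950.0, 14.1]
suspicious = find_anomalies(purchases)
print(suspicious)  # [950.0]
```

No purchase is labeled as fraud in advance; the large charge stands out
only because it differs so much from the rest of the data.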