Section: Machine Learning Terminology | Machine Learning Python Tutorial

Classifier

A classifier is a program or function that maps unlabeled instances to classes.

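Confusion Matrix

A confusion matrix, also called a contingency table or error matrix, is used to visualize the performance of a classifier.

The columns of the matrix represent the instances of the predicted classes, and the rows represent the instances of the actual classes. (Note: It can be the other way around as well.)

In the case of binary classification the table has 2 rows and 2 columns.

Example: a classifier correctly predicts 42 male instances and wrongly predicts 8 male instances as female. It correctly predicts 32 female instances, and 18 female instances are wrongly predicted as male.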

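Accuracy (Error Rate)

Accuracy is a statistical measure defined as the number of correct predictions made by a classifier divided by the total number of predictions it made.

The classifier in the previous example correctly predicted 42 male instances and 32 female instances. Its accuracy is therefore:

accuracy = (42 + 32) / (42 + 8 + 18 + 32) = 0.74

Now assume a classifier that always predicts "female". On a data set with 50 male and 50 female instances, its accuracy is 50%.

This leads to the so-called accuracy paradox.

Consider a spam recognition classifier whose accuracy is (4 + 91) / 100, i.e. 95%.

A classifier that predicts only "ham" (not spam) for every message reaches the same accuracy on that data.

Its accuracy is 95%, even though it is not capable of recognizing any spam at all.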

Precision and Recall

Accuracy: $(TN + TP) / (TN + TP + FN + FP)$

Precision: $TP / (TP + FP)$

Recall: $TP / (TP + FN)$

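Supervised Learning

The machine learning program is given both the input data and the corresponding labels. This means that the training data has to be labelled by a human beforehand.

Unsupervised Learning

No labels are provided to the learning algorithm. The algorithm has to find a clustering of the input data on its own.

Reinforcement Learning

A computer program interacts dynamically with its environment. This means that the program receives positive and/or negative feedback to improve its performance.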

CLASSIFIER

A program or a function which maps from unlabeled instances to classes is called a classifier.

CONFUSION MATRIX

A confusion matrix, also called a contingency table or error matrix, is used to visualize the performance of a classifier.

The columns of the matrix represent the instances of the predicted classes and the rows represent the instances of the actual class. (Note: It can be the other way around as well.)

In the case of binary classification the table has 2 rows and 2 columns.

Example:

Confusion matrix (rows: actual classes, columns: predicted classes)

| Actual \ Predicted | male | female |
|--------------------|------|--------|
| male               | 42   | 8      |
| female             | 18   | 32     |

This means that the classifier correctly predicted 42 male instances and wrongly predicted 8 male instances as female. It correctly predicted 32 female instances, and 18 female instances were wrongly predicted as male.
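To make the layout concrete, here is a minimal sketch of how such a matrix could be counted from two label lists. The helper name `confusion_matrix` and the label lists are assumptions made for illustration; the lists are constructed only to reproduce the four counts of the example, with rows as actual classes and columns as predicted classes.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a nested list: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Illustrative label lists, built to match the example counts (not real data).
actual = ["male"] * 50 + ["female"] * 50
predicted = ["male"] * 42 + ["female"] * 8 + ["male"] * 18 + ["female"] * 32

matrix = confusion_matrix(actual, predicted, labels=["male", "female"])
print(matrix)  # [[42, 8], [18, 32]]
```

The diagonal entries (42 and 32) are the correct predictions; the off-diagonal entries (8 and 18) are the errors.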

ACCURACY (ERROR RATE)

Accuracy is a statistical measure which is defined as the number of correct predictions made by a classifier divided by the total number of predictions made by the classifier. The error rate is its complement, i.e. 1 - accuracy.

The classifier in our previous example correctly predicted 42 male instances and 32 female instances. Therefore, the accuracy can be calculated by:

accuracy = (42 + 32) / (42 + 8 + 18 + 32)

which is 0.74

Let's assume we have a classifier which always predicts "female". On a data set with 50 male and 50 female instances, it has an accuracy of 50 %.

Confusion matrix (rows: actual classes, columns: predicted classes)

| Actual \ Predicted | male | female |
|--------------------|------|--------|
| male               | 0    | 50     |
| female             | 0    | 50     |
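The accuracy of the two classifiers above can be recomputed from their confusion matrices in a few lines of Python. This is a minimal sketch: the `accuracy` helper and the nested-list layout (rows are actual classes, columns are predicted classes) are assumptions made for illustration.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Class order in both rows and columns: male, female.
first_example = [[42, 8], [18, 32]]
always_female = [[0, 50], [0, 50]]

print(accuracy(first_example))  # 0.74
print(accuracy(always_female))  # 0.5
```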

We will demonstrate the so-called accuracy paradox.

A spam recognition classifier is described by the following confusion matrix:

Confusion matrix (rows: actual classes, columns: predicted classes)

| Actual \ Predicted | spam | ham |
|--------------------|------|-----|
| spam               | 4    | 1   |
| ham                | 4    | 91  |

The accuracy of this classifier is (4 + 91) / 100, i.e. 95 %.

The following classifier predicts solely "ham" and has the same accuracy.

Confusion matrix (rows: actual classes, columns: predicted classes)

| Actual \ Predicted | spam | ham |
|--------------------|------|-----|
| spam               | 0    | 5   |
| ham                | 0    | 95  |

The accuracy of this classifier is 95%, even though it is not capable of recognizing any spam at all.
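The paradox becomes visible when accuracy is printed next to the number of spam messages each classifier actually catches. The following sketch assumes the same nested-list layout as before (rows are actual classes, columns are predicted classes, in the order spam, ham); the helper names are illustrative.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def spam_caught(matrix):
    """Return (spam predicted as spam, total actual spam) from row 0."""
    return matrix[0][0], sum(matrix[0])


spam_filter = [[4, 1], [4, 91]]
always_ham = [[0, 5], [0, 95]]

for name, m in [("spam filter", spam_filter), ("always ham", always_ham)]:
    caught, total = spam_caught(m)
    print(f"{name}: accuracy={accuracy(m)}, spam caught={caught}/{total}")
# spam filter: accuracy=0.95, spam caught=4/5
# always ham: accuracy=0.95, spam caught=0/5
```

Both classifiers score the same accuracy because ham dominates the data, which is why accuracy alone says little about the rare class.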

PRECISION AND RECALL

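Confusion matrix (rows: actual classes, columns: predicted classes)

| Actual \ Predicted | negative | positive |
|--------------------|----------|----------|
| negative           | TN       | FP       |
| positive           | FN       | TP       |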

Accuracy: $(TN + TP) / (TN + TP + FN + FP)$

Precision: $TP / (TP + FP)$

Recall: $TP / (TP + FN)$
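As a worked example, the formulas can be applied to the two spam classifiers from the accuracy paradox, treating spam as the positive class. This is a sketch under one assumption that is not in the text: when a denominator is zero, the functions return `None` to signal that the measure is undefined.

```python
def precision(tp, fp):
    """TP / (TP + FP); None if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """TP / (TP + FN); None if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else None


# Spam filter: TP=4, FN=1, FP=4, TN=91
print(precision(tp=4, fp=4), recall(tp=4, fn=1))  # 0.5 0.8

# Classifier that always predicts "ham": TP=0, FN=5, FP=0, TN=95
print(precision(tp=0, fp=0), recall(tp=0, fn=5))  # None 0.0
```

Unlike accuracy, recall separates the two classifiers clearly: 0.8 for the spam filter against 0.0 for the classifier that never predicts spam.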

SUPERVISED LEARNING

The machine learning program is given both the input data and the corresponding labelling. This means that the training data has to be labelled by a human being beforehand.
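A minimal sketch of the idea, assuming a one-dimensional feature and a nearest-neighbour rule (a technique chosen here only for illustration, it is not named in the text). The fruit weights and labels are hypothetical data.

```python
def nearest_neighbor_classifier(training_data):
    """Build a classifier that copies the label of the closest training point."""
    def classify(x):
        _, label = min(training_data, key=lambda pair: abs(pair[0] - x))
        return label
    return classify


# Hypothetical labelled training data: (weight in grams, label).
training_data = [(8, "cherry"), (10, "cherry"), (150, "apple"), (170, "apple")]

classify = nearest_neighbor_classifier(training_data)
print(classify(12), classify(140))  # cherry apple
```

The returned `classify` function is a classifier in the sense defined earlier: it maps an unlabeled instance to a class.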

UNSUPERVISED LEARNING

No labels are provided to the learning algorithm. The algorithm has to figure out a clustering of the input data on its own.
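One way to picture this is a basic k-means loop with two clusters on one-dimensional data. The sketch below is an illustration under stated assumptions: the choice of k-means, the fixed number of clusters, the starting centers and the sample values are not taken from the text.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop."""
    centers = [min(values), max(values)]  # simple deterministic starting centers
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer center.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


# Hypothetical unlabeled data: weights in grams, no labels attached.
weights = [8, 150, 10, 170, 9, 160]
print(two_means(weights))  # [[8, 10, 9], [150, 170, 160]]
```

The algorithm never sees a label; the two groups emerge from the distances between the values alone.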

REINFORCEMENT LEARNING

A computer program dynamically interacts with its environment. This means that the program receives positive and/or negative feedback to improve its performance.
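The feedback loop can be sketched with a toy agent that keeps a value estimate per action and always picks the action that currently looks best. Everything here is an assumption for illustration: the environment with fixed rewards, the optimistic initial values and the learning rate are hypothetical choices, not part of the text.

```python
def run_agent(rewards, steps=20, learning_rate=0.5, initial_value=2.0):
    """Greedy value-learning agent on a toy environment with fixed rewards."""
    # Optimistic initial values make the agent try every action at least once.
    values = [initial_value] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        action = values.index(max(values))  # action that currently looks best
        reward = rewards[action]            # feedback from the environment
        # Move the estimate a step towards the observed reward.
        values[action] += learning_rate * (reward - values[action])
        counts[action] += 1
    return values, counts


# Hypothetical environment: action 0 is punished (-1), action 1 is rewarded (+1).
values, counts = run_agent(rewards=[-1.0, 1.0])
print(counts)  # [1, 19]
```

After a single negative experience with action 0, the agent keeps choosing the rewarded action, which is the improvement through feedback described above.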