Chapter outline

  • Classifier

    A classifier is a program or function that maps unlabeled instances to classes.


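    Confusion Matrix

    A confusion matrix, also called a contingency table or error matrix, is used to visualize the performance of a classifier.

    The columns of the matrix represent the instances of the predicted classes and the rows represent the instances of the actual classes. (Note: it can also be the other way around.)

    In the case of binary classification the table has 2 rows and 2 columns.

    Example: the classifier correctly predicted 42 male instances and wrongly predicted 8 male instances as female. It correctly predicted 32 female instances, and 18 female instances were wrongly predicted as male.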


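    Accuracy (Error Rate)

    Accuracy is a statistical measure defined as the number of correct predictions made by a classifier divided by the total number of predictions it made.

    The classifier in the previous example correctly predicted 42 male instances and 32 female instances. The accuracy is therefore:

    accuracy = (42 + 32) / (42 + 8 + 18 + 32) = 0.74

    Suppose a classifier always predicts "female". With 50 male and 50 female instances, as in the example, its accuracy is 50%.

    The so-called accuracy paradox can be demonstrated as follows.

    A spam recognition classifier is evaluated on 100 messages, of which 5 are spam and 95 are ham. It correctly identifies 4 spam messages and 91 ham messages.

    The accuracy of this classifier is (4 + 91) / 100, i.e. 95%.

    A second classifier predicts only "ham" and has the same accuracy, because all 95 ham messages are classified correctly.

    The accuracy of this second classifier is 95%, even though it cannot recognize any spam at all.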


    Precision and Recall

    • Accuracy: (TN + TP) / (TN + TP + FN + FP)

    • Precision: TP / (TP + FP)

    • Recall: TP / (TP + FN)


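    Supervised Learning

    The machine learning program is given both the input data and the corresponding labels. This means that the training data has to be labelled by a human being beforehand.


    Unsupervised Learning

    No labels are provided to the learning algorithm. The algorithm has to find a clustering of the input data on its own.


    Reinforcement Learning

    A computer program interacts dynamically with its environment. This means that the program receives positive and/or negative feedback to improve its performance.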


    CLASSIFIER
    A program or a function which maps from unlabeled instances to classes is called a classifier.
    CONFUSION MATRIX
    A confusion matrix, also called a contingency table or error matrix, is used to visualize the performance of a
    classifier.
    The columns of the matrix represent the instances of the predicted classes and the rows represent the instances
    of the actual classes. (Note: It can be the other way around as well.)
    In the case of binary classification the table has 2 rows and 2 columns.
    Example:
    Confusion matrix        Predicted classes
                            male      female
    Actual     male           42           8
    classes    female         18          32
    This means that the classifier correctly predicted a male person in 42 cases and wrongly predicted 8 male
    instances as female. It correctly predicted 32 instances as female. 18 female instances were wrongly predicted
    as male.
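As an illustration, the counts in such a table can be tallied from two parallel lists of actual and predicted labels. The sketch below is a minimal example, not a library API: the function name `confusion_matrix` and the label lists are hypothetical, chosen so that the result reproduces the male/female example (rows are actual classes, columns are predicted classes).

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested list: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical label lists that reproduce the male/female example.
actual = ["male"] * 50 + ["female"] * 50
predicted = (["male"] * 42 + ["female"] * 8      # predictions for the 50 actual males
             + ["male"] * 18 + ["female"] * 32)  # predictions for the 50 actual females

labels = ["male", "female"]
matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Each printed row lists the counts for one actual class, in the column order given by `labels`.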
    ACCURACY (ERROR RATE)
    Accuracy is a statistical measure which is defined as the number of correct predictions made by a classifier
    divided by the total number of predictions made by the classifier. The error rate is its complement,
    i.e. 1 - accuracy.
    The classifier in our previous example correctly predicted 42 male instances and 32 female instances.
    Therefore, the accuracy can be calculated by:
    accuracy = (42 + 32) / (42 + 8 + 18 + 32)
    which is 0.74
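The same quotient can be computed directly from a confusion matrix, since the correct predictions are the entries on the diagonal. This is a minimal sketch, assuming the matrix is stored as a nested list with actual classes as rows; the helper name `accuracy` is illustrative.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Rows: actual male, actual female. Columns: predicted male, predicted female.
gender_matrix = [[42, 8], [18, 32]]
print(accuracy(gender_matrix))
```

Because it only sums the diagonal, the function works unchanged for more than two classes.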
    Let's assume we have a classifier which always predicts "female". On the same 50 male and 50 female instances it has an accuracy of 50 %.
    Confusion matrix        Predicted classes
                            male      female
    Actual     male            0          50
    classes    female          0          50
    We will now demonstrate the so-called accuracy paradox.
    A spam recognition classifier is described by the following confusion matrix:
    Confusion matrix        Predicted classes
                            spam         ham
    Actual     spam            4           1
    classes    ham             4          91
    The accuracy of this classifier is (4 + 91) / 100, i.e. 95 %.
    The following classifier predicts solely "ham" and has the same accuracy.
    Confusion matrix        Predicted classes
                            spam         ham
    Actual     spam            0           5
    classes    ham             0          95
    The accuracy of this classifier is 95%, even though it is not capable of recognizing any spam at all.
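One way to make the paradox concrete is to compute, next to the accuracy, the fraction of actual spam messages that each classifier catches. The sketch below uses the two spam matrices from the text; the variable and function names are illustrative assumptions.

```python
# Rows are actual classes, columns are predicted classes, both in the order (spam, ham).
first_classifier = [[4, 1], [4, 91]]
always_ham = [[0, 5], [0, 95]]

def accuracy(matrix):
    """Correct predictions divided by all predictions (2x2 case)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

def spam_caught(matrix):
    """Fraction of the actual spam messages that were predicted as spam."""
    return matrix[0][0] / sum(matrix[0])

for name, m in [("first classifier", first_classifier), ("always ham", always_ham)]:
    print(name, accuracy(m), spam_caught(m))
```

Both classifiers reach an accuracy of 0.95, but the first catches 4 of the 5 spam messages while the second catches none, which accuracy alone does not reveal on such imbalanced data.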
    PRECISION AND RECALL
    Confusion matrix        Predicted classes
                            negative    positive
    Actual     negative        TN          FP
    classes    positive        FN          TP
    Accuracy: (TN + TP) / (TN + TP + FN + FP)
    Precision: TP / (TP + FP)
    Recall: TP / (TP + FN)
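The formulas can be applied to the spam example as a quick check. This is a sketch under one stated assumption: "spam" is treated as the positive class, which the text does not fix explicitly. The helpers return `None` when a denominator is zero, a choice made here for illustration only.

```python
def precision(tp, fp):
    """TP / (TP + FP), or None if nothing was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else None

def recall(tp, fn):
    """TP / (TP + FN), or None if there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else None

# Assumption: "spam" is the positive class in the spam example.
first = {"tp": 4, "fn": 1, "fp": 4, "tn": 91}
always_ham = {"tp": 0, "fn": 5, "fp": 0, "tn": 95}

for name, c in [("first classifier", first), ("always ham", always_ham)]:
    print(name, precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"]))
```

The first classifier has precision 0.5 and recall 0.8. The classifier that only predicts "ham" has recall 0.0 and an undefined precision, because it never predicts the positive class.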
    SUPERVISED LEARNING
    The machine learning program is given both the input data and the corresponding labels. This means that
    the training data has to be labelled by a human being beforehand.
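To illustrate what "input data and corresponding labels" can look like, here is a minimal sketch of a supervised classifier. The technique (1-nearest-neighbour) and the toy data are assumptions chosen for brevity; they are not taken from the text.

```python
import math

# Hypothetical hand-labelled training data: (feature vector, label).
training_data = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 4.5), "b"),
]

def classify(instance):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], instance))
    return label

print(classify((1.2, 1.4)))
print(classify((5.5, 5.0)))
```

The function maps an unlabeled instance to a class, so it is a classifier in the sense defined at the start of this section.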
    UNSUPERVISED LEARNING
    No labels are provided to the learning algorithm. The algorithm has to figure out a clustering of the input
    data on its own.
    REINFORCEMENT LEARNING
    A computer program dynamically interacts with its environment. This means that the program receives
    positive and/or negative feedback to improve its performance.