Are you interested in trying other algorithms to improve your model's accuracy?


Step 5 — Evaluating the Model’s Accuracy
Using the array of true class labels, we can evaluate the accuracy of our
model’s predicted values by comparing the two arrays (test_labels
vs. preds). We will use the sklearn function accuracy_score() to
determine the accuracy of our machine learning classifier.

ML Tutorial

...
from sklearn.metrics import accuracy_score

# Evaluate accuracy
print(accuracy_score(test_labels, preds))

You’ll see the following results:

[Figure: Jupyter Notebook with a Python cell that prints the accuracy of our NB classifier]

As you see in the output, the NB classifier is 94.15% accurate. This
means that the classifier predicted the correct class, malignant or
benign, for 94.15 percent of the tumors in the test set. These results
suggest that our feature set of 30 attributes is a good indicator of
tumor class.

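To make the accuracy figure concrete, here is a minimal sketch of the calculation that accuracy_score() performs by default: the fraction of positions where the two label arrays agree. The helper name fraction_correct and the toy label lists are made up for illustration and are not part of the tutorial. The last lines are a rough check that assumes the dataset holds 569 samples and that train_test_split rounds the test share up.

```python
import math


def fraction_correct(true_labels, predicted_labels):
    """Return the share of positions where the two label sequences agree."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


# Hypothetical toy labels (0 = malignant, 1 = benign), not taken from the dataset
toy_true = [0, 1, 1, 0, 1, 1, 0, 1]
toy_pred = [0, 1, 0, 0, 1, 1, 1, 1]
toy_accuracy = fraction_correct(toy_true, toy_pred)
print(toy_accuracy)  # 6 of the 8 positions match -> 0.75

# Rough check of the reported figure, assuming 569 samples in the dataset
# and a test share of 0.33 that is rounded up to a whole number of samples
n_samples = 569
n_test = math.ceil(0.33 * n_samples)   # 188 test samples
n_correct = round(0.9415 * n_test)     # about 177 correct predictions
print(n_test, n_correct, n_test - n_correct)
```

Under those assumptions, 94.15% corresponds to roughly 177 correct predictions out of 188 test tumors, or about 11 misclassified cases.
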
You have successfully built your first machine learning classifier. Let’s
reorganize the code by placing all import statements at the top of the
Notebook or script. The final version of the code should look like this:

ML Tutorial

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Look at our data
print(label_names)
print('Class label = ', labels[0])
print(feature_names)
print(features[0])

# Split our data
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42)

# Initialize our classifier
gnb = GaussianNB()

# Train our classifier
model = gnb.fit(train, train_labels)

# Make predictions
preds = gnb.predict(test)
print(preds)

# Evaluate accuracy
print(accuracy_score(test_labels, preds))

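A single accuracy number does not show which class the mistakes fall in. As an optional extra that goes beyond the tutorial's own steps, the sketch below rebuilds the same split and classifier and tabulates the predictions with sklearn's confusion_matrix() function. The variable names matrix and diagonal_share are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and classifier as in the tutorial code
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)
preds = GaussianNB().fit(train, train_labels).predict(test)

# Rows are true classes, columns are predicted classes,
# both in the same order as data['target_names']
matrix = confusion_matrix(test_labels, preds)
print(data['target_names'])
print(matrix)

# Correct predictions sit on the diagonal, so their share of all
# test samples is the same quantity that accuracy_score() reports
diagonal_share = matrix.trace() / matrix.sum()
print(diagonal_share, accuracy_score(test_labels, preds))
```

The off-diagonal cells separate malignant tumors predicted as benign from benign tumors predicted as malignant, which the overall accuracy lumps together.
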
Now you can continue to work with your code to see if you can make
your classifier perform even better. You could experiment with different
subsets of features or even try completely different algorithms. Check out
Scikit-learn’s website at scikit-learn.org/stable for more machine learning
ideas.
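As one possible starting point for those experiments, the sketch below tries both suggestions: the same Gaussian NB classifier on a subset of the columns, and a different algorithm on all 30 features. The choice of the first ten columns, the use of DecisionTreeClassifier, and the helper name evaluate are arbitrary assumptions for illustration, not recommendations from the tutorial.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the tutorial code
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)


def evaluate(classifier, train_features, test_features):
    """Fit a classifier on the training rows and return its test accuracy."""
    classifier.fit(train_features, train_labels)
    return accuracy_score(test_labels, classifier.predict(test_features))


# Experiment 1: Gaussian NB on the first 10 columns only (an arbitrary subset)
subset = slice(0, 10)
print(data['feature_names'][subset])
subset_accuracy = evaluate(GaussianNB(), train[:, subset], test[:, subset])

# Experiment 2: a different algorithm on all 30 features
tree_accuracy = evaluate(DecisionTreeClassifier(random_state=42), train, test)

print('NB, first 10 features:', subset_accuracy)
print('Decision tree, all features:', tree_accuracy)
```

Because every experiment reuses the same split and the same accuracy_score() call, the printed numbers can be compared directly with the 94.15% baseline.
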

Last modified: Wednesday, 25 June 2025, 11:46