Step 4 — Building and Evaluating the Model
There are many models for machine learning, and each model has its own strengths and weaknesses. In this tutorial, we will focus on a simple algorithm that usually performs well in binary classification tasks, namely Naive Bayes (NB).

First, import the GaussianNB class from the sklearn.naive_bayes module. Then initialize the model with GaussianNB() and train it by fitting it to the training data using gnb.fit():

ML Tutorial

    ...
    from sklearn.naive_bayes import GaussianNB

    # Initialize our classifier
    gnb = GaussianNB()

    # Train our classifier
    model = gnb.fit(train, train_labels)
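
For readers running this step in isolation, here is a minimal self-contained sketch of the training step. It assumes that `train`, `test`, `train_labels`, and `test_labels` come from scikit-learn's built-in breast cancer dataset via `train_test_split`; the `test_size=0.33` and `random_state=42` values are illustrative assumptions, not values confirmed by this section.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: the data is scikit-learn's breast cancer dataset.
data = load_breast_cancer()
features, labels = data.data, data.target

# Assumption: the split size and seed are illustrative choices.
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42
)

# Initialize and train the classifier, as in the listing above.
gnb = GaussianNB()
model = gnb.fit(train, train_labels)

# fit() returns the estimator itself, so model and gnb are the same object.
print(model is gnb)

# Class labels the classifier saw during training.
print(gnb.classes_)
```

Because `fit()` returns the estimator itself, either `model` or `gnb` can be used for the prediction step that follows.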

After we train the model, we can use it to make predictions on our test set with the predict() method. The predict() method returns an array containing one prediction for each data instance in the test set. We can then print our predictions to get a sense of what the model determined.

Use the predict() method with the test set and print the results:

ML Tutorial

    ...
    # Make predictions
    preds = gnb.predict(test)
    print(preds)
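
As a quick check that `predict()` returns one prediction per test instance, the sketch below compares the length of the prediction array with the number of rows in the test set. It is written to run on its own, assuming scikit-learn's breast cancer dataset and an illustrative split (`test_size=0.33`, `random_state=42`), neither of which is confirmed by this section.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: breast cancer dataset and illustrative split parameters.
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data.data, data.target, test_size=0.33, random_state=42
)

gnb = GaussianNB().fit(train, train_labels)
preds = gnb.predict(test)

# One prediction per row of the test set.
print(preds.shape, test.shape[0])

# Every prediction is one of the class labels learned during training.
print(np.isin(preds, gnb.classes_).all())
```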

Run the code and you’ll see the following results:

[Figure: Jupyter Notebook with a Python cell that prints the predicted values of the Naive Bayes classifier on our test data]

As you can see in the Jupyter Notebook output, the predict() method returned an array of 0s and 1s that represent our predicted values for the tumor class (malignant vs. benign).
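
To make the 0s and 1s easier to read, the predictions can be mapped back to class names. The sketch below assumes the data is scikit-learn's breast cancer dataset, whose `target_names` array lists `malignant` at index 0 and `benign` at index 1; the split parameters are again illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: breast cancer dataset and illustrative split parameters.
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data.data, data.target, test_size=0.33, random_state=42
)

gnb = GaussianNB().fit(train, train_labels)
preds = gnb.predict(test)

# Index the class-name array with the integer predictions.
pred_names = data.target_names[preds]
print(pred_names[:5])

# Count how many test instances fall into each predicted class.
counts = np.bincount(preds, minlength=2)
for name, count in zip(data.target_names, counts):
    print(f"{name}: {count}")
```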

Now that we have our predictions, let’s evaluate how well our classifier is performing.

Last modified: Wednesday, 25 June 2025, 11:42