Step 5 — Training and Testing
The training process involves feeding the training dataset through the
graph and optimizing the loss function. Every time the network iterates
through a batch of more training images, it updates the parameters to
reduce the loss in order to more accurately predict the digits shown. The
testing process involves running our testing dataset through the trained
graph, and keeping track of the number of images that are correctly
predicted, so that we can calculate the accuracy.
Before starting the training process, we will define our method of
evaluating the accuracy so we can print it out on mini-batches of data
while we train. These print statements will allow us to check that from
the first iteration to the last, loss decreases and accuracy increases; they
will also allow us to track whether or not we have run enough iterations
to reach a consistent and optimal result:
main.py
...
correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
In correct_pred, we use the argmax function to compare which
images are being predicted correctly by looking at the output_layer
(predictions) and Y (labels), and we use the equal function to return this
as a list of Booleans. We can then cast this list to floats and calculate the
mean to get a total accuracy score.
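To make the mechanics concrete, here is a small standalone NumPy sketch of the same argmax/equal/mean logic, using made-up values rather than real network outputs (this snippet is illustrative only and is not part of main.py):
import numpy as np

# Toy values: 4 "images", 3 classes (illustrative, not MNIST-sized).
output = np.array([[0.1, 0.7, 0.2],   # argmax -> 1
                   [0.8, 0.1, 0.1],   # argmax -> 0
                   [0.3, 0.3, 0.4],   # argmax -> 2
                   [0.6, 0.2, 0.2]])  # argmax -> 0
labels = np.array([[0, 1, 0],         # true class 1 -> correct
                   [1, 0, 0],         # true class 0 -> correct
                   [0, 1, 0],         # true class 1 -> wrong
                   [1, 0, 0]])        # true class 0 -> correct

correct_pred = np.equal(np.argmax(output, 1), np.argmax(labels, 1))
accuracy = np.mean(correct_pred.astype(np.float32))
print(accuracy)  # 0.75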
We are now ready to initialize a session for running the graph. In this
session we will feed the network with our training examples, and once
trained, we feed the same graph with new test examples to determine the
accuracy of the model. Add the following lines of code to your file:
main.py
...
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
The essence of the training process in deep learning is to optimize the
loss function. Here we are aiming to minimize the difference between the
predicted labels of the images, and the true labels of the images. The
process involves four steps which are repeated for a set number of
iterations:
- Propagate values forward through the network
- Compute the loss
- Propagate values backward through the network
- Update the parameters
At each training step, the parameters are adjusted slightly to try and
reduce the loss for the next step. As the learning progresses, we should
see a reduction in loss, and eventually we can stop training and use the
network as a model for testing our new data.
Add this code to the file:
main.py
...
# train on mini batches
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={
        X: batch_x, Y: batch_y, keep_prob: dropout
    })

    # print loss and accuracy (per minibatch)
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run(
            [cross_entropy, accuracy],
            feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0}
        )
        print(
            "Iteration",
            str(i),
            "\t| Loss =",
            str(minibatch_loss),
            "\t| Accuracy =",
            str(minibatch_accuracy)
        )
Every 100 iterations of the training loop, in which we feed a mini-batch
of images through the network, we print out the loss and accuracy of that
batch. Note that we should not expect the loss to decrease and the
accuracy to increase monotonically here, as the values are computed per
batch, not for the entire model. We use mini-batches of images rather
than feeding them through individually to speed up the training process
and to allow the network to see a number of different examples before
updating the parameters.
Once the training is complete, we can run the session on the test
images. This time we are using a keep_prob of 1.0, which disables
dropout and ensures all units are active in the testing process.
Add this code to the file:
main.py
...
test_accuracy = sess.run(
    accuracy,
    feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1.0}
)
print("\nAccuracy on test set:", test_accuracy)
It’s now time to run our program and see how accurately our neural
network can recognize these handwritten digits. Save the main.py file
and execute the following command in the terminal to run the script:
(tensorflow-demo) $ python main.py
You’ll see an output similar to the following, although individual loss
and accuracy results may vary slightly:
Output
Iteration 0     | Loss = 3.67079     | Accuracy = 0.140625
Iteration 100   | Loss = 0.492122    | Accuracy = 0.84375
Iteration 200   | Loss = 0.421595    | Accuracy = 0.882812
Iteration 300   | Loss = 0.307726    | Accuracy = 0.921875
Iteration 400   | Loss = 0.392948    | Accuracy = 0.882812
Iteration 500   | Loss = 0.371461    | Accuracy = 0.90625
Iteration 600   | Loss = 0.378425    | Accuracy = 0.882812
Iteration 700   | Loss = 0.338605    | Accuracy = 0.914062
Iteration 800   | Loss = 0.379697    | Accuracy = 0.875
Iteration 900   | Loss = 0.444303    | Accuracy = 0.90625
Accuracy on test set: 0.9206
To try and improve the accuracy of our model, or to learn more about
the impact of tuning hyperparameters, we can test the effect of changing
the learning rate, the dropout threshold, the batch size, and the number
of iterations. We can also change the number of units in our hidden
layers, and change the amount of hidden layers themselves, to see how
different architectures increase or decrease the model accuracy.
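For example, one round of experimentation might look like the following sketch. The variable names assume the hyperparameters defined earlier in the tutorial, and the values are illustrative, not recommendations:
learning_rate = 1e-3   # larger step size; may converge faster or diverge
n_iterations = 2000    # train for more mini-batch steps
batch_size = 64        # smaller batches: noisier but more frequent updates
dropout = 0.75         # keep more units active during training
After each change, rerun the script and compare the test set accuracy; varying one hyperparameter at a time makes it easier to attribute any difference.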
To demonstrate that the network is actually recognizing the hand-
drawn images, let’s test it on a single image of our own.
If you are on a local machine and you would like to use your own
hand-drawn number, you can use a graphics editor to create your own
28x28 pixel image of a digit. Otherwise, you can use curl to download
the following sample test image to your server or computer:
(tensorflow-demo) $ curl -O images/test_img.png
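If you would rather generate a test image programmatically than draw one, here is a minimal Pillow sketch. The digit it draws is crude, and any black-on-white 28x28 PNG will work just as well:
from PIL import Image, ImageDraw

# Create a white 28x28 grayscale canvas and draw a rough "1" in black.
canvas = Image.new('L', (28, 28), color=255)
draw = ImageDraw.Draw(canvas)
draw.line([(14, 4), (14, 24)], fill=0, width=3)
canvas.save("test_img.png")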
Open the main.py file in your editor and add the following lines of
code to the top of the file to import two libraries necessary for image
manipulation.
main.py
import numpy as np
from PIL import Image
...
Then at the end of the file, add the following line of code to load the
test image of the handwritten digit:
main.py
...
img = np.invert(Image.open("test_img.png").convert('L')).ravel()
The open function of the Image library loads the test image. If the
PNG was saved with RGB color channels plus alpha transparency, each
pixel carries four values (RGBA). This is not the same representation we
used previously when reading in the dataset with TensorFlow, so we'll
need to do some extra work to match the format.
First, we use the convert function with the L parameter to reduce the
RGBA representation to a single grayscale channel. We store this as
a NumPy array and invert it using np.invert, because the current
matrix represents black as 0 and white as 255, whereas we need the
opposite. Finally, we call ravel to flatten the 28x28 matrix into a
vector of 784 values, the input format the network expects.
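If it helps to see the intermediate shapes, the one-liner above can be unpacked into steps like this (assuming test_img.png is 28x28 pixels):
im = Image.open("test_img.png")   # PIL image, possibly RGBA
gray = im.convert('L')            # one grayscale channel
arr = np.array(gray)              # shape (28, 28); black=0, white=255
inv = np.invert(arr)              # flipped: white=0, black=255, like MNIST
img = inv.ravel()                 # shape (784,), ready to feed into X
print(arr.shape, img.shape)       # (28, 28) (784,)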
Now that the image data is structured correctly, we can run a session in
the same way as previously, but this time only feeding in the single
image for testing.
Add the following code to your file to test the image and print the
outputted label.
main.py
...
prediction = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img]})
print ("Prediction for test image:", np.squeeze(prediction))
The np.squeeze function is called on the prediction to return the
single integer from the array (i.e. to go from [2] to 2). The resulting
output demonstrates that the network has recognized this image as the
digit 2.
Output
Prediction for test image: 2
You can try testing the network with more complex images: digits that
look like other digits, for example, or digits that have been drawn
poorly or incorrectly, to see how well it fares.
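If you save several such images, a small loop can test them in one go. The file names below are placeholders for your own drawings:
for path in ["digit_a.png", "digit_b.png", "digit_c.png"]:  # hypothetical files
    img = np.invert(Image.open(path).convert('L')).ravel()
    pred = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img]})
    print(path, "->", np.squeeze(pred))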