Step 4 — Building the TensorFlow Graph
To build our network, we will set it up as a computational graph for TensorFlow to execute. The core concept of TensorFlow is the tensor, a data structure similar to an array or list. Tensors are initialized, manipulated as they are passed through the graph, and updated through the learning process.
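To make the build-then-execute model concrete, here is a tiny stand-alone sketch (assuming TensorFlow 1.x, as in the rest of this tutorial); the names a and b are illustrative and not part of main.py:

import tensorflow as tf

# Operations are only described while the graph is being built; nothing is
# computed until a Session executes the graph.
a = tf.constant([1.0, 2.0])   # a tensor holding fixed values
b = a * 3.0                   # adds a multiply op to the graph, not yet evaluated

with tf.Session() as sess:
    print(sess.run(b))        # [3. 6.], computed only when the graph is run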
We’ll start by defining three tensors as placeholders, which are tensors
that we’ll feed values into later. Add the following to your file:
main.py
...
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
keep_prob = tf.placeholder(tf.float32)
The only parameter that needs to be specified at its declaration is the
size of the data we will be feeding in. For X we use a shape of [None,
784], where None represents any amount, as we will be feeding in an
undefined number of 784-pixel images. The shape of Y is [None, 10] as
we will be using it for an undefined number of label outputs, with 10
possible classes. The keep_prob tensor controls the dropout rate: it holds the probability that any given unit is kept. We initialize it as a placeholder rather than an immutable variable because we want to use the same tensor both for training (when keep_prob is set to 0.5) and testing (when keep_prob is set to 1.0, which disables dropout).
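If it helps to see the None dimension in action, here is a small stand-alone sketch; X_demo and batch are illustrative names, not part of main.py. Values for a placeholder are supplied through feed_dict when the graph is run:

import numpy as np
import tensorflow as tf

# A placeholder declares the shape of data to be fed in later; None leaves
# the batch dimension open.
X_demo = tf.placeholder("float", [None, 784])
doubled = X_demo * 2.0                        # any op on the placeholder

batch = np.zeros((3, 784), dtype=np.float32)  # a made-up batch of 3 "images"
with tf.Session() as sess:
    result = sess.run(doubled, feed_dict={X_demo: batch})
    print(result.shape)                       # (3, 784): None matched the batch size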
The parameters that the network will update in the training process are
the weight and bias values, so for these we need to set an initial value
rather than an empty placeholder. These values are essentially where the
network does its learning, as they are used in the activation functions of
the neurons, representing the strength of the connections between units.
Since the values are optimized during training, we could set them to
zero for now. But the initial value actually has a significant impact on the
final accuracy of the model. We’ll use random values from a truncated
normal distribution for the weights. We want them to be close to zero, so
they can adjust in either a positive or negative direction, and slightly
different, so they generate different errors. This will ensure that the
model learns something useful. Add these lines:
main.py
...
weights = {
    'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
    'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
    'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
}
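If you want to check the truncated-normal claim yourself, a quick stand-alone sketch (the [784, 512] shape here is just an example) prints the range of the sampled weights:

import tensorflow as tf

# truncated_normal re-draws any sample that falls more than two standard
# deviations from the mean, so with stddev=0.1 every initial weight lies
# strictly inside (-0.2, 0.2). Variables also need explicit initialization.
w_demo = tf.Variable(tf.truncated_normal([784, 512], stddev=0.1))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    vals = sess.run(w_demo)
    print(vals.min(), vals.max())  # always within (-0.2, 0.2)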
For the bias, we use a small constant value to ensure that the tensors
activate in the initial stages and therefore contribute to the propagation.
The weights and bias tensors are stored in dictionary objects for ease of
access. Add this code to your file to define the biases:
main.py
...
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}
Next, set up the layers of the network by defining the operations that
will manipulate the tensors. Add these lines to your file:
main.py
...
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, keep_prob)
output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']
Each hidden layer will execute matrix multiplication on the previous
layer’s outputs and the current layer’s weights, and add the bias to these
values. At the last hidden layer, we will apply a dropout operation using our keep_prob placeholder, which is fed 0.5 during training, and pass the result to the output layer.
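As a quick illustration of what the dropout operation does, here is a stand-alone sketch on made-up activations (acts is an illustrative name, not part of main.py):

import numpy as np
import tensorflow as tf

# With keep_prob=0.5, roughly half the values are zeroed and the survivors
# are scaled by 1/keep_prob, so the expected sum is unchanged; with
# keep_prob=1.0 the input passes through untouched.
acts = tf.constant(np.ones((1, 10), dtype=np.float32))
train_mode = tf.nn.dropout(acts, 0.5)
test_mode = tf.nn.dropout(acts, 1.0)

with tf.Session() as sess:
    print(sess.run(train_mode))  # a mix of 0.0 and 2.0 values
    print(sess.run(test_mode))   # all 1.0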
The final step in building the graph is to define the loss function that
we want to optimize. A popular choice of loss function in TensorFlow
programs is cross-entropy, also known as log-loss, which quantifies the
difference between two probability distributions (the predictions and the
labels). A perfect classification would result in a cross-entropy of 0, with
the loss completely minimized.
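To see the quantity being minimized, here is a hand-computed cross-entropy for a single hypothetical image; the probabilities are made up for illustration:

import numpy as np

# The true label is class 3 (one-hot); the model's softmax output assigns it
# a probability of 0.7, with the remaining 0.3 spread over the other classes.
label = np.zeros(10)
label[3] = 1.0
probs = np.full(10, 0.3 / 9)
probs[3] = 0.7

loss = -np.sum(label * np.log(probs))
print(loss)  # ~0.357; a perfect prediction (probs[3] == 1.0) would give 0.0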
We also need to choose the optimization algorithm which will be used
to minimize the loss function. A process named gradient descent
optimization is a common method for finding the (local) minimum of a
function by taking iterative steps along the gradient in a negative
(descending) direction. There are several choices of gradient descent
optimization algorithms already implemented in TensorFlow, and in this
tutorial we will be using the Adam optimizer. It extends gradient descent optimization by using momentum to speed up the process: it computes an exponentially weighted average of the gradients and uses that in the adjustments. Add the following code to your file:
main.py
...
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=Y, logits=output_layer
    ))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
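To make the momentum idea concrete, here is a sketch of a single Adam update for one scalar parameter, following the standard Adam formulas; beta1, beta2, and eps mirror TensorFlow's defaults, while theta and the gradient g are made-up values:

import numpy as np

lr, beta1, beta2, eps = 1e-4, 0.9, 0.999, 1e-8
m, v, t = 0.0, 0.0, 1   # running averages of the gradients, and the step counter
theta = 0.5             # a hypothetical parameter value
g = 0.2                 # a hypothetical gradient of the loss w.r.t. theta

m = beta1 * m + (1 - beta1) * g        # exponentially weighted average of grads
v = beta2 * v + (1 - beta2) * g ** 2   # ...and of the squared grads
m_hat = m / (1 - beta1 ** t)           # correct the bias from zero initialization
v_hat = v / (1 - beta2 ** t)
theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(theta)  # nudged slightly against the gradient direction

AdamOptimizer performs an update like this internally for every weight and bias in the network on each training step.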
We’ve now defined the network and built it out with TensorFlow. The
next step is to feed data through the graph to train it, and then test that it
has actually learnt something.