Machine Learning
Now that we have specified the ideal target function V, we must choose a representation that the learning program will use to describe the function ^V it will learn. As with earlier design choices, we again have many options. We could, for example, allow the program to represent ^V using a large table with a distinct entry specifying the value for each distinct board state. Or we could allow it to represent ^V using a collection of rules that match against features of the board state, a quadratic polynomial function of predefined board features, or an artificial neural network.

In general, this choice of representation involves a crucial tradeoff. On one hand, we wish to pick a very expressive representation, to allow as close an approximation as possible to the ideal target function V. On the other hand, the more expressive the representation, the more training data the program will require in order to choose among the alternative hypotheses it can represent. To keep the discussion brief, let us choose a simple representation: for any given board state b, the function ^V will be calculated as a linear combination of the following board features:
x1(b) — number of black pieces on board b
x2(b) — number of red pieces on b
x3(b) — number of black kings on b
x4(b) — number of red kings on b
x5(b) — number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
x6(b) — number of black pieces threatened by red
^V(b) = w0 + w1 · x1(b) + w2 · x2(b) + w3 · x3(b) + w4 · x4(b) + w5 · x5(b) + w6 · x6(b)
where w0 through w6 are numerical coefficients, or weights, to be obtained by a learning algorithm.
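To make the representation concrete, here is a minimal Python sketch of the feature extraction and the linear evaluation. The board encoding below — a list of (color, is_king) tuples, one per piece — and the precomputed threat counts are assumptions made purely for illustration; the text does not prescribe a board representation, and computing x5 and x6 properly would require full checkers move generation, which is omitted here.

```python
# Hypothetical board encoding (an assumption, not from the text): a board
# is a list of (color, is_king) tuples, one per piece currently in play.
# The threat counts x5 and x6 are passed in precomputed, since deriving
# them requires checkers move generation, which is out of scope here.

def board_features(pieces, threatened_red, threatened_black):
    """Return the feature vector (x1, ..., x6) for a board state b."""
    x1 = sum(1 for color, _ in pieces if color == "black")              # black pieces
    x2 = sum(1 for color, _ in pieces if color == "red")                # red pieces
    x3 = sum(1 for color, king in pieces if color == "black" and king)  # black kings
    x4 = sum(1 for color, king in pieces if color == "red" and king)    # red kings
    return [x1, x2, x3, x4, threatened_red, threatened_black]

def v_hat(weights, features):
    """Linear evaluation: ^V(b) = w0 + w1*x1(b) + ... + w6*x6(b)."""
    w0, *ws = weights                       # w0 is the additive constant term
    return w0 + sum(w * x for w, x in zip(ws, features))

# Example: the standard checkers opening position, no kings, no threats.
opening = [("black", False)] * 12 + [("red", False)] * 12
feats = board_features(opening, threatened_red=0, threatened_black=0)
weights = [0.0, 1.0, -1.0, 2.0, -2.0, 1.0, -1.0]  # arbitrary illustrative weights
print(v_hat(weights, feats))  # symmetric position, so the terms cancel: 0.0
```

With the symmetric illustrative weights above, equal material for both sides yields a value of zero, which matches the intuition that the evaluation should favor neither player in a mirrored position.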
The weights w1 through w6 determine the relative importance of the different board features.

Specification of the Machine Learning Problem at This Time

So far we have chosen the type of training experience, the target function, and its representation. The checkers learning task can now be summarized as follows.
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: opportunity to play against itself
Target function: V : Board → R
Target function representation: ^V(b) = w0 + w1 · x1(b) + w2 · x2(b) + w3 · x3(b) + w4 · x4(b) + w5 · x5(b) + w6 · x6(b)
The first three items above correspond to the specification of the learning task, whereas the final two constitute design choices for the implementation of the learning program.