Machine Learning
When designing the checkers learning system, the type of training experience available to the learner has a significant effect on whether learning succeeds or fails.
1. Direct or indirect training experience
- Direct training experience: individual board states and the correct move for each board state are given.
- Indirect training experience: the move sequences and the final result (win, loss, or draw) are given for a number of games. Deciding how much credit or blame each individual move deserves for the final result is the credit assignment problem.
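To make the difference between the two kinds of experience concrete, here is a minimal Python sketch. The names `DirectExample`, `GameRecord`, and `assign_credit` are hypothetical, boards and moves are represented as plain strings, and the decay-based rule is only one simple heuristic for credit assignment, chosen for illustration rather than taken from the notes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins: a real system would use structured board and move types.
Board = str
Move = str


@dataclass
class DirectExample:
    """Direct experience: one board state paired with its correct move."""
    board: Board
    correct_move: Move


@dataclass
class GameRecord:
    """Indirect experience: the learner's moves in one game plus the final result."""
    moves: List[Tuple[Board, Move]]  # (board, move) pairs in the order played
    outcome: float                   # +1.0 win, 0.0 draw, -1.0 loss


def assign_credit(game: GameRecord, decay: float = 0.9) -> List[Tuple[Board, Move, float]]:
    """Spread the final outcome over the moves of a game.

    Assumed heuristic: the last move receives the full outcome, and each
    earlier move receives the outcome multiplied by one more factor of `decay`.
    """
    n = len(game.moves)
    return [
        (board, move, game.outcome * decay ** (n - 1 - i))
        for i, (board, move) in enumerate(game.moves)
    ]


if __name__ == "__main__":
    game = GameRecord(moves=[("b0", "m0"), ("b1", "m1"), ("b2", "m2")], outcome=1.0)
    for board, move, credit in assign_credit(game, decay=0.5):
        print(board, move, credit)
```

With direct experience, each `DirectExample` already carries its label. With indirect experience, a rule such as `assign_credit` has to decide how the single game result is shared among the moves, and that decision is the credit assignment problem.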
2. Teacher or not
- Supervised: the training experience is labeled, meaning every board state is labeled with the correct move. Learning takes place in the presence of a supervisor or teacher.
- Unsupervised: the training experience is unlabeled, meaning the board states come without moves. The learner generates random games and plays against itself with no supervision.
- Semi-supervised: the learner generates game states and asks the teacher for help in finding the correct move when the board state is confusing.
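The "ask the teacher when confused" setting can be sketched as follows. This is an illustration under stated assumptions: `choose_move`, `ask_teacher`, and the `margin` threshold are hypothetical names, and "confusing" is interpreted here as the learner's two best-scoring moves being nearly tied. The notes do not define how confusion is measured.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for real board and move types.
Board = str
Move = str


def choose_move(
    board: Board,
    move_scores: Dict[Move, float],
    ask_teacher: Callable[[Board], Move],
    margin: float = 0.1,
) -> Tuple[Move, bool]:
    """Return (move, asked_teacher).

    The learner plays its own best-scoring move unless the top two scores
    differ by less than `margin`, in which case it queries the teacher.
    """
    if not move_scores:
        raise ValueError("move_scores must contain at least one legal move")
    ranked = sorted(move_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_move, best_score = ranked[0]
    confused = len(ranked) > 1 and best_score - ranked[1][1] < margin
    if confused:
        return ask_teacher(board), True
    return best_move, False


if __name__ == "__main__":
    teacher = lambda board: "teacher_move"  # stand-in for a human or oracle
    print(choose_move("b0", {"a": 0.9, "b": 0.2}, teacher))
    print(choose_move("b1", {"a": 0.5, "b": 0.45}, teacher))
```

Setting `margin` to 0 never consults the teacher, which corresponds to the unsupervised case. A very large `margin` asks the teacher whenever more than one move is available, which approaches the supervised case.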
3. Is the training experience good?
Do the training examples represent the distribution of examples over which the final system performance will be measured? Performance is best when training examples and test examples come from the same or a similar distribution.
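One rough way to check this question in practice is to compare how often some observable feature occurs in the training games and in the games used for evaluation. The sketch below computes the total variation distance between two empirical distributions. The function name `total_variation` and the choice of feature (for example, the opening move of each game) are assumptions made for illustration, not something the notes prescribe.

```python
from collections import Counter
from typing import Hashable, Iterable


def total_variation(train: Iterable[Hashable], test: Iterable[Hashable]) -> float:
    """Total variation distance between two empirical distributions.

    Returns 0.0 when both samples have identical relative frequencies and
    1.0 when they share no values at all.
    """
    p, q = Counter(train), Counter(test)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        raise ValueError("both samples must be non-empty")
    keys = set(p) | set(q)
    # Counter returns 0 for missing keys, so unseen values contribute their full mass.
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)


if __name__ == "__main__":
    # Hypothetical feature values, e.g. the opening move of each game.
    train_openings = ["a", "a", "b", "b"]
    test_openings = ["a", "b", "b", "b"]
    print(total_variation(train_openings, test_openings))
```

A value near 0 suggests that the training games resemble the evaluation games on that feature. A value near 1 suggests the learner is being trained on situations it will rarely be tested on.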
The checkers player learns by playing against itself, so its experience is indirect. It may not encounter moves that are common in human expert play, in which case its training examples do not fully represent the games it will be tested on. Once suitable training experience is available, the next design step is choosing the target function.