Perspectives in Machine Learning

One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner.

For example, consider the space of hypotheses that could in principle be output by the above checkers learner. This hypothesis space consists of all evaluation functions that can be represented by some choice of values for the weights w0 through w6. The learner's task is thus to search through this vast space to locate the hypothesis that is most consistent with the available training examples. The LMS algorithm for fitting weights achieves this goal by iteratively tuning the weights, adding a correction to each weight each time the hypothesized evaluation function predicts a value that differs from the training value. This algorithm works well when the hypothesis representation considered by the learner defines a continuously parameterized space of potential hypotheses.
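
To make the weight-update step concrete, here is a minimal Python sketch of the LMS rule described above, assuming the linear evaluation function V̂(b) = w0 + w1·x1 + … + w6·x6 from the checkers example. The feature vectors, training values, and learning rate below are hypothetical illustrative choices, not values taken from the text.

```python
def v_hat(weights, features):
    """Evaluate the linear hypothesis w0*x0 + w1*x1 + ... + w6*x6,
    where x0 = 1 supplies the constant term w0."""
    return sum(w * x for w, x in zip(weights, features))

def lms_update(weights, features, v_train, eta=0.1):
    """One LMS correction: nudge each weight in proportion to the
    prediction error (v_train - V̂(b)) and to its own feature value."""
    error = v_train - v_hat(weights, features)
    return [w + eta * error * x for w, x in zip(weights, features)]

# Hypothetical training pairs: (feature vector of a board b, Vtrain(b)).
training_examples = [
    ([1, 3, 0, 1, 0, 0, 0], 100.0),
    ([1, 0, 3, 0, 1, 0, 0], -100.0),
]

weights = [0.0] * 7
for features, v_train in training_examples:
    weights = lms_update(weights, features, v_train)
print(weights)
```

Each update moves the weights so that the predicted value for a board drifts toward its training value; repeating this over many examples gradually fits the continuously parameterized hypothesis.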

Many of the chapters in this book present algorithms that search a hypothesis space defined by some underlying representation (e.g., linear functions, logical descriptions, decision trees, artificial neural networks). These different hypothesis representations are appropriate for learning different kinds of target functions. For each of these hypothesis representations, the corresponding learning algorithm takes advantage of a different underlying structure to organize the search through the hypothesis space.

Throughout this book we will return to this perspective of learning as a search problem in order to characterize learning methods by their search strategies and by the underlying structure of the search spaces they explore. We will also find this viewpoint useful in formally analyzing the relationship between the size of the hypothesis space to be searched, the number of training examples available, and the confidence we can have that a hypothesis consistent with the training data will correctly generalize to unseen examples.
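
As a preview of the kind of result this analysis yields, one standard bound (developed later in the book for a learner that outputs any hypothesis consistent with its training data, drawn from a finite hypothesis space H) ties together exactly these three quantities; the formulation below, with error tolerance ε and failure probability δ, is a sketch under those assumptions rather than a result derived at this point in the text.

```latex
% With probability at least 1 - \delta, every hypothesis in a finite
% space H that is consistent with m i.i.d. training examples has true
% error less than \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```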


Issues in Machine Learning

Our checkers example raises a number of generic questions about machine learning. The field of machine learning, and much of this book, is concerned with answering questions such as the following:

  • What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
  • How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
  • When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
  • What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
  • What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
  • How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

