3.5 Boosting
Boosting: train next learner on mistakes made by previous learner(s)
In bagging, generating complementary base-learners is left to chance and to the instability of the learning method. In boosting, we actively try to generate complementary base-learners by training the next learner on the mistakes of the previous learners. The original boosting algorithm combines three weak learners to generate a strong learner. A weak learner has error probability less than 1/2, which makes it better than random guessing on a two-class problem; a strong learner has arbitrarily small error probability.
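As a minimal illustration (ours, not the text's): a depth-1 decision tree, a "decision stump", is a classic weak learner; on a two-class problem it beats chance but is far from strong. The dataset and parameters below are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# An arbitrary two-class toy problem.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)  # a classic weak learner
error = 1 - stump.score(X, y)                          # training error rate
print(f"stump error: {error:.2f}")                     # below 0.5, but far from 0
```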
Original Boosting Concept
Given a large training set, we randomly divide it into three parts: X1, X2, and X3. We use X1 to train d1. We then feed X2 to d1 and, from X2, take all instances misclassified by d1 together with as many instances on which d1 is correct; these form the training set of d2. We then feed X3 to d1 and d2. The instances on which d1 and d2 disagree form the training set of d3. During testing, given an instance, we give it to d1 and d2; if they agree, that is the response; otherwise, the response of d3 is taken as the output.
In summary (see the code sketch below):
1. Split data X into {X1, X2, X3}
2. Train d1 on X1
3. Test d1 on X2
4. Train d2 on d1's mistakes on X2 (plus some right)
5. Test d1 and d2 on X3
6. Train d3 on disagreements between d1 and d2
7. Testing: apply d1 and d2; if they disagree, use d3
Drawback: need large X
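A minimal sketch of these steps in Python (our illustration, not published code for the original algorithm; the class name OriginalBoosting and the stump default are our choices; assumes NumPy arrays and scikit-learn-style estimators):

```python
import copy

import numpy as np
from sklearn.tree import DecisionTreeClassifier


class OriginalBoosting:
    """The three-learner scheme above; base_learner defaults to a decision stump."""

    def __init__(self, base_learner=None, random_state=0):
        self.base_learner = base_learner or DecisionTreeClassifier(max_depth=1)
        self.rng = np.random.default_rng(random_state)

    def _new_learner(self):
        return copy.deepcopy(self.base_learner)  # fresh, untrained copy

    def fit(self, X, y):
        # 1. Split data X randomly into {X1, X2, X3}.
        i1, i2, i3 = np.array_split(self.rng.permutation(len(X)), 3)
        X1, y1, X2, y2, X3, y3 = X[i1], y[i1], X[i2], y[i2], X[i3], y[i3]

        # 2. Train d1 on X1.
        self.d1 = self._new_learner().fit(X1, y1)

        # 3-4. Train d2 on d1's mistakes on X2, plus as many correct instances
        # (assumes d1 errs at least once on X2, as a weak learner will).
        wrong = np.flatnonzero(self.d1.predict(X2) != y2)
        right = np.setdiff1d(np.arange(len(y2)), wrong)
        self.rng.shuffle(right)
        sel = np.concatenate([wrong, right[: wrong.size]])
        self.d2 = self._new_learner().fit(X2[sel], y2[sel])

        # 5-6. Train d3 on the X3 instances where d1 and d2 disagree.
        dis = self.d1.predict(X3) != self.d2.predict(X3)
        self.d3 = self._new_learner().fit(X3[dis], y3[dis]) if dis.any() else self.d1
        return self

    def predict(self, X):
        # Testing: where d1 and d2 agree, take their answer; otherwise ask d3.
        p1, p2 = self.d1.predict(X), self.d2.predict(X)
        return np.where(p1 == p2, p1, self.d3.predict(X))
```

For example, on the toy problem from the stump sketch earlier:

```python
model = OriginalBoosting(random_state=1).fit(X, y)
print("boosted training error:", (model.predict(X) != y).mean())
```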
The overall system has a reduced error rate, and the error rate can be reduced arbitrarily by using such systems recursively, that is, by using a boosting system of three models as dj in a higher-level system.
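Read against the sketch above (again our illustration), the recursive construction amounts to passing one three-model system in as the base learner of another:

```python
from sklearn.tree import DecisionTreeClassifier

# Each of the outer system's d1, d2, d3 is itself a boosted triple of stumps.
inner = OriginalBoosting(base_learner=DecisionTreeClassifier(max_depth=1))
outer = OriginalBoosting(base_learner=inner, random_state=2)
outer.fit(X, y)  # X, y as in the earlier toy example
```

Note that every extra level splits each part into three again, so the recursion compounds the large-sample requirement discussed next.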
Though it is quite successful, the disadvantage of the original boosting method is that it requires a very large training sample. The sample is divided into three, and furthermore, the second and third classifiers are trained only on subsets on which the previous ones err. So unless one has a quite large training set, d2 and d3 will not have training sets of reasonable size.