Machine Learning: 5.6 Models of Evolution and Learning

In many natural systems, individual organisms learn to adapt significantly during their lifetime.
At the same time, biological and social processes allow their species to adapt over a time frame of many
generations. One interesting question regarding evolutionary systems is "What is the relationship
between learning during the lifetime of a single individual, and the longer time frame species-level
learning afforded by evolution?"

Lamarckian Evolution

Lamarck was a scientist who, in the early nineteenth century, proposed that evolution over many
generations was directly influenced by the experiences of individual organisms during their lifetime.
In particular, he proposed that experiences of a single organism directly affected the genetic makeup of
their offspring: If an individual learned during its lifetime to avoid some toxic food, it could pass this
trait on genetically to its offspring, which therefore would not need to learn the trait. This is an
attractive conjecture, because it would presumably allow for more efficient evolutionary progress than
a generate-and-test process (like that of GAs and GPs) that ignores the experience gained during an
individual's lifetime. Despite the attractiveness of this theory, current scientific evidence
overwhelmingly contradicts Lamarck's model. The currently accepted view is that the genetic makeup
of an individual is, in fact, unaffected by the lifetime experience of one's biological parents. Despite this
apparent biological fact, recent computer studies have shown that Lamarckian processes can
sometimes improve the effectiveness of computerized genetic algorithms (see Grefenstette 1991;
Ackley and Littman 1994; and Hart and Belew 1995).
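
The difference between a Lamarckian and a non-Lamarckian genetic algorithm can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than the method of the cited studies: the bit-string genome, the toy `fitness` function (count of 1 bits), the hill-climbing `lifetime_learning` routine, and the `lamarckian` flag. In both variants selection sees the fitness reached after learning; only the Lamarckian variant writes the learned bits back into the genome that offspring inherit.

```python
import random

def fitness(bits):
    """Toy fitness (an assumption for illustration): the number of 1 bits."""
    return sum(bits)

def lifetime_learning(bits, steps, rng):
    """Hill climbing: flip one random bit and keep the flip if fitness does not drop."""
    learned = list(bits)
    for _ in range(steps):
        i = rng.randrange(len(learned))
        candidate = list(learned)
        candidate[i] = 1 - candidate[i]
        if fitness(candidate) >= fitness(learned):
            learned = candidate
    return learned

def evaluate(genome, steps, rng, lamarckian):
    """Return (genome passed on to offspring, fitness used for selection)."""
    learned = lifetime_learning(genome, steps, rng)
    # Lamarckian: acquired improvements are written back into the genes.
    # Non-Lamarckian: learning changes the fitness score only.
    inherited = learned if lamarckian else list(genome)
    return inherited, fitness(learned)

rng = random.Random(0)
genome = [0, 1, 0, 0, 1, 0, 0, 0]
print(evaluate(genome, steps=20, rng=rng, lamarckian=False))
print(evaluate(genome, steps=20, rng=rng, lamarckian=True))
```

With `lamarckian=False` the learned improvements influence selection but are lost at reproduction, which matches the accepted biological view described above.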

Baldwin Effect

Although Lamarckian evolution is not an accepted model of biological evolution, other mechanisms
have been suggested by which individual learning can alter the course of evolution. One such
mechanism is called the Baldwin effect, after J. M. Baldwin (1896), who first suggested the idea. The
Baldwin effect is based on the following observations:

• If a species is evolving in a changing environment, there will be evolutionary pressure to favor
  individuals with the capability to learn during their lifetime. For example, if a new predator
  appears in the environment, then individuals capable of learning to avoid the predator will be
  more successful than individuals who cannot learn. In effect, the ability to learn allows an
  individual to perform a small local search during its lifetime to maximize its fitness. In contrast,
  nonlearning individuals whose fitness is fully determined by their genetic makeup will operate
  at a relative disadvantage.
• Those individuals who are able to learn many traits will rely less strongly on their genetic code
  to "hard-wire" traits. As a result, these individuals can support a more diverse gene pool, relying
  on individual learning to overcome the "missing" or "not quite optimized" traits in the genetic
  code. This more diverse gene pool can, in turn, support more rapid evolutionary adaptation.
  Thus, the ability of individuals to learn can have an indirect accelerating effect on the rate of
  evolutionary adaptation for the entire population.
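
One way to quantify the advantage described in the first observation is a small calculation under assumed conditions, none of which come from the text: fitness rewards a single correct combination of binary traits, an individual's genes already fix all but `num_plastic` of those traits correctly, and learning consists of `trials` independent random guesses at the remaining ones. The hypothetical helper `p_learn_target` then gives the chance that the individual hits the correct combination at least once during its lifetime, which is 1 - (1 - 2^-k)^T for k plastic traits and T trials.

```python
def p_learn_target(num_plastic, trials):
    """Probability of guessing the one correct setting of `num_plastic` binary
    traits at least once in `trials` independent uniform random guesses."""
    p_single = 0.5 ** num_plastic          # chance that a single guess is fully correct
    return 1.0 - (1.0 - p_single) ** trials

# Compare a learner (1000 guesses) with a nonlearner (one fixed random setting).
for k in (0, 5, 10, 20):
    print(k, p_learn_target(k, trials=1000), p_learn_target(k, trials=1))
```

Under these assumptions an individual with ten unsettled traits and 1000 trials succeeds a bit more than 60% of the time, whereas a nonlearning individual with the same ten traits set at random succeeds with probability 2^-10, roughly 0.1%.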

To illustrate, imagine some new change in the environment of some species, such as a new
predator. Such a change will selectively favor individuals capable of learning to avoid the predator. As
the proportion of such self-improving individuals in the population grows, the population will be able to
support a more diverse gene pool, allowing evolutionary processes (even non-Lamarckian generate-
and-test processes) to adapt more rapidly. This accelerated adaptation may in turn enable standard
evolutionary processes to more quickly evolve a genetic (nonlearned) trait to avoid the predator (e.g.,
an instinctive fear of this animal). Thus, the Baldwin effect provides an indirect mechanism for
individual learning to positively impact the rate of evolutionary progress. By increasing survivability and
genetic diversity of the species, individual learning supports more rapid evolutionary progress, thereby
increasing the chance that the species will evolve genetic, nonlearned traits that better fit the new
environment.

There have been several attempts to develop computational models to study the Baldwin
effect. For example, Hinton and Nowlan (1987) experimented with evolving a population of simple
neural networks, in which some network weights were fixed during the individual network "lifetime,"
while others were trainable. The genetic makeup of the individual determined which weights were
trainable and which were fixed. In their experiments, when no individual learning was allowed, the
population failed to improve its fitness over time. However, when individual learning was allowed, the
population quickly improved its fitness. During early generations of evolution the population contained
a greater proportion of individuals with many trainable weights. However, as evolution proceeded, the
number of fixed, correct network weights tended to increase, as the population evolved toward
genetically given weight values and toward less dependence on individual learning of weights.
Additional computational studies of the Baldwin effect have been reported by Belew (1990), Harvey
(1993), and French and Messinger (1994). An excellent overview of this topic can be found in Mitchell
(1996). A special issue of the journal Evolutionary Computation on this topic (Turney et al. 1997)
contains several articles on the Baldwin effect.
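
An experiment of the kind described above can be sketched in a few lines of Python. This is a simplified illustration loosely modeled on the setup attributed to Hinton and Nowlan, not a reproduction of it. The allele encoding (0 and 1 for fixed weights, "?" for a trainable weight), the all-ones target, the fitness formula, and all parameter values are assumptions chosen to keep the example small. Learning is modeled as random guessing of the trainable weights, and nothing learned is written back into the genome.

```python
import random

def random_genome(length, rng):
    # Assumed initial mix: about 25% fixed 0, 25% fixed 1, 50% trainable "?".
    return [rng.choice((0, 1, "?", "?")) for _ in range(length)]

def lifetime_fitness(genome, trials, rng):
    """Fitness is 1 unless the all-ones target is reached; earlier success scores higher."""
    if any(g == 0 for g in genome):
        return 1.0                          # a wrong fixed weight cannot be repaired by learning
    plastic = genome.count("?")
    if plastic == 0:
        return float(len(genome))           # correct at birth: maximal fitness
    for t in range(trials):
        # An all-zero draw stands for "every trainable weight guessed correctly";
        # it occurs with probability 2 ** -plastic on each trial.
        if rng.getrandbits(plastic) == 0:
            remaining = trials - (t + 1)
            return 1.0 + (len(genome) - 1) * remaining / trials
    return 1.0

def evolve(length=10, pop_size=200, trials=200, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(length, rng) for _ in range(pop_size)]
    history = []                            # mean fitness per generation
    for _ in range(generations):
        fits = [lifetime_fitness(g, trials, rng) for g in pop]
        history.append(sum(fits) / pop_size)
        # Fitness-proportional selection of parent pairs.
        parents = rng.choices(pop, weights=fits, k=2 * pop_size)
        pop = []
        for mom, dad in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, length)
            pop.append(mom[:cut] + dad[cut:])   # one-point crossover, no mutation
    return history, pop

history, final_pop = evolve()
print("mean fitness per generation:", [round(h, 2) for h in history])
print("mean trainable weights per genome:",
      sum(g.count("?") for g in final_pop) / len(final_pop))
```

Calling `evolve(trials=0)` disables learning, so only a genome that is correct at birth scores above 1; comparing that run with the default gives a rough feel for the contrast the experiments report. Results depend on the seed and on the assumed parameters.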

Last modified: Friday, 20 June 2025, 11:55 AM