Nearest neighbour methods can also be used for regression by returning the average value of the
neighbours of a point, or a spline or similar fit, as the new value. The most common methods are known
as kernel smoothers, and they use a kernel (a weighting function between pairs of points) that decides
how much emphasis (weight) to put onto the contribution from each datapoint according to its distance
from the input.
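
As a minimal sketch of the simplest case described above (returning the average value of the neighbours), the following assumes one-dimensional inputs and absolute distance. The function name `knn_regress`, the parameter `k`, and the toy data are illustrative choices, not taken from the text.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k=3):
    """Predict each query as the mean target of its k nearest training points (1-D inputs)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Pairwise absolute distances, shape (n_query, n_train)
    dists = np.abs(x_query[:, None] - x_train[None, :])
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Unweighted average of the neighbours' target values
    return y_train[nearest].mean(axis=1)

# Toy data (hypothetical): y = x^2 sampled at five points
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
print(knn_regress(x, y, [1.1], k=3))
```

Every neighbour counts equally here, so the prediction jumps whenever the set of k neighbours changes. Kernel smoothers replace the equal weights with weights that depend on distance.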
Here we shall consider just two kernels that are commonly used for smoothing. Both are
designed to give more weight to points that are closer to the current input, with the weights decreasing
smoothly to zero as points move out of the range of the current input. The range is specified by a
parameter λ.
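They are the Epanechnikov quadratic kernel:

$$K_{E,\lambda}(x_0, x) = \begin{cases} 0.75\left(1 - \dfrac{(x_0 - x)^2}{\lambda^2}\right) & \text{if } |x - x_0| < \lambda \\ 0 & \text{otherwise} \end{cases}$$

and the tricube kernel:

$$K_{T,\lambda}(x_0, x) = \begin{cases} \left(1 - \left|\dfrac{x_0 - x}{\lambda}\right|^3\right)^3 & \text{if } |x - x_0| < \lambda \\ 0 & \text{otherwise} \end{cases}$$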
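
Both kernels can be written as functions of the scaled distance u = |x − x0| / λ. The sketch below uses the standard textbook forms of the two kernels; the function names and the argument order `(x0, x, lam)` are assumptions made for illustration.

```python
import numpy as np

def epanechnikov(x0, x, lam):
    """Epanechnikov quadratic kernel: 0.75 * (1 - u^2) for u < 1, else 0."""
    u = np.abs(np.asarray(x, dtype=float) - x0) / lam
    return np.where(u < 1.0, 0.75 * (1.0 - u**2), 0.0)

def tricube(x0, x, lam):
    """Tricube kernel: (1 - u^3)^3 for u < 1, else 0."""
    u = np.abs(np.asarray(x, dtype=float) - x0) / lam
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

# Weights for points at distances 0, 1, 2 and 3 from x0 = 0, with lam = 2
points = np.array([0.0, 1.0, 2.0, 3.0])
print(epanechnikov(0.0, points, 2.0))
print(tricube(0.0, points, 2.0))
```

Both functions return their largest weight at zero distance and exactly zero once the distance reaches λ, which matches the behaviour described above.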
The results of using these kernels are shown in Figure 6 below on a dataset that consists of the time
between eruptions (technically known as the repose) and the duration of the eruptions of Mount
Ruapehu, the large volcano in the centre of New Zealand’s North Island. Values of λ of 2 and 4 were
used here. Picking λ requires experimentation. Large values average over more datapoints, and
therefore produce lower variance, but at the cost of higher bias.
FIGURE 6: Output of the nearest neighbour method and two kernel smoothers on the data of duration
and repose of eruptions of Mount Ruapehu 1860–2006.
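
To make the role of λ concrete, here is a minimal sketch of a kernel smoother, assuming the prediction is a kernel-weighted average of the training targets (the Nadaraya-Watson form; the text does not spell out the estimator). The data are synthetic, not the Ruapehu dataset, and the names `kernel_smooth`, `fit_narrow` and `fit_wide` are hypothetical.

```python
import numpy as np

def tricube(x0, x, lam):
    """Tricube kernel: (1 - u^3)^3 for u = |x - x0| / lam < 1, else 0."""
    u = np.abs(np.asarray(x, dtype=float) - x0) / lam
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def kernel_smooth(x_train, y_train, x_query, lam, kernel=tricube):
    """Kernel-weighted average of y_train at each query point (1-D inputs)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    preds = np.full(len(x_query), np.nan)
    for i, x0 in enumerate(x_query):
        w = kernel(x0, x_train, lam)
        total = w.sum()
        # Leave NaN where no training point lies within range lam of the query
        if total > 0:
            preds[i] = np.dot(w, y_train) / total
    return preds

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

grid = np.linspace(0.0, 10.0, 200)
fit_narrow = kernel_smooth(x, y, grid, lam=2.0)  # averages over fewer points
fit_wide = kernel_smooth(x, y, grid, lam=4.0)    # averages over more points
```

Plotting `fit_narrow` and `fit_wide` against `grid` should illustrate the trade-off: the larger λ averages over more datapoints, so the curve is less sensitive to noise but flattens genuine structure in the data.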

Last modified: Friday, 20 June 2025, 10:00 AM