Peter Mills


Support This Project

libagf is a Peteysoft project

Consider the following generalization of a k-nearest neighbours scheme:

$\displaystyle P(j \vert \mathbf{x})$ $\displaystyle \approx$ $\displaystyle \frac{1}{W} \sum_{i, c_i=j} w_i(\sigma)$ (1)
$\displaystyle \sum_{i=0}^n w_i(\sigma)$ $\displaystyle =$ $\displaystyle W$ (2)

where $ \mathbf{x}$ is a test point, $ W$ is a constant, $ c_i$ is the class associated with the $ i$ th sample and $ w_i$ is a weight calculated via a filter function:

$\displaystyle w_i(\sigma) = f \left (\frac{\mathbf{x}-\mathbf{x_i}}{\sigma} \right )$ (3)

where $ f$ is a filter function and $ \sigma$ is the width of the filter.

The parameter, $ W$ , is equivalent to the number of nearest neighbours in a k-nearest-neighbours classifier and is held fixed by varying the filter width. This keeps a uniform number of samples within the central region of the filter.

An obvious choice for $ f$ would be a Gaussian:

$\displaystyle f(\mathbf{r})=\exp (-\vert\mathbf{r}\vert^2/2 )$ (4)

Where the upright brackets denote a metric, typically Cartesian. The width of the Gaussian can be effectively varied by simply squaring the weights, thus they may be pre-calculated for an initial, trial value of $ \sigma$ . Exponential interpolation is used to arrive at the final estimate.

The primary advantage of the above over a k-nearest-neighbours, is that it generates estimates that are both continuous and differentiable. Both features may be exploited, first to find the class borders, then to perform classifications and estimate the conditional probability. Let $ R$ be the difference in conditional probabilities between two classes:

$\displaystyle R(\mathbf{x}) = P(2 \vert \mathbf{x}) - P(1 \vert \mathbf{x}) \approx \frac{1}{W}
 \sum_i (2 c_i - 3) w_i$ (5)

Where $ 1$ and $ 2$ are the classes. The border between the two is found by setting this expression to zero. The procedure used was to randomly pick pairs of points that straddle the class border and then solve along the lines between them. Analytical derivatives are used as an aid to root-finding:

$\displaystyle \frac{\partial R}{\partial x_j} \approx \frac{1}{W \sigma^2}
...[x_{ij}-x_j - d_i^2 \frac{\sum_k
 w_k (x_{kj}-x_k)} {\sum_k d_k^2 w_k} \right ]$ (6)

The class of a test point is estimated as follows:
$\displaystyle j$ $\displaystyle =$ $\displaystyle \arg \underset{i}{\min} \vert \mathbf{x} - \mathbf{b_i} \vert$ (7)
$\displaystyle p$ $\displaystyle =$ $\displaystyle (\mathbf{x} - \mathbf{b_j}) \cdot \nabla_{\mathbf{x}} R (\mathbf{b_j})$ (8)
$\displaystyle c$ $\displaystyle =$ $\displaystyle ( 3 + p/\vert p\vert ) / 2$ (9)

where $ \{\mathbf{b_i}\}$ sample the class border and $ c$ is the retrieved class. The value of $ R$ may be extrapolated to the test point:

$\displaystyle R \approx \tanh p$ (10)

This algorithm is robust, general and efficient, yet still supplies knowledge of the conditional probabilities which are useful for gauging the accuracy of an estimate without prior knowledge of its true class.


Terrel and Scott (1992). "Variable kernel density estimation." Annals of statistics 20:1236-1265.

Peter Mills 2007-11-03