agf_simple

Next:

   Contact:
   Peter Mills

   Previous:
   Home

libagf is a Peteysoft project

Consider the following generalization of a k-nearest neighbours scheme:

$\displaystyle P(j \vert \mathbf{x})$	$\displaystyle \approx$	$\displaystyle \frac{1}{W} \sum_{i, c_i=j} w_i(\sigma)$	(1)
$\displaystyle \sum_{i=0}^n w_i(\sigma)$	$\displaystyle =$	$\displaystyle W$	(2)

where $\mathbf{x}$ is a test point,

is a constant,

is the class associated with the

th sample and

is a weight calculated via a filter function:

$\displaystyle w_i(\sigma) = f \left (\frac{\mathbf{x}-\mathbf{x_i}}{\sigma} \right )$

(3)

where

is a filter function and $\sigma$ is the width of the filter.

The parameter, , is equivalent to the number of nearest neighbours in a k-nearest-neighbours classifier and is held fixed by varying the filter width. This keeps a uniform number of samples within the central region of the filter.

An obvious choice for would be a Gaussian:

$\displaystyle f(\mathbf{r})=\exp (-\vert\mathbf{r}\vert^2/2 )$

(4)

Where the upright brackets denote a metric, typically Cartesian. The width of the Gaussian can be effectively varied by simply squaring the weights, thus they may be pre-calculated for an initial, trial value of $\sigma$ . Exponential interpolation is used to arrive at the final estimate.

The primary advantage of the above over a k-nearest-neighbours, is that it generates estimates that are both continuous and differentiable. Both features may be exploited, first to find the class borders, then to perform classifications and estimate the conditional probability. Let be the difference in conditional probabilities between two classes:

$\displaystyle R(\mathbf{x}) = P(2 \vert \mathbf{x}) - P(1 \vert \mathbf{x}) \approx \frac{1}{W} \sum_i (2 c_i - 3) w_i$

(5)

Where

and

are the classes. The border between the two is found by setting this expression to zero. The procedure used was to randomly pick pairs of points that straddle the class border and then solve along the lines between them. Analytical derivatives are used as an aid to root-finding:

$\displaystyle \frac{\partial R}{\partial x_j} \approx \frac{1}{W \sigma^2} \su... ...[x_{ij}-x_j - d_i^2 \frac{\sum_k w_k (x_{kj}-x_k)} {\sum_k d_k^2 w_k} \right ]$

(6)

The class of a test point is estimated as follows:

$\displaystyle j$	$\displaystyle =$	$\displaystyle \arg \underset{i}{\min} \vert \mathbf{x} - \mathbf{b_i} \vert$	(7)
$\displaystyle p$	$\displaystyle =$	$\displaystyle (\mathbf{x} - \mathbf{b_j}) \cdot \nabla_{\mathbf{x}} R (\mathbf{b_j})$	(8)
$\displaystyle c$	$\displaystyle =$	$\displaystyle ( 3 + p/\vert p\vert ) / 2$	(9)

where $\{\mathbf{b_i}\}$ sample the class border and

is the retrieved class. The value of

may be extrapolated to the test point:

$\displaystyle R \approx \tanh p$

(10)

This algorithm is robust, general and efficient, yet still supplies knowledge of the conditional probabilities which are useful for gauging the accuracy of an estimate without prior knowledge of its true class.

References

Terrel and Scott (1992). "Variable kernel density estimation." Annals of statistics 20:1236-1265.

About this document ...

Peter Mills 2007-11-03