Next:

Contact:
Peter Mills

Previous:
Home

libagf is a Peteysoft project

Consider the following generalization of a k-nearest neighbours scheme:

 (1) (2)

where is a test point, is a constant, is the class associated with the th sample and is a weight calculated via a filter function:

 (3)

where is a filter function and is the width of the filter.

The parameter, , is equivalent to the number of nearest neighbours in a k-nearest-neighbours classifier and is held fixed by varying the filter width. This keeps a uniform number of samples within the central region of the filter.

An obvious choice for would be a Gaussian:

 (4)

Where the upright brackets denote a metric, typically Cartesian. The width of the Gaussian can be effectively varied by simply squaring the weights, thus they may be pre-calculated for an initial, trial value of . Exponential interpolation is used to arrive at the final estimate.

The primary advantage of the above over a k-nearest-neighbours, is that it generates estimates that are both continuous and differentiable. Both features may be exploited, first to find the class borders, then to perform classifications and estimate the conditional probability. Let be the difference in conditional probabilities between two classes:

 (5)

Where and are the classes. The border between the two is found by setting this expression to zero. The procedure used was to randomly pick pairs of points that straddle the class border and then solve along the lines between them. Analytical derivatives are used as an aid to root-finding:

 (6)

The class of a test point is estimated as follows:
 (7) (8) (9)

where sample the class border and is the retrieved class. The value of may be extrapolated to the test point:

 (10)

This algorithm is robust, general and efficient, yet still supplies knowledge of the conditional probabilities which are useful for gauging the accuracy of an estimate without prior knowledge of its true class.

## References

Terrel and Scott (1992). "Variable kernel density estimation." Annals of statistics 20:1236-1265.

Peter Mills 2007-11-03