\documentclass[12pt]{article}
\usepackage{amsmath}
\begin{document}
Consider the following generalization of a k-nearest-neighbours scheme:
\begin{eqnarray}
P(j | \mathbf{x}) & \approx & \frac{1}{W} \sum_{i, c_i=j} w_i(\sigma) \\
\sum_{i=1}^{n} w_i(\sigma) &=& W
\end{eqnarray}
where $\mathbf{x}$ is a test point, $W$ is a constant,
$n$ is the number of training samples,
$c_i$ is the class associated with the $i$th sample, $\mathbf{x}_i$, and
$w_i$ is a weight calculated via a filter function:
\begin{equation}
w_i(\sigma) = f \left (\frac{\mathbf{x}-\mathbf{x}_i}{\sigma} \right )
\end{equation}
where $f$ is a filter function and
$\sigma$ is the width of the filter.
The parameter $W$ is analogous to the number of nearest
neighbours, $k$, in a k-nearest-neighbours classifier; it is held fixed by
varying the filter width, which keeps a roughly uniform number of samples
within the central region of the filter.
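As a concrete (and deliberately unoptimized) sketch of this estimator in Python: the training set, the value of $W$, and the geometric bisection used to solve for the filter width are all illustrative assumptions, not part of the original method.

```python
import math

# Hypothetical 2-D training set: (coordinates, class label in {1, 2}).
samples = [((0.0, 0.0), 1), ((0.2, 0.1), 1), ((-0.3, 0.4), 1),
           ((1.0, 1.0), 2), ((1.2, 0.8), 2), ((0.7, 1.3), 2)]
W = 3.0  # target total weight, analogous to k in k-nearest-neighbours

def weights(x, sigma):
    """Gaussian filter weights: w_i = f((x - x_i)/sigma), f(r) = exp(-|r|^2/2)."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma ** 2))
            for xi, _ in samples]

def solve_sigma(x, lo=1e-3, hi=1e3):
    """Geometric bisection on sigma so that the total weight equals W."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if sum(weights(x, mid)) < W:
            lo = mid  # filter too narrow: not enough total weight
        else:
            hi = mid
    return math.sqrt(lo * hi)

def class_prob(x, j):
    """P(j|x) ~ (1/W) * sum of the weights of the samples in class j."""
    w = weights(x, solve_sigma(x))
    return sum(wi for wi, (_, ci) in zip(w, samples) if ci == j) / W
```

Because the total weight is constrained to $W$, the estimated probabilities automatically sum to one.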
An obvious choice for $f$ would be a Gaussian:
\begin{equation}
f(\mathbf{r})=\exp (-|\mathbf{r}|^2/2 )
\end{equation}
where the upright brackets denote a metric, typically Cartesian.
The width
of the Gaussian can be varied without re-evaluating any exponentials:
raising the weights to a power $q$ is equivalent to narrowing the filter
width to $\sigma/\sqrt{q}$, so squaring them halves $\sigma^2$. The weights
may therefore be pre-calculated for an initial, trial value of $\sigma$;
exponential interpolation is then used to arrive at the final estimate.
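The width scaling follows from $w_i^q = \exp(-q\,d_i^2/2\sigma_0^2) = \exp(-d_i^2/2(\sigma_0/\sqrt{q})^2)$, where $d_i$ is the distance to the $i$th sample. A sketch, with made-up squared distances and bisection on $\log q$ standing in for the interpolation described above:

```python
import math

# Hypothetical squared distances d_i^2 from a test point to the samples.
d2 = [0.1, 0.4, 0.9, 1.6, 2.5, 4.0]
W = 3.0       # target total weight
sigma0 = 1.0  # initial, trial filter width

# Weights pre-calculated once, for the trial width only.
w0 = [math.exp(-d / (2 * sigma0 ** 2)) for d in d2]

def total(q):
    """Total weight with every pre-calculated weight raised to the power q,
    equivalent to using an effective filter width of sigma0 / sqrt(q)."""
    return sum(w ** q for w in w0)

# Solve total(q) = W by bisection on log q; total(q) decreases as q grows.
lo, hi = 1e-6, 1e6
for _ in range(100):
    q = math.sqrt(lo * hi)
    if total(q) > W:
        lo = q  # filter still effectively too wide: increase q
    else:
        hi = q
q = math.sqrt(lo * hi)
sigma = sigma0 / math.sqrt(q)  # effective filter width
```

Only cheap powers of the stored weights are evaluated in the loop; the exponentials are computed once.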
The primary advantage of this scheme over a k-nearest-neighbours classifier
is that it generates estimates that are both continuous and differentiable.
Both features may be exploited, first to find the class
borders, then to perform classifications and estimate the
conditional probability. Let $R$ be
the difference in conditional probabilities between two classes:
\begin{equation}
R(\mathbf{x}) = P(2 | \mathbf{x}) - P(1 | \mathbf{x}) \approx \frac{1}{W}
\sum_i (2 c_i - 3) w_i
\end{equation}
where $1$ and $2$ are the classes. The border between the two is
found by setting this expression to zero.
The procedure used was
to randomly pick pairs of points straddling the class border
and then to solve for the zero crossing along the line between each pair.
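A minimal sketch of one such solve, in one dimension with a fixed filter width and plain bisection in place of a derivative-aided solver; the data and the width are made up for illustration:

```python
import math

# Hypothetical 1-D training data: class 1 clustered near 0, class 2 near 2.
samples = [(-0.5, 1), (0.0, 1), (0.3, 1), (1.7, 2), (2.0, 2), (2.4, 2)]
sigma = 0.5  # fixed filter width, for simplicity

def R(x):
    """P(2|x) - P(1|x), up to the positive factor 1/W; the factor does not
    move the zero crossing, since (2c - 3) is -1 for class 1 and +1 for class 2."""
    return sum((2 * c - 3) * math.exp(-(x - xi) ** 2 / (2 * sigma ** 2))
               for xi, c in samples)

# A pair of points straddling the border (opposite signs of R),
# then bisection along the segment between them.
a, b = 0.0, 2.0
assert R(a) < 0 < R(b)
for _ in range(100):
    m = 0.5 * (a + b)
    if R(m) < 0:
        a = m
    else:
        b = m
border = 0.5 * (a + b)
```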
Analytical derivatives are used as an aid to root-finding:
\begin{equation}
\frac{\partial R}{\partial x_j} \approx \frac{1}{W \sigma^2}
\sum_i w_i (2 c_i - 3) \left [x_{ij}-x_j - d_i^2 \frac{\sum_k
w_k (x_{kj}-x_j)} {\sum_k d_k^2 w_k} \right ]
\label{class_grad}
\end{equation}
where $d_i = |\mathbf{x}-\mathbf{x}_i|$ and $x_{ij}$ is the $j$th coordinate
of the $i$th sample; the second term in the brackets arises because $\sigma$
varies with $\mathbf{x}$ so as to hold $W$ fixed.
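The gradient expression can be checked against a finite difference of $R$. In the sketch below the training data are made up, and $\sigma$ is re-solved by bisection at every point so that $\sum_i w_i = W$:

```python
import math

# Hypothetical 2-D training data, classes 1 and 2.
samples = [((0.0, 0.0), 1), ((0.3, -0.2), 1), ((-0.4, 0.5), 1),
           ((1.2, 1.0), 2), ((0.9, 1.4), 2), ((1.5, 0.7), 2)]
W = 3.0

def terms(x, sigma):
    """(weight, squared distance, sample, class) for every training sample."""
    out = []
    for xi, c in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        out.append((math.exp(-d2 / (2 * sigma ** 2)), d2, xi, c))
    return out

def solve_sigma(x, lo=1e-3, hi=1e3):
    """Geometric bisection on sigma so that the total weight equals W."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if sum(t[0] for t in terms(x, mid)) < W:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def R(x):
    sigma = solve_sigma(x)
    return sum(w * (2 * c - 3) for w, _, _, c in terms(x, sigma)) / W

def grad_R(x):
    """Analytic gradient of R; the correction term accounts for sigma
    varying with x so as to hold the total weight W fixed."""
    sigma = solve_sigma(x)
    rows = terms(x, sigma)
    denom = sum(w * d2 for w, d2, _, _ in rows)
    grad = []
    for j in range(len(x)):
        corr = sum(w * (xi[j] - x[j]) for w, _, xi, _ in rows) / denom
        grad.append(sum(w * (2 * c - 3) * (xi[j] - x[j] - d2 * corr)
                        for w, d2, xi, c in rows) / (W * sigma ** 2))
    return grad
```

Since $W$ is constant by construction, only the weights contribute to the derivative, through both their explicit $\mathbf{x}$ dependence and the implicit dependence via $\sigma(\mathbf{x})$.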
The class of a test point is estimated as follows:
\begin{eqnarray}
j & = & \underset{i}{\arg\min} \, | \mathbf{x} - \mathbf{b}_i | \\
p & = & (\mathbf{x} - \mathbf{b_j}) \cdot \nabla_{\mathbf{x}} R (\mathbf{b_j})
\label{peq} \\
c & = & ( 3 + p/|p| ) / 2
\end{eqnarray}
where $\{\mathbf{b_i}\}$ sample the class border and $c$ is the retrieved class.
The value of $R$ may be extrapolated to the test point:
\begin{equation}
R \approx \tanh p
\label{confidence_est}
\end{equation}
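Putting the last few equations together, a sketch of the classification step; the border samples $\{\mathbf{b}_i\}$ and their gradients below are made-up stand-ins for the output of the border-finding stage:

```python
import math

# Hypothetical border samples b_i and the gradient of R at each one
# (in practice these come from the root-finding and gradient steps).
borders = [(1.0, 0.2), (1.1, 0.8)]
grads   = [(0.9, 0.1), (0.8, 0.3)]

def classify(x):
    """Nearest border sample, then the sign of p picks the class."""
    j = min(range(len(borders)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, borders[i])))
    p = sum((a - b) * g for a, b, g in zip(x, borders[j], grads[j]))
    c = 1 if p < 0 else 2            # same as (3 + p/|p|) / 2 for p != 0
    return c, math.tanh(p)           # class and extrapolated value of R
```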
%Equation (\ref{confidence_est}) is derived in Appendix B
%by considering two normally distributed classes of equal size.
This algorithm is robust, general and efficient,
yet still supplies the estimates of the
conditional probabilities needed to set a definite
confidence limit on the retrieved isoline (Mills 2007).
\end{document}