Feature detection and tracking with an affine consistency check

The equations for detecting features, tracking them between consecutive frames, and checking for consistency using an affine transformation will be derived below using the inverse compositional approach. We will begin by deriving the equations for tracking, because this will yield some insight into which features would be good to track. The source code for this project is available for download at the end. Below is a video of this project in action.

Translation tracking
A translation model will be used to track the motion of features between consecutive frames. If we consider the warp below (a translation),
\begin{align}
\mathbf{w}(\mathbf{x},\mathbf{p}) &= \mathbf{x} + \mathbf{p} \\
\end{align}

we can find the inverse,
\begin{align}
\mathbf{x} &= \mathbf{w}(\mathbf{x},\mathbf{p}) - \mathbf{p} \\
\mathbf{w}(\mathbf{x},\mathbf{p})^{-1} &= \mathbf{x} - \mathbf{p} \\
\end{align}

and the composition,
\begin{align}
\mathbf{w}(\mathbf{x},\mathbf{p}) \circ \mathbf{w}(\mathbf{x},\mathbf{\delta p})^{-1} &= \mathbf{w}(\mathbf{w}(\mathbf{x},\mathbf{\delta p})^{-1}, \mathbf{p}) \\
&= \mathbf{w}(\mathbf{x}-\mathbf{\delta p}, \mathbf{p}) \\
&= \mathbf{x} - \mathbf{\delta p} + \mathbf{p} \\
\end{align}

Our iteration will seek $$\mathbf{\delta p}$$ and apply the update rule,
\begin{align}
\mathbf{p} - \mathbf{\delta p} \to \mathbf{p}
\end{align}

Affine consistency check
Once the translation model has been exploited to track features between consecutive frames, an affine model will be used to check the consistency of a feature between a current frame and the frame in which is was first detected. If we consider the warp below (an affine transformation),
\begin{align}
\mathbf{w}(\mathbf{x}, \mathbf{A}, \mathbf{b}) &= \mathbf{A}\mathbf{x}+\mathbf{b} \\
\end{align}

we can find the inverse of $$\mathbf{w}$$ provided $$\mathbf{A}$$ is invertible,
\begin{align}
\mathbf{A}\mathbf{x} &= \mathbf{w}(\mathbf{x}, \mathbf{A}, \mathbf{b}) - \mathbf{b} \\
\mathbf{x} &= \mathbf{A}^{-1}(\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) - \mathbf{b}) \\
&= \mathbf{A}^{-1} \mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) - \mathbf{A}^{-1}\mathbf{b} \\
\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b})^{-1} &= \mathbf{A}^{-1} \mathbf{x} - \mathbf{A}^{-1}\mathbf{b} \\
\end{align}

and the composition,
\begin{align}
\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) \circ \mathbf{w}(\mathbf{x},\mathbf{\delta A},\mathbf{\delta b})^{-1} &= \mathbf{w}(\mathbf{w}(\mathbf{x},\mathbf{\delta A},\mathbf{\delta b})^{-1},\mathbf{A},\mathbf{b}) \\
&= \mathbf{w}(\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{\delta A}^{-1}\mathbf{\delta b}, \mathbf{A}, \mathbf{b}) \\
&= \mathbf{A}(\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{\delta A}^{-1}\mathbf{\delta b}) + \mathbf{b} \\
&= \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{\delta b} + \mathbf{b} \\
\end{align}

We will seek $$\mathbf{\delta A}$$ and $$\mathbf{\delta b}$$ on each iteration and apply the update rules,
\begin{align}
\mathbf{A}\mathbf{\delta A}^{-1} &\to \mathbf{A} \label{update1} \\
\mathbf{b} - \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{\delta b} &\to \mathbf{b} \label{update2} \\
\end{align}

In the translation model $$\mathbf{p}$$ was a vector such that the warp,
\begin{align}
\mathbf{w}(\mathbf{x},\mathbf{p}) &= \mathbf{x} + \begin{pmatrix}p_0 \\ p_1\end{pmatrix} \\
\end{align}

For the affine model if we let,
\begin{align}
\mathbf{A} &= \begin{pmatrix}1+p_0 && p_1 \\ p_2 && 1+p_3\end{pmatrix} \\
\mathbf{b} &= \begin{pmatrix}p_4 \\ p_5\end{pmatrix}\\
\mathbf{p} &= \begin{pmatrix}p_0 \\ p_1 \\ \vdots \\ p_5\end{pmatrix} \\
\end{align}

we can write $$\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b})$$ as,
\begin{align}
\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) &= \mathbf{w}(\mathbf{x},\mathbf{p}) = \begin{pmatrix}1+p_0 && p_1 \\ p_2 && 1+p_3\end{pmatrix}\mathbf{x} + \begin{pmatrix}p_4 \\ p_5\end{pmatrix}\\
\end{align}

So for the affine model we will ultimately seek the parameter vector, $$\mathbf{\delta p}$$, evaluate $$\mathbf{\delta A}$$ and $$\mathbf{\delta b}$$, and apply the update rules as given above in $$\eqref{update1}$$ and $$\eqref{update2}$$.

Minimizing the SSD
Our aim in tracking features is to find the $$\mathbf{p}$$ that minimizes the sum of squared distances between a feature's template image and the image of the feature in a later frame,
\begin{align}
\epsilon &= \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{\delta p})) - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right]^2 \label{ssd}\\
\end{align}

If we take the first order Taylor expansion of $$T(\mathbf{w}(\mathbf{x},\mathbf{p}))$$ about $$\mathbf{p}=\mathbf{0}$$,
\begin{align}
T(\mathbf{w}(\mathbf{x},\mathbf{p})) &= T(\mathbf{w}(\mathbf{x},\mathbf{p}))|_{\mathbf{p}=\mathbf{0}} + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p}|_{\mathbf{p}=\mathbf{0}}\\
\end{align}

where $$\frac{\partial T}{\partial \mathbf{w}}$$ is the gradient of $$T$$ and $$\frac{\partial \mathbf{w}}{\partial \mathbf{p}}$$ is the Jacobian of the warp, we find,
\begin{align}
T(\mathbf{w}(\mathbf{x},\mathbf{\delta p})) &= T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} \\
\end{align}

Putting this result into $$\eqref{ssd}$$ we have,
\begin{align}
\epsilon &\approx \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right]^2 \\
\end{align}

In order to minimize the residual, $$\epsilon$$, we will take the derivative with respect to $$\mathbf{\delta p}$$, equate it to zero, and solve for $$\mathbf{\delta p}$$,
\begin{align}
\frac{\partial \epsilon}{\partial \mathbf{\delta p}} &= \sum_{\mathbf{x} \in \mathbf{R}} 2 \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\
0 &= \sum_{\mathbf{x} \in \mathbf{R}} 2 \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\
0 &= \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} + \sum_{\mathbf{x} \in \mathbf{R}} \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p}\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\\
-\sum_{\mathbf{x} \in \mathbf{R}} \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p}\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\
\sum_{\mathbf{x} \in \mathbf{R}} \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p}\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \sum_{\mathbf{x} \in \mathbf{R}} \left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\
\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} &= \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \\
\end{align}
Lastly, solving for $$\mathbf{\delta p}$$,
\begin{align}
\mathbf{\delta p} &= \left(\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^{-1}\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \label{deltap1}\\
\end{align}

The Jacobian of the warps
Since we have two models, the translation and the affine, we have two Jacobians to analyze. The translation warp yields the Jacobian,
\begin{align}
\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \frac{\partial (\mathbf{x} + \mathbf{p})}{\partial \mathbf{p}} \\
&= \mathbf{I} \\
\end{align}

Plugging this into $$\eqref{deltap1}$$ we have,
\begin{align}
\mathbf{\delta p} &= \left(\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\right)^{-1}\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \\
\end{align}

For the affine warp we have,
\begin{align}
\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \frac{\partial (\mathbf{A}\mathbf{x} + \mathbf{b})}{\partial \mathbf{p}} \\
&= \frac{\partial}{\partial \mathbf{p}}\left[ \begin{pmatrix}1+p_0 && p_1 \\ p_2 && 1+p_3\end{pmatrix}\begin{pmatrix}x \\ y\end{pmatrix} \right] + \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}p_4 \\ p_5\end{pmatrix} \\
&= \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}x+p_0x+p_1y \\ p_2x+y+p_3y\end{pmatrix} + \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}p_4 \\ p_5\end{pmatrix} \\
&= \begin{pmatrix}x && y && 0 && 0 && 0 && 0 \\0 && 0 && x && y && 0 && 0\end{pmatrix} + \begin{pmatrix}0 && 0 && 0 && 0 && 1 && 0 \\ 0 && 0 && 0 && 0 && 0 && 1\end{pmatrix} \\
&= \begin{pmatrix}x && y && 0 && 0 && 1 && 0 \\0 && 0 && x && y && 0 && 1\end{pmatrix}
\end{align}

If we let,
\begin{align}
\frac{\partial T}{\partial \mathbf{w}} = \begin{pmatrix}T_x && T_y\end{pmatrix} \\
\end{align}

We have,
\begin{align}
\frac{\partial T}{\partial \mathbf{w}} \frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \begin{pmatrix}T_x && T_y\end{pmatrix} \begin{pmatrix}x && y && 0 && 0 && 1 && 0 \\0 && 0 && x && y && 0 && 1\end{pmatrix} \\
&= \begin{pmatrix}T_xx && T_xy && T_yx && T_yy && T_x && T_y\end{pmatrix} \\
\end{align}

Detecting features
Equation $$\eqref{deltap1}$$ gives an indication of which features would be good to track. The matrix,
\begin{align}
\mathbf{H} &= \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\
\end{align}

should be well-conditioned. The heuristic for selecting features to track is to choose those features where the minimum eigenvalue of $$\mathbf{H}$$ is large [1]. Because we have used the inverse compositional approach and because the matrix, $$\mathbf{H}$$, does not depend on $$\mathbf{p}$$, the matrix, $$\mathbf{H}$$ (and therefore $$\mathbf{H}^{-1}$$), need only be computed once in relation to the affine model since the template image never changes. The translation model still requires the computation of $$\mathbf{H}^{-1}$$ at each step since the template image changes (we use the image of the feature in the previous image as the template).

Termination critera
Our iterations seek the $$\mathbf{p}$$ that minimizes the $$\epsilon$$ in $$\eqref{ssd}$$. If at the end of an iteration $$\mathbf{\delta p}$$ is below a specified threshold we can declare the feature tracked. However, if upon updating $$\mathbf{p}-\mathbf{\delta p} \to \mathbf{p}$$, the feature has gone beyond the bounds of our image, we declare the feature lost. If the determinant of $$\mathbf{H}$$ is below a given threshold, we declare the feature lost rather than attempt computation of the inverse. If $$\mathbf{\delta p}$$ has not dropped below the specified threshold after a maximum number of iterations, we declare the feature lost. Lastly, after tracking the feature to a location, $$\mathbf{p}$$, if the residue, $$\epsilon$$, exceeds a given threshold, we declare the feature lost because it no longer resembles the template.

Some final notes
The project download uses the Scharr operator to compute the image gradients. The project currently does not support large image displacements well. A pyramid implementation could be employed such that each image is scaled down, feature displacements are estimated, and the estimates are propagated back up. Lastly, the GPU could be utilized to improve performance. Have a look at the code, and let me know if you have any questions.