The code is available below.

Download this project: path_tracer_texture_mapping.tar.bz2

**Translation tracking**

A translation model will be used to track the motion of features between consecutive frames. If we consider the warp below (a translation),

\begin{align}

\mathbf{w}(\mathbf{x},\mathbf{p}) &= \mathbf{x} + \mathbf{p} \\

\end{align}

we can find the inverse,

\begin{align}

\mathbf{x} &= \mathbf{w}(\mathbf{x},\mathbf{p}) - \mathbf{p} \\

\mathbf{w}(\mathbf{x},\mathbf{p})^{-1} &= \mathbf{x} - \mathbf{p} \\

\end{align}

and the composition,

\begin{align}

\mathbf{w}(\mathbf{x},\mathbf{p}) \circ \mathbf{w}(\mathbf{x},\mathbf{\delta p})^{-1} &= \mathbf{w}(\mathbf{w}(\mathbf{x},\mathbf{\delta p})^{-1}, \mathbf{p}) \\

&= \mathbf{w}(\mathbf{x}-\mathbf{\delta p}, \mathbf{p}) \\

&= \mathbf{x} - \mathbf{\delta p} + \mathbf{p} \\

\end{align}

Our iteration will seek \(\mathbf{\delta p}\) and apply the update rule,

\begin{align}

\mathbf{p} - \mathbf{\delta p} \to \mathbf{p}

\end{align}

**Affine consistency check**

Once the translation model has been exploited to track features between consecutive frames, an affine model will be used to check the consistency of a feature between the current frame and the frame in which it was first detected. If we consider the warp below (an affine transformation),

\begin{align}

\mathbf{w}(\mathbf{x}, \mathbf{A}, \mathbf{b}) &= \mathbf{A}\mathbf{x}+\mathbf{b} \\

\end{align}

we can find the inverse of \(\mathbf{w}\) provided \(\mathbf{A}\) is invertible,

\begin{align}

\mathbf{A}\mathbf{x} &= \mathbf{w}(\mathbf{x}, \mathbf{A}, \mathbf{b}) - \mathbf{b} \\

\mathbf{x} &= \mathbf{A}^{-1}(\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) - \mathbf{b}) \\

&= \mathbf{A}^{-1} \mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) - \mathbf{A}^{-1}\mathbf{b} \\

\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b})^{-1} &= \mathbf{A}^{-1} \mathbf{x} - \mathbf{A}^{-1}\mathbf{b} \\

\end{align}

and the composition,

\begin{align}

\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) \circ \mathbf{w}(\mathbf{x},\mathbf{\delta A},\mathbf{\delta b})^{-1} &= \mathbf{w}(\mathbf{w}(\mathbf{x},\mathbf{\delta A},\mathbf{\delta b})^{-1},\mathbf{A},\mathbf{b}) \\

&= \mathbf{w}(\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{\delta A}^{-1}\mathbf{\delta b}, \mathbf{A}, \mathbf{b}) \\

&= \mathbf{A}(\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{\delta A}^{-1}\mathbf{\delta b}) + \mathbf{b} \\

&= \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{x} - \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{\delta b} + \mathbf{b} \\

\end{align}

We will seek \(\mathbf{\delta A}\) and \(\mathbf{\delta b}\) on each iteration and apply the update rules,

\begin{align}

\mathbf{A}\mathbf{\delta A}^{-1} &\to \mathbf{A} \label{update1} \\

\mathbf{b} - \mathbf{A}\mathbf{\delta A}^{-1}\mathbf{\delta b} &\to \mathbf{b} \label{update2} \\

\end{align}

In the translation model, \(\mathbf{p}\) was a vector, and the warp took the form,

\begin{align}

\mathbf{w}(\mathbf{x},\mathbf{p}) &= \mathbf{x} + \begin{pmatrix}p_0 \\ p_1\end{pmatrix} \\

\end{align}

For the affine model, if we let,

\begin{align}

\mathbf{A} &= \begin{pmatrix}1+p_0 & p_1 \\ p_2 & 1+p_3\end{pmatrix} \\

\mathbf{b} &= \begin{pmatrix}p_4 \\ p_5\end{pmatrix}\\

\mathbf{p} &= \begin{pmatrix}p_0 \\ p_1 \\ \vdots \\ p_5\end{pmatrix} \\

\end{align}

we can write \(\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b})\) as,

\begin{align}

\mathbf{w}(\mathbf{x},\mathbf{A},\mathbf{b}) &= \mathbf{w}(\mathbf{x},\mathbf{p}) = \begin{pmatrix}1+p_0 & p_1 \\ p_2 & 1+p_3\end{pmatrix}\mathbf{x} + \begin{pmatrix}p_4 \\ p_5\end{pmatrix}\\

\end{align}

So for the affine model we will ultimately seek the parameter vector, \(\mathbf{\delta p}\), evaluate \(\mathbf{\delta A}\) and \(\mathbf{\delta b}\), and apply the update rules as given above in \(\eqref{update1}\) and \(\eqref{update2}\).
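As a concrete sketch of these update rules, the snippet below builds \(\mathbf{\delta A}\) and \(\mathbf{\delta b}\) from a computed \(\mathbf{\delta p}\) and applies \(\eqref{update1}\) and \(\eqref{update2}\). The `Mat2`/`Vec2` types and helper names are illustrative assumptions, not taken from the project code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 2x2 matrix stored row-major; illustrative helper types, not the project's.
using Mat2 = std::array<double, 4>;
using Vec2 = std::array<double, 2>;

Mat2 inverse2(const Mat2 &m) {
    double det = m[0] * m[3] - m[1] * m[2];
    return {m[3] / det, -m[1] / det, -m[2] / det, m[0] / det};
}

Mat2 mul2(const Mat2 &a, const Mat2 &b) {
    return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}

Vec2 mulv(const Mat2 &a, const Vec2 &v) {
    return {a[0] * v[0] + a[1] * v[1], a[2] * v[0] + a[3] * v[1]};
}

// Apply the inverse compositional update rules given the 6-vector delta p.
void updateAffine(Mat2 &A, Vec2 &b, const std::array<double, 6> &dp) {
    Mat2 dA = {1.0 + dp[0], dp[1], dp[2], 1.0 + dp[3]};
    Vec2 db = {dp[4], dp[5]};
    Mat2 AdAinv = mul2(A, inverse2(dA));  // A * deltaA^{-1} -> A
    Vec2 Adb = mulv(AdAinv, db);
    A = AdAinv;
    b = {b[0] - Adb[0], b[1] - Adb[1]};   // b - A * deltaA^{-1} * deltab -> b
}
```

With \(\mathbf{\delta A} = \mathbf{I}\), the update reduces to shifting \(\mathbf{b}\) opposite the translation increment, as expected.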

**Minimizing the SSD**

Our aim in tracking features is to find the \(\mathbf{p}\) that minimizes the sum of squared differences between a feature's template image and the image of the feature in a later frame,

\begin{align}

\epsilon &= \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{\delta p})) - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right]^2 \label{ssd}\\

\end{align}

If we take the first-order Taylor expansion of \(T(\mathbf{w}(\mathbf{x},\mathbf{\delta p}))\) about \(\mathbf{\delta p}=\mathbf{0}\),

\begin{align}

T(\mathbf{w}(\mathbf{x},\mathbf{\delta p})) &\approx \left.T(\mathbf{w}(\mathbf{x},\mathbf{p}))\right|_{\mathbf{p}=\mathbf{0}} + \left.\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right|_{\mathbf{p}=\mathbf{0}}\mathbf{\delta p}\\

\end{align}

where \(\frac{\partial T}{\partial \mathbf{w}}\) is the gradient of \(T\) and \(\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\) is the Jacobian of the warp, we find,

\begin{align}

T(\mathbf{w}(\mathbf{x},\mathbf{\delta p})) &= T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} \\

\end{align}

Putting this result into \(\eqref{ssd}\) we have,

\begin{align}

\epsilon &\approx \sum_{\mathbf{x} \in \mathbf{R}} \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right]^2 \\

\end{align}

In order to minimize the residual, \(\epsilon\), we will take the derivative with respect to \(\mathbf{\delta p}\), equate it to zero, and solve for \(\mathbf{\delta p}\),

\begin{align}

\frac{\partial \epsilon}{\partial \mathbf{\delta p}} &= \sum_{\mathbf{x} \in \mathbf{R}} 2 \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) + \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] \\

0 &= \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T \left[T(\mathbf{w}(\mathbf{x},\mathbf{0})) - I(\mathbf{w}(\mathbf{x},\mathbf{p}))\right] + \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T \frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p}\\

\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\mathbf{\delta p} &= \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \\

\end{align}

Lastly, solving for \(\mathbf{\delta p}\),

\begin{align}

\mathbf{\delta p} &= \left(\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^{-1}\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \label{deltap1}\\

\end{align}

**The Jacobian of the warps**

Since we have two models, the translation and the affine, we have two Jacobians to analyze. The translation warp yields the Jacobian,

\begin{align}

\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \frac{\partial (\mathbf{x} + \mathbf{p})}{\partial \mathbf{p}} \\

&= \mathbf{I} \\

\end{align}

Plugging this into \(\eqref{deltap1}\) we have,

\begin{align}

\mathbf{\delta p} &= \left(\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\right)^{-1}\sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\right)^T\left[I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\right] \\

\end{align}
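In code, each iteration of the translation tracker reduces to accumulating a \(2 \times 2\) matrix and a 2-vector over the region and solving. Below is a minimal sketch, assuming flat per-pixel arrays `Tx` and `Ty` for the template gradients and `err` for \(I(\mathbf{w}(\mathbf{x},\mathbf{p})) - T(\mathbf{w}(\mathbf{x},\mathbf{0}))\); these names are illustrative, not the project's.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Solve the 2x2 normal equations H * dp = rhs for the translation model,
// where H = sum(g g^T) over the template region and rhs = sum(g * err),
// with g = (Tx, Ty) the template gradient at each pixel.
bool solveTranslationStep(const std::vector<double> &Tx,
                          const std::vector<double> &Ty,
                          const std::vector<double> &err,
                          double &dpx, double &dpy) {
    double h00 = 0, h01 = 0, h11 = 0, r0 = 0, r1 = 0;
    for (size_t i = 0; i < Tx.size(); ++i) {
        h00 += Tx[i] * Tx[i];
        h01 += Tx[i] * Ty[i];
        h11 += Ty[i] * Ty[i];
        r0 += Tx[i] * err[i];
        r1 += Ty[i] * err[i];
    }
    double det = h00 * h11 - h01 * h01;
    if (std::fabs(det) < 1e-12) return false;  // H ill-conditioned: drop the feature
    dpx = (h11 * r0 - h01 * r1) / det;         // Cramer's rule for the 2x2 system
    dpy = (h00 * r1 - h01 * r0) / det;
    return true;
}
```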

For the affine warp we have,

\begin{align}

\frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \frac{\partial (\mathbf{A}\mathbf{x} + \mathbf{b})}{\partial \mathbf{p}} \\

&= \frac{\partial}{\partial \mathbf{p}}\left[ \begin{pmatrix}1+p_0 & p_1 \\ p_2 & 1+p_3\end{pmatrix}\begin{pmatrix}x \\ y\end{pmatrix} \right] + \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}p_4 \\ p_5\end{pmatrix} \\

&= \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}x+p_0x+p_1y \\ p_2x+y+p_3y\end{pmatrix} + \frac{\partial}{\partial \mathbf{p}} \begin{pmatrix}p_4 \\ p_5\end{pmatrix} \\

&= \begin{pmatrix}x & y & 0 & 0 & 0 & 0 \\0 & 0 & x & y & 0 & 0\end{pmatrix} + \begin{pmatrix}0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1\end{pmatrix} \\

&= \begin{pmatrix}x & y & 0 & 0 & 1 & 0 \\0 & 0 & x & y & 0 & 1\end{pmatrix}

\end{align}

If we let,

\begin{align}

\frac{\partial T}{\partial \mathbf{w}} &= \begin{pmatrix}T_x & T_y\end{pmatrix} \\

\end{align}

we have,

\begin{align}

\frac{\partial T}{\partial \mathbf{w}} \frac{\partial \mathbf{w}}{\partial \mathbf{p}} &= \begin{pmatrix}T_x & T_y\end{pmatrix} \begin{pmatrix}x & y & 0 & 0 & 1 & 0 \\0 & 0 & x & y & 0 & 1\end{pmatrix} \\

&= \begin{pmatrix}T_xx & T_xy & T_yx & T_yy & T_x & T_y\end{pmatrix} \\

\end{align}

**Detecting features**

Equation \(\eqref{deltap1}\) gives an indication of which features would be good to track. The matrix,

\begin{align}

\mathbf{H} &= \sum_{\mathbf{x} \in \mathbf{R}} \left(\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}}\right)^T\frac{\partial T}{\partial \mathbf{w}}\frac{\partial \mathbf{w}}{\partial \mathbf{p}} \\

\end{align}

should be well-conditioned. The heuristic for selecting features to track is to choose those features where the minimum eigenvalue of \(\mathbf{H}\) is large [1]. Because we use the inverse compositional approach and \(\mathbf{H}\) does not depend on \(\mathbf{p}\), the matrix \(\mathbf{H}\) (and therefore \(\mathbf{H}^{-1}\)) need only be computed once for the affine model, since the template image never changes. The translation model still requires the computation of \(\mathbf{H}^{-1}\) at each step, since its template image changes (we use the image of the feature in the previous frame as the template).
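For the translation model, \(\mathbf{H}\) is a symmetric \(2 \times 2\) matrix, so the minimum eigenvalue has a closed form. Below is a small sketch of the Shi-Tomasi scoring criterion; the function name and threshold usage are illustrative, not the project's exact code.

```cpp
#include <cassert>
#include <cmath>

// Minimum eigenvalue of the symmetric 2x2 matrix H = [[a, b], [b, c]].
// A candidate feature is kept when this score exceeds a chosen threshold.
double minEigenvalue(double a, double b, double c) {
    double half_trace = 0.5 * (a + c);
    double d = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
    return half_trace - d;  // smaller root of the characteristic polynomial
}
```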

**Termination criteria**

Our iterations seek the \(\mathbf{p}\) that minimizes the \(\epsilon\) in \(\eqref{ssd}\). If at the end of an iteration \(\mathbf{\delta p}\) is below a specified threshold, we declare the feature tracked. However, if upon applying the update \(\mathbf{p}-\mathbf{\delta p} \to \mathbf{p}\) the feature has gone beyond the bounds of our image, we declare the feature lost. If the determinant of \(\mathbf{H}\) is below a given threshold, we declare the feature lost rather than attempt computation of the inverse. If \(\mathbf{\delta p}\) has not dropped below the specified threshold after a maximum number of iterations, we declare the feature lost. Lastly, after tracking the feature to a location, \(\mathbf{p}\), if the residual, \(\epsilon\), exceeds a given threshold, we declare the feature lost because it no longer resembles the template.
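These checks can be collected into one decision function per iteration. The sketch below is a hedged outline of that bookkeeping; the struct fields, threshold values, and names are assumptions for illustration, not the project's API.

```cpp
#include <cassert>
#include <cmath>

enum class TrackStatus { Converged, Lost, Continue };

// One iteration's bookkeeping for the termination criteria described above.
struct TrackState {
    double dp_norm;   // ||delta p|| after this iteration
    bool in_bounds;   // feature still inside the image after the update
    double det_H;     // determinant of H
    int iteration;    // current iteration count
    double residual;  // SSD at the tracked location
};

TrackStatus checkTermination(const TrackState &s,
                             double dp_eps = 1e-2, double det_eps = 1e-8,
                             int max_iter = 20, double max_residual = 1e3) {
    if (!s.in_bounds) return TrackStatus::Lost;                  // left the image
    if (std::fabs(s.det_H) < det_eps) return TrackStatus::Lost;  // H not invertible
    if (s.iteration >= max_iter) return TrackStatus::Lost;       // failed to converge
    if (s.dp_norm < dp_eps)
        return s.residual <= max_residual ? TrackStatus::Converged
                                          : TrackStatus::Lost;   // no longer matches template
    return TrackStatus::Continue;
}
```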

**Some final notes**

The project download uses the Scharr operator to compute the image gradients. The project currently does not support large image displacements well. A pyramid implementation could be employed such that each image is scaled down, feature displacements are estimated, and the estimates are propagated back up. Lastly, the GPU could be utilized to improve performance. Have a look at the code, and let me know if you have any questions.

Download this project: features.tar.bz2

References:

1. J. Shi and C. Tomasi. *Good Features to Track.* Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, June 1994.

2. J. Kim, M. Hwangbo and T. Kanade. *Realtime affine-photometric KLT feature tracker on GPU in CUDA framework.* 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pages 886-893, 2009.

3. S. Baker and I. Matthews. *Lucas-Kanade 20 Years On: A Unifying Framework.* Int. J. Comput. Vision, vol. 56, no. 3, pages 221–255, 2004.

Below is a test render of the Sponza model. Some of the features of the original model have been stripped for this render, leaving approximately 150,000 triangles.

Below is a render of the Dragon model available from the Stanford 3D Scanning Repository. The rendered model contained 100,000 triangles.

**Hyperplane Separation Theorem**

The hyperplane separation theorem is a theorem about disjoint, convex sets. For our purposes we will be applying the theorem to a combination of an axis-aligned bounding box and a triangle in three dimensions. Since we will consider both the triangle and the axis-aligned bounding box to be compact (and convex), then, provided these two sets are disjoint, we can locate two parallel hyperplanes between them separated by a gap. We need only one separating hyperplane to conclude that the triangle and axis-aligned bounding box do not intersect. Below is an image depicting this theorem in two dimensions. The green line is a separating axis and the black line is a separating line.

In three dimensions, we will have a separating axis and a separating plane. We have a few different contact situations between objects to concern ourselves with: face to face contact, face to edge contact, and edge to edge contact. Thus, we have a list of potential separating axes including the normals to the faces and the cross products of the combinations of an edge from one object with an edge from the other. The face to edge contacts are handled by the face normals. The axis-aligned bounding box has six faces, but three sets of two parallel faces, so we have three potential separating axes. The triangle normal is a fourth. The axis-aligned bounding box has 12 edges, but three sets of four parallel edges, and the triangle has three edges, yielding nine cross products for a total of 13 potential separating axes. Only if all thirteen tests fail to find a separating plane can we conclude that the axis-aligned bounding box and the triangle are not disjoint. However, as soon as any single test yields a separating plane, we need not complete the remaining tests to conclude that the objects are disjoint.

We will first translate the triangle and the axis-aligned bounding box such that the center of the box is located at the origin. Below is some code to detect a separation. We need not be concerned with the direction of the projections, but only their magnitudes. Because the box is located at the origin, we define a radius, `r`, based on the half dimensions of the box.

```cpp
bool hyperplaneSeparation(__vector n, __vector p0, __vector p1, __vector p2,
                          double halfWidth, double halfHeight, double halfDepth)
{
    double _p0 = n * p0, _p1 = n * p1, _p2 = n * p2;
    double min = MIN(_p0, MIN(_p1, _p2)), max = MAX(_p0, MAX(_p1, _p2));
    double r = halfWidth * fabs(n.x) + halfHeight * fabs(n.y) + halfDepth * fabs(n.z);
    return -r > max || r < min;
}
```

The `buildTree()` function in the project download uses this method.

**kd tree construction using the surface area heuristic**

kd tree construction using the surface area heuristic is a greedy algorithm. During the build process, we compare the cost of splitting a node with the cost of not splitting. If the local cost of splitting is less than that of not splitting, we split the node. Otherwise, we convert the current node to a leaf. The function we will use to estimate the cost is given below [2],

\begin{align}

C_V(p) &= K_T + K_I\left( \frac{SA(V_L)}{SA(V)} T_L + \frac{SA(V_R)}{SA(V)} T_R \right) \\

C_{NS} &= K_IT \\

\end{align}

where \(K_T\) is the cost of a traversal, \(K_I\) is the triangle intersection cost, \(SA(V_L), SA(V_R), SA(V)\) are the surface areas of the left node, right node, and current node, respectively, \(T_L, T_R, T\) are the number of triangles in the left node, right node, and current node, respectively, \(C_V(p)\) is the cost of splitting the current node, and \(C_{NS}\) is the cost of not splitting the current node.

We have \(6T\) potential split positions: for each of the three axes, the minimum and maximum coordinate of each triangle. The algorithm presented here is similar to the \(O(n \cdot \log^2n)\) algorithm described in [2]. For each axis we push the minimum triangle coordinate to a list with an event, `PRIMITIVE_START`, and the maximum coordinate with an event, `PRIMITIVE_END`. The lists are then sorted based on the coordinate value, \(O(n \cdot \log n)\). For each split position, we will consider the triangle to reside in both nodes, so for the first split position we have \(T_L=1\) and \(T_R=T\). As we progress to the next split position, if that event is a `PRIMITIVE_START`, we increment \(T_L\). If the event is a `PRIMITIVE_END`, we decrement \(T_R\) *on the following pass*, since that event corresponds to a triangle that we are including in both nodes (a vertex lies on the split plane). We now have \(T_L\) and \(T_R\) for each split position, we can evaluate the surface areas based on the split position, and throw some estimates in for \(K_T\) and \(K_I\). On each pass we evaluate \(C_V(p)\) and retain the best cost and split position, \(p\). Once we have processed all potential split positions, we compare the best cost with \(C_{NS}\) and split the node if \(C_V(p) \lt C_{NS}\).

The project download generates the kd tree recursively host-side, transfers it to a structure of arrays, and passes it to the device. The `buildKdTree()` function in the download allows you to pass a type parameter. This parameter can be `KD_EVEN`, splitting each node in the center resulting in a binary space partition, `KD_MEDIAN`, splitting each node at the object median, or `KD_SAH`, splitting the node using the surface area heuristic. Below are three visualizations of the tree structure for each type.

**stack-based traversal**

To implement a stack-based traversal of the kd tree, we first created the stack object below. The `__stack_element` contains an `id` to reference a node, and the \(t_{min}\) and \(t_{max}\) values for a ray, \(\vec{r} = \vec{o} + \hat{d}t\), passing through the node.

```cpp
struct __stack_element {
    int id;
    double tmin, tmax;
};

class __stack {
public:
    __stack_element stack[32];
    int count;
    __device__ __stack();
    __device__ void push(int id, double tmin, double tmax);
    __device__ __stack_element pop();
    __device__ bool empty();
};

__device__ __stack::__stack() : count(0) {}

__device__ void __stack::push(int id, double tmin, double tmax)
{
    this->stack[count].id = id;
    this->stack[count].tmin = tmin;
    this->stack[count].tmax = tmax;
    count++;
}

__device__ __stack_element __stack::pop()
{
    count--;
    return this->stack[count];
}

__device__ bool __stack::empty()
{
    return this->count == 0;
}
```

With the stack object, it was fairly straightforward to implement the algorithm below. See [3] and [4] for details. The algorithm descends through the tree, pushing the farther nodes onto the stack. With the nearer nodes evaluated first, we can break early upon finding an intersection within the bounds.

```
intersection = none
if (ray intersects root node) {
    stack.push(root node, tmin, tmax)
    while (!stack.empty() && !intersection) {
        (node, tmin, tmax) = stack.pop()
        while (!node.isLeaf()) {
            tsplit = (node.split - ray.origin[node.axis]) / ray.direction[node.axis]
            if (node.split - ray.origin[node.axis] >= 0) {
                first = node.left; second = node.right
            } else {
                first = node.right; second = node.left
            }
            if (tsplit >= tmax || tsplit < 0)
                node = first
            else if (tsplit <= tmin)
                node = second
            else {
                stack.push(second, tsplit, tmax)
                node = first
                tmax = tsplit
            }
        }
        foreach (triangle in node)
            if (ray intersects triangle)
                intersection = nearest intersection
        if (nearest intersection > tmax)
            intersection = none
    }
}
```

Download the project and have a look at the code. Let me know if you have any thoughts.

Download this project: path_tracer.tar.bz2

References:

1. Akenine-Möller, Tomas. Fast 3D triangle-box overlap testing. *In ACM SIGGRAPH 2005 Courses*, ACM. Los Angeles, California. 2005.

2. Wald, Ingo, and Havran, Vlastimil. On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N). *IN PROCEEDINGS OF THE 2006 IEEE SYMPOSIUM ON INTERACTIVE RAY TRACING*. 2006.

3. Wald, Ingo. 2004. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Saarland University.

4. Horn, Daniel Reiter, Sugerman, Jeremy, Houston, Mike, and Hanrahan, Pat. 2007. Interactive k-d tree GPU raytracing. *In Proceedings of the 2007 symposium on Interactive 3D graphics and games*, ACM. Seattle, Washington.

5. Havran, Vlastimil. 2000. Heuristic Ray Shooting Algorithms. Ph.D. Thesis, Czech Technical University in Prague.

**Thin lens**

We first reworked the camera model using the thin lens equation. Below, \(f\) is the focal length, \(d\) is the distance to the focal plane, and \(i\) is the distance to the image plane.

\begin{align}

\frac{1}{f} &= \frac{1}{d} + \frac{1}{i} \\

i &= \frac{1}{\frac{1}{f} - \frac{1}{d}} = \frac{fd}{d-f} \\

\end{align}

For a 50mm lens focused at 10m, the image plane is located approximately 50.25mm behind the lens. For a lens set to f/8 this yields a radius, \(r\), of the entrance pupil of,

\begin{align}

r &= \frac{1}{2} \cdot \frac{f}{8} \\

&= \frac{1}{2} \cdot \frac{50\text{mm}}{8} = 3.125\text{mm} \\

\end{align}

In the code we specify the focal length, aperture, and distance to the focal plane. From this we evaluate the distance to the image plane and the aperture size. The kernel is set to simulate a 36mm-wide sensor with a height evaluated appropriately for the given aspect ratio. We fire rays from a location on the sensor through the origin to a point, \(\vec{p}\), on the focal plane. We then jitter the origin within the disc defined by the aperture radius. If the new offset is \(\vec{o}\), the ray direction is \(\vec{r}=\vec{p}-\vec{o}\), and we sample the ray \(\vec{o} + t\vec{r}\).
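The worked numbers above can be reproduced with a couple of one-line helpers. This is a minimal sketch (units in millimetres); the function names are illustrative, not taken from the project code.

```cpp
#include <cassert>
#include <cmath>

// Image-plane distance from the thin lens equation: i = fd / (d - f),
// for focal length f and focal-plane distance d, both in mm.
double imagePlaneDistance(double f, double d) {
    return f * d / (d - f);
}

// Entrance-pupil radius: half the aperture diameter f / N for f-number N.
double apertureRadius(double f, double f_number) {
    return 0.5 * f / f_number;
}
```

A 50mm lens focused at 10m (10000mm) gives \(i \approx 50.25\)mm, and f/8 gives \(r = 3.125\)mm.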

**Fresnel reflection**

Next, we added support for Fresnel reflection. This was a straightforward modification to our refractive material. We simply find the reflection coefficient, \(R\), for unpolarized light given below,

\begin{align}

R &= \frac{R_s+R_p}{2} \\

R_s &= \left( \frac{-n_1 \hat{r} \cdot \hat {n} - n_2 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]}}{-n_1 \hat{r} \cdot \hat {n} + n_2 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]}} \right)^2\\

R_p &= \left( \frac{n_1 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} + n_2 \hat{r} \cdot \hat {n}}{n_1 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} - n_2 \hat{r} \cdot \hat {n}} \right)^2\\

\end{align}

Note that all vectors above are unit vectors. We next generate a uniform random variable on the interval \([0,1]\) and reflect the ray if this number is less than \(R\). We refract and transmit otherwise.
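The formulas above reduce to a short function of \(\cos\theta_i = -\hat{r} \cdot \hat{n}\). Below is a hedged sketch (the function name and total-internal-reflection handling are assumptions, not the project's exact code):

```cpp
#include <cassert>
#include <cmath>

// Unpolarized Fresnel reflection coefficient R for incidence cosine
// cos_i = -r.n (assumed positive), with refractive indices n1 (incident
// side) and n2 (transmitted side). Returns 1.0 on total internal reflection.
double fresnelReflectance(double cos_i, double n1, double n2) {
    double k = 1.0 - (n1 * n1) / (n2 * n2) * (1.0 - cos_i * cos_i);
    if (k < 0.0) return 1.0;  // total internal reflection: reflect everything
    double cos_t = std::sqrt(k);
    double rs = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t);
    double rp = (n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i);
    return 0.5 * (rs * rs + rp * rp);  // R = (R_s + R_p) / 2
}
```

At normal incidence from air into glass (\(n_1=1\), \(n_2=1.5\)) this gives the familiar \(R = 0.04\).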

**Smooth shading**

In this post we discussed triangle intersections, so we have \(s\) and \(t\) for our point of intersection, \(\vec{p}\),

\begin{align}

\vec{p} &= \vec{p}_0 + s(\vec{p}_1 - \vec{p}_0) + t(\vec{p}_2 - \vec{p}_0) \\

\end{align}

Provided we have a normal for each vertex, we can exploit the \(s\) and \(t\) evaluations and use them for interpolating the normals,

\begin{align}

\vec{n} &= \vec{n}_0 + s(\vec{n}_1 - \vec{n}_0) + t(\vec{n}_2 - \vec{n}_0) \\

\hat{n} &= \frac{\vec{n}}{\left|\left|\vec{n}\right|\right|}

\end{align}

**Texture mapping the plane primitive**

The last addition this time around was to add texture mapping support for the plane primitive. The general idea was to define two linearly-independent vectors that span the plane. With those two vectors and a point on the plane, we can find the \(s\) and \(t\) coordinates for our point of intersection as we do for the triangle primitive. Since the texture is repeating we find the coordinates, \(s'\) and \(t'\),

\begin{align}

s' &= s - \lfloor s \rfloor \\

t' &= t - \lfloor t \rfloor \\

\end{align}

We now have appropriate texture coordinates, \(s'\) and \(t'\), that both belong to the interval \([0,1]\). These are used as offsets into our texture.
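The wrap is a one-liner that also behaves correctly for negative coordinates:

```cpp
#include <cassert>
#include <cmath>

// Wrap a plane-space coordinate into [0, 1) for a repeating texture,
// following s' = s - floor(s).
double wrapCoordinate(double s) {
    return s - std::floor(s);
}
```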

Download this project: pathtracer_dof_triangles_fresnel_texture_smooth.tar.bz2

We will continue with the project we left off with in this post. We will attempt to add triangles to our list of primitives. Once we are able to render triangles, this opens the door to rendering full scale models. However, because models will contain upwards of thousands of triangles, we need to be able to organize those primitives effectively for intersection tests. For this we have implemented a rudimentary binary space partitioning. We will discuss towards the end what could be done to improve efficiency. Below are two renders.

In the post, A calibration method based on barycentric coordinates for multi-touch systems, we discussed barycentric coordinates. That concept will be used here for our triangle intersection tests. Our first job is to locate the point, \(\vec{p}\), where the ray intersects the plane in which the triangle lies (this was discussed a bit here). If a triangle is defined by the vertices, \(\vec{p}_0\), \(\vec{p}_1\), and \(\vec{p}_2\), the triangle normal can be given as \(\vec{n} = (\vec{p}_1-\vec{p}_0)\times(\vec{p}_2-\vec{p}_0)\). Once we have found the point, \(\vec{p}\), we evaluate the barycentric coordinates, \(s\) and \(t\), of the point relative to the triangle. These equations are given below, where \(\vec{v}_0 = \vec{p}_1-\vec{p}_0\), \(\vec{v}_1 = \vec{p}_2 - \vec{p}_0\), and \(\vec{w} = \vec{p} - \vec{p}_0\).

\begin{align}

s &= \frac{(\vec w \cdot \vec v_0)(\vec v_1 \cdot \vec v_1)-(\vec w \cdot \vec v_1)(\vec v_0 \cdot \vec v_1)}{(\vec v_0 \cdot \vec v_0)(\vec v_1 \cdot \vec v_1)-(\vec v_0 \cdot \vec v_1)^2}\\

t &= \frac{(\vec w \cdot \vec v_1)(\vec v_0 \cdot \vec v_0)-(\vec w \cdot \vec v_0)(\vec v_0 \cdot \vec v_1)}{(\vec v_0 \cdot \vec v_0)(\vec v_1 \cdot \vec v_1)-(\vec v_0 \cdot \vec v_1)^2}

\end{align}

Provided \(s\geq0\), \(t\geq0\), and \(s+t\leq1\), we can conclude that the point, \(\vec{p}\), lies inside the triangle. We can then reflect or transmit the ray appropriately depending on the material type.
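This test can be sketched as below, writing \(\vec{w}=\vec{p}-\vec{p}_0\). The `Vec3` type stands in for the project's `__vector`; the helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 sub(const Vec3 &a, const Vec3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Barycentric coordinates s and t of a point p already known to lie on
// the triangle's plane; returns true when p is inside the triangle.
bool pointInTriangle(const Vec3 &p, const Vec3 &p0, const Vec3 &p1, const Vec3 &p2,
                     double &s, double &t) {
    Vec3 v0 = sub(p1, p0), v1 = sub(p2, p0), w = sub(p, p0);
    double d00 = dot(v0, v0), d01 = dot(v0, v1), d11 = dot(v1, v1);
    double w0 = dot(w, v0), w1 = dot(w, v1);
    double denom = d00 * d11 - d01 * d01;  // nonzero for a non-degenerate triangle
    s = (w0 * d11 - w1 * d01) / denom;
    t = (w1 * d00 - w0 * d01) / denom;
    return s >= 0.0 && t >= 0.0 && s + t <= 1.0;
}
```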

This addendum to the path tracer project was relatively straightforward, but it does not scale well. For each ray we must find the nearest intersection, and for \(n\) primitives this amounts to \(n\) intersection tests on each ray bounce. We cannot afford to check each primitive on models containing thousands of primitives, so we have added a basic binary space partitioning.

The partitioning tree is generated host-side and transferred to the device. For this we have elected to represent our tree structure as a structure of arrays. Below is the structure as it stands. `depth` represents the depth of a specific node; `minx`, `miny`, ... `maxz` represent the bounds of the node; `child0` and `child1` represent the array indices of the two child nodes; `parent` holds the index of the parent node; `id` is the index of the node; and `leaf_id` is a separate indexing that applies only to the leaf nodes. The `leaf_id` gives us an offset into the `objects` array which, itself, applies only to the leaf nodes. `n_objects` applies to all nodes and represents the number of objects that pass through a node. Lastly, `max_depth` holds the depth of our tree, `size` is the number of nodes, and `leaf_size` is the number of leaf nodes.

```cpp
struct _bounding_box {
    unsigned short *depth, *depth_device;
    double *minx, *miny, *minz, *maxx, *maxy, *maxz,
           *minx_device, *miny_device, *minz_device,
           *maxx_device, *maxy_device, *maxz_device;
    short *child0, *child1, *child0_device, *child1_device;
    short *parent, *parent_device;
    short *id, *id_device;
    short *leaf_id, *leaf_id_device;
    unsigned short *n_objects, *n_objects_device;
    unsigned short *objects, *objects_device;
    unsigned short max_depth;
    unsigned short size, leaf_size;
};
```

If we have a tree with depth 3, then \(\text{size} = 2^{(3+1)}-1 = 15\) and \(\text{leaf_size} = 2^{3} = 8\). Thus, we would have \(15\) nodes in total and \(8\) leaf nodes.

The idea was to first evaluate (after the camera transformation) the minimum and maximum axes values of the axis-aligned box that bounds every primitive in our scene. These values are passed to our tree-building function, and the tree is generated by splitting along the major axis. If the dimensions of our root node are \((1,2,3)\), we would first split along the \(z\)-axis resulting in two children of size \((1,2,1.5)\). The second splits would occur along the \(y\)-axis resulting in 4 nodes of size \((1,1,1.5)\).

Once we reach a leaf node, we cycle through all of the primitives in our scene seeking those primitives that pass through the leaf node. Once the tree is built and all leaf nodes have been processed, we propagate back the number of objects in each child node to its parent. In the code we have also merged the objects from child nodes to the current node if the number of objects is below a certain threshold. There would be no sense in testing 16 child nodes if they all contain the same primitive.

When testing child nodes for the containment of primitives, we have cheated a bit. For one we have not added any plane primitives to the partitioning. We simply add these primitives to the list of objects we test for intersections. For the sphere primitive we have evaluated the radius of the bounding sphere of the given tree node and compared it with the radius of the primitive. If the distance between the sphere center and the box center is less than the sum of the radii, we include the primitive as passing through the tree node. Consequently, this will include spheres that should not necessarily belong to the node, but it will include all the spheres that should. Lastly, when testing for the containment of triangle primitives in a given tree node, we evaluate the axis-aligned bounding box of the primitive and test for overlap between the two bounding boxes. Again, this will potentially include many more primitives than it should but will capture all that is necessary.
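The conservative sphere test described above can be sketched as follows. The struct and function names are illustrative, not the project's; the test may admit spheres that do not actually touch the box, but it never rejects one that does.

```cpp
#include <cassert>
#include <cmath>

struct Sphere { double cx, cy, cz, r; };
struct Box { double minx, miny, minz, maxx, maxy, maxz; };

// Compare the distance between the sphere center and the box center
// against the sum of the sphere radius and the radius of the box's
// bounding sphere (built from the box's half dimensions).
bool sphereMayOverlapNode(const Sphere &s, const Box &b) {
    double cx = 0.5 * (b.minx + b.maxx);
    double cy = 0.5 * (b.miny + b.maxy);
    double cz = 0.5 * (b.minz + b.maxz);
    double hx = 0.5 * (b.maxx - b.minx);
    double hy = 0.5 * (b.maxy - b.miny);
    double hz = 0.5 * (b.maxz - b.minz);
    double node_radius = std::sqrt(hx * hx + hy * hy + hz * hz);
    double dx = s.cx - cx, dy = s.cy - cy, dz = s.cz - cz;
    double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return dist < s.r + node_radius;
}
```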

Our ray sampling procedure has been updated to query for intersections with the bounding box. If we find the ray hits the root node, we then query the two child nodes. If the ray hits a child node, we check the children of that child node. We continue like this until we reach a leaf node. Upon reaching a leaf node, we add the objects contained in that leaf node to the list of objects to test against for intersections. Below is the function for testing whether a ray intersects an axis-aligned bounding box. There are a few cases. If the node does not contain any primitives, there is no point in testing any further (no children will contain any primitives either). Additionally, the ray could originate inside the bounding box, and, lastly, we check for intersection with the left, right, bottom, top, rear, and front box faces.

__device__ bool rayIntersects_device(_bounding_box& b, unsigned short index, __ray r) {
    // contains objects
    if (b.n_objects_device[index] < 1) return false;

    // containment of ray origin
    if (r.origin.x >= b.minx_device[index] && r.origin.x <= b.maxx_device[index] &&
        r.origin.y >= b.miny_device[index] && r.origin.y <= b.maxy_device[index] &&
        r.origin.z >= b.minz_device[index] && r.origin.z <= b.maxz_device[index]) return true;

    // intersection tests
    if (r.origin.x < b.minx_device[index] && r.direction.x > 0) {
        // check left face intersection
        double t = (-b.minx_device[index] + r.origin.x) / -r.direction.x;
        //double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (y >= b.miny_device[index] && y <= b.maxy_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index]) return true;
    }
    if (r.origin.x > b.maxx_device[index] && r.direction.x < 0) {
        // check right face intersection
        double t = (b.maxx_device[index] - r.origin.x) / r.direction.x;
        //double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (y >= b.miny_device[index] && y <= b.maxy_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index]) return true;
    }
    if (r.origin.y < b.miny_device[index] && r.direction.y > 0) {
        // check bottom face intersection
        double t = (-b.miny_device[index] + r.origin.y) / -r.direction.y;
        double x = r.origin.x + t * r.direction.x;
        //double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index]) return true;
    }
    if (r.origin.y > b.maxy_device[index] && r.direction.y < 0) {
        // check top face intersection
        double t = (b.maxy_device[index] - r.origin.y) / r.direction.y;
        double x = r.origin.x + t * r.direction.x;
        //double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index]) return true;
    }
    if (r.origin.z < b.minz_device[index] && r.direction.z > 0) {
        // check rear face intersection
        double t = (-b.minz_device[index] + r.origin.z) / -r.direction.z;
        double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        //double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            y >= b.miny_device[index] && y <= b.maxy_device[index]) return true;
    }
    if (r.origin.z > b.maxz_device[index] && r.direction.z < 0) {
        // check front face intersection
        double t = (b.maxz_device[index] - r.origin.z) / r.direction.z;
        double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        //double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            y >= b.miny_device[index] && y <= b.maxy_device[index]) return true;
    }

    // no intersection
    return false;
}

Below is the function that adds primitives to the hit list. These are primitives we must check directly for intersections. It was an attempt to avoid recursion and is fairly crude. It starts by testing the root node and continues to add indices on a bounding box hit. When a leaf node is reached, we add only those primitives that have not already been added.

__device__ short intersects_device(_bounding_box& b, int i, __ray r, short hit_list[]) {
    int index = 0, count = 1, indices[30000];
    indices[index] = 0;
    short hit_count = 0;
    bool found = false;
    while (index < count && index < 30000) {
        i = indices[index++];
        if (rayIntersects_device(b, i, r)) {
            if (b.depth_device[i] == b.max_depth) {
                for (int j = 0; j < b.n_objects_device[i]; j++) {
                    short hit = b.objects_device[b.leaf_id_device[i] * 10000 + j];
                    found = false;
                    for (int l = 0; l < hit_count; l++) {
                        if (hit_list[l] == hit) { found = true; break; }
                    }
                    if (!found) hit_list[hit_count++] = hit;
                }
            } else {
                indices[count++] = b.child0_device[i];
                indices[count++] = b.child1_device[i];
            }
        }
    }
    return hit_count;
}

The `sampleRay` function has been updated to use the `intersects_device` method. It now loops over only those primitives that should be tested directly. Since we are handling planes directly, the project expects those planes to be added to the objects list first. `sampleRay` has a second loop for handling planes; once a primitive other than a plane is found, it breaks from the loop.

Occasionally during testing, the kernel would timeout. The number of rays each kernel call is forced to handle has been reduced to help prevent this from occurring. A kernel call now handles a 2 by 2 grid of blocks sized 16 by 16. Thus, at the moment the kernel only handles 1024 pixels on each pass. We send an offset in both the \(x\) and \(y\)-directions to update the entire image over successive loops.
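As a sketch, the tiling described above amounts to stepping a pair of offsets across the image. The kernel launch itself is only hinted at in a comment below, and the helper simply counts how many launches cover the image; the function name is illustrative.

```cpp
// Cover the full image with successive small kernel launches so that no
// single launch runs long enough to trip the driver's watchdog timer.
// Each launch handles a 2x2 grid of 16x16 blocks, i.e. 1024 pixels;
// any out-of-range pixels in a partial edge tile would be guarded
// inside the kernel itself.
int launchCount(int width, int height) {
    const int TILE = 2 * 16; // pixels covered per launch in each direction
    int count = 0;
    for (int offsetY = 0; offsetY < height; offsetY += TILE) {
        for (int offsetX = 0; offsetX < width; offsetX += TILE) {
            // kernel<<<dim3(2, 2), dim3(16, 16)>>>(..., offsetX, offsetY);
            count++;
        }
    }
    return count;
}
```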

Blender was used to export 3D models in OBJ format. The project expects triangles and normals to be present in the OBJ file. When exporting do not forget to check "Include Normals" and "Triangulate Faces".

This project is fairly crude. Below is a list of some ideas that could be implemented to improve the efficiency of the project.

- kd-tree
- improved intersection testing
- ray-triangle intersections
- containment testing for spheres in nodes
- containment testing for triangles in nodes
- shared memory
- generating tree structure on device

Have a look at the project, and let me know if you have any questions or suggestions.

Download this project: pathtracer_dof_triangles.tar.bz2

]]>Essentially, we will define the distance to the focal plane and a blur radius. For each primary ray we find its intersection with the focal plane, \(\vec{p}\), and jitter the ray origin by an amount, \(\vec{d}\). We then define the new ray direction as \(\vec{r}=\vec{p}-\vec{d}\). Consequently, objects on the focal plane will appear in focus. Below is the addendum to the `kernel()` function.

__vector dir = __vector(x - width / 2, -y + height / 2, 0 + width) + offset;
__ray ray = { __vector(0, 0, 0), dir.unit() };
u1 = rand_device[i*width*height*3+index+1];
u2 = rand_device[i*width*height*3+index+2];
r1 = 2 * M_PI * u1;
r2 = u2;
offset = __vector(cos(r1)*r2, sin(r1)*r2, 0.0) * blur_radius;
__vector p = ray.origin + dir * (focal_distance / width);
ray.origin = ray.origin + offset;
ray.direction = (p - ray.origin).unit();

Again, don't forget to update the `Makefile` to reference the proper locations for the `libcudart.so` and `libcurand.so` libraries.

Download the updated project: pathtracer_dof.tar.bz2

]]>Below are two screen captures of this project in action.

This path tracer is basic, fairly crude, and inefficient. I'll provide a brief overview of the code before we delve into some of the mathematics. The host code defines an abstract base class, `cObject`, from which the `cPlane` and `cSphere` objects are derived. The base class includes the material type, color, emission color, and type (plane or sphere) properties. The `applyCamera()` virtual function is defined in the derived classes and transforms the respective object into camera space.

The objects in camera space are passed to the device where the environment is rendered. The `runPathTracer()` function in `pathtracer.cu` generates some random numbers, executes the kernel, and retrieves the current frame. This frame is rendered to a texture during program execution and saved to a PPM file upon program termination.

The kernel function runs through our buffer, and for each buffer location four rays are shot out, one into each of the four quadrants surrounding the buffer location, using cosine-weighted sampling. These four samples are averaged and added to the accumulation. The device function, `sampleRay()`, is called on each ray. A maximum loop size is defined (e.g. 5 bounces), and sampling begins for the current ray.

The ray sampler loops over the maximum number of bounces. Within this loop, we loop over our objects seeking the nearest intersection using the equations outlined below for spheres and planes. If an intersection is found, we set the values in our emission and color arrays and bounce the ray according to the material type (diffuse, specular, or refractive). Lastly, we apply the emission and color arrays to our final sample. If our final sample is \(\vec{s}_1\) and the emission and color values are \(\vec{e}_n\) and \(\vec{c}_n\), respectively, for \(n \in \{1,2,\ldots,m\}\), where \(m\) is the bounce limit, the result would be,

\begin{align}

\vec{s}_{m} &= \vec{e}_m\\

\vec{s}_{n} &= \vec{e}_{n} + \vec{c}_{n} \circ \vec{s}_{n+1}\\

\end{align}
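Evaluated from the last bounce back to the first, the recurrence above collapses into a short loop. The sketch below is illustrative: the `Color` type and function name are assumptions, and the component-wise product \(\circ\) is written out by hand.

```cpp
#include <vector>

struct Color { double r, g, b; };

// Collapse the per-bounce emission and color arrays into the final
// sample: s_m = e_m, then s_n = e_n + c_n (component-wise *) s_{n+1}.
// Starting with s = 0 makes the first (deepest) iteration yield e_m.
Color collapsePath(const std::vector<Color>& emission,
                   const std::vector<Color>& color) {
    Color s = { 0.0, 0.0, 0.0 };
    for (int n = (int)emission.size() - 1; n >= 0; n--) {
        s.r = emission[n].r + color[n].r * s.r;
        s.g = emission[n].g + color[n].g * s.g;
        s.b = emission[n].b + color[n].b * s.b;
    }
    return s;
}
```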

Below we will discuss some of the mathematics involved in the process before we mention interaction and conclude with a few notes.

**Sphere intersection**

Our path tracer will include support for spheres and planes. Below we have the equation for a sphere and a ray, followed by the evaluation of the point of intersection, \(\vec{p}\). We have a point of intersection provided the discriminant of the quadratic equation is positive. Lastly, we evaluate the surface normal by subtracting the sphere center from the point of intersection. Note that when we evaluate the roots of the quadratic, we will select the lesser of the two roots (the nearest point of intersection).

\begin{align}

(\vec{p} - \vec{c}) \cdot (\vec{p} - \vec{c}) &= r^2\\

\vec{r}(t) &= \vec{o} + \vec{r}t\\

(\vec{o} + \vec{r}t - \vec{c}) \cdot (\vec{o} + \vec{r}t - \vec{c}) &= r^2\\

(\vec{r}\cdot\vec{r})t^2 + 2\,\vec{r}\cdot(\vec{o} - \vec{c})\,t + (\vec{o} - \vec{c}) \cdot (\vec{o} - \vec{c}) - r^2 &= 0\\

\vec{n} &= \vec{p} - \vec{c}\\

\end{align}
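In code, the quadratic above might be solved as follows; this is a host-side sketch with illustrative types, not the project's device function.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Solve (r.r)t^2 + 2 r.(o-c) t + (o-c).(o-c) - R^2 = 0 and return the
// nearest positive root, or -1 if the ray misses the sphere.
double intersectSphere(const Vec3& o, const Vec3& r,
                       const Vec3& c, double radius) {
    Vec3 oc = { o.x - c.x, o.y - c.y, o.z - c.z };
    double a = dot(r, r);
    double b = 2.0 * dot(r, oc);
    double k = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * k;
    if (disc < 0.0) return -1.0;                   // no real roots: miss
    double t = (-b - std::sqrt(disc)) / (2.0 * a); // lesser root first
    if (t > 0.0) return t;
    t = (-b + std::sqrt(disc)) / (2.0 * a);        // origin inside sphere
    return t > 0.0 ? t : -1.0;
}
```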

**Plane intersection**

Below we have the equation for a plane followed by an evaluation of the point of intersection. Note that if the ray is parallel to the plane, we have either no intersection or an unlimited number of intersections (the line lies in the plane). Here we do not need to evaluate the normal; it is an inherent property of the plane.

\begin{align}

(\vec{p} - \vec{p}_0) \cdot \hat{n} &= 0\\

\vec{r}(t) &= \vec{o} + \vec{r}t\\

(\vec{o} + \vec{r}t - \vec{p}_0) \cdot \hat{n} &= 0\\

\end{align}
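The corresponding solve for \(t\) might look like the sketch below; the parallel case is handled by testing the denominator, and the types are again illustrative.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Solve (o + r t - p0) . n = 0 for t. Returns -1 when the ray is
// parallel to the plane (no unique intersection).
double intersectPlane(const Vec3& o, const Vec3& r,
                      const Vec3& p0, const Vec3& n) {
    double denom = dot(r, n);
    if (std::fabs(denom) < 1e-12) return -1.0; // parallel to the plane
    Vec3 d = { p0.x - o.x, p0.y - o.y, p0.z - o.z };
    return dot(d, n) / denom;
}
```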

**Specular reflection**

The simplest of the three lighting models we will implement in this project, specular reflection gives objects a mirror-like quality. Incoming rays are reflected off the surface of an object in a direction uniquely defined by the incoming ray, \(\vec{r}\), and the unit vector normal to the surface at the point of intersection, \(\hat{n}\).

\begin{align}

\vec{t} &= 2(\hat{n}\cdot\vec{r})\hat{n} - \vec{r}\\

\end{align}
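Written out, the reflection is a one-liner. In this sketch \(\vec{r}\) is taken to point away from the surface, matching the equation above; the `Vec3` type is illustrative.

```cpp
struct Vec3 { double x, y, z; };

// Mirror reflection exactly as in the equation above: t = 2(n.r)n - r,
// where r points away from the surface and n is the unit normal.
Vec3 reflect(const Vec3& r, const Vec3& n) {
    double d = n.x * r.x + n.y * r.y + n.z * r.z;
    return { 2.0 * d * n.x - r.x,
             2.0 * d * n.y - r.y,
             2.0 * d * n.z - r.z };
}
```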

**Diffuse reflection**

To implement diffuse reflections we will use cosine-weighted sampling. More information on cosine-weighted sampling can be found here. Below \(u_1\) and \(u_2\) are uniform random variables. Ultimately, we will reorient the resultant vector based on the surface normal (we are sampling from the unit hemisphere defined by the surface normal at the point of intersection).

\begin{align}

u_1 &\sim U(0,1)\\

u_2 &\sim U(0,1)\\

r &= \sqrt{1-u_1}\\

\theta &= 2\pi u_2\\

\vec{v} &=

\begin{pmatrix}

r \cos(\theta)\\

r \sin(\theta)\\

\sqrt{u_1}\\

\end{pmatrix}

\end{align}
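A sketch of the sampling step, before reorientation about the surface normal; the function name is illustrative.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Cosine-weighted sample on the canonical hemisphere about +z,
// following the equations above. Reorienting the result about the
// surface normal is a separate step.
Vec3 cosineSampleHemisphere(double u1, double u2) {
    const double PI = 3.14159265358979323846;
    double r = std::sqrt(1.0 - u1);
    double theta = 2.0 * PI * u2;
    // x^2 + y^2 + z^2 = (1 - u1) + u1 = 1, so the sample is unit length
    return { r * std::cos(theta), r * std::sin(theta), std::sqrt(u1) };
}
```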

**Refraction**

Refraction gives the appearance of light traveling through a barrier, such as from air to glass. Below we have the equation for the transmission vector, \(\vec{t}\), based on Snell's equations. \(n_1\) and \(n_2\) are the indices of refraction of the two media. Obviously, this equation is only valid if the quantity under the radical is nonnegative. If this quantity is negative, we use the reflection equation above; such a situation is known as total internal reflection. In our code we will initialize \(n_1\) and \(n_2\) by evaluating the inner product of the ray with the surface normal. If this product is less than zero, we will be entering the medium. We also flip the normal when exiting the medium. It should be relatively straightforward to add the Fresnel equations. Kevin Beason did so here.

\begin{align}

\vec{t} &= \frac{n_1}{n_2}\hat{r} - \left( \frac{n_1}{n_2} \hat{n}\cdot\hat{r} + \sqrt{1-\frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} \right) \cdot \hat{n}\\

\end{align}
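A sketch of the transmission computation with the total-internal-reflection test; here \(\hat{r}\) is the unit incoming direction and \(\hat{n}\) the unit normal on the incident side (so their inner product is nonpositive when entering). The types and signature are illustrative.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Transmission direction from the equation above. Returns false on
// total internal reflection, in which case the caller should use the
// reflection equation instead.
bool refract(const Vec3& r, const Vec3& n, double n1, double n2, Vec3& t) {
    double eta = n1 / n2;
    double ndotr = dot(n, r);
    double k = 1.0 - eta * eta * (1.0 - ndotr * ndotr);
    if (k < 0.0) return false; // total internal reflection
    double c = eta * ndotr + std::sqrt(k);
    t.x = eta * r.x - c * n.x;
    t.y = eta * r.y - c * n.y;
    t.z = eta * r.z - c * n.z;
    return true;
}
```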

**A spice of interaction**

We have attempted to add some interaction to this project by including the keyboard handler available here. The premise behind this procedure is to reset the accumulated path values when the camera position or orientation changes. The path tracer begins to progressively refine the scene when the view remains static. Improvements to the project's efficiency would yield a better interactive experience.

**Some notes**

The larger the surface area of your light sources, the faster your scene will appear to converge (less noise), because the rays will hit a light source with greater probability. The project currently has a limit of 10 bounces. If you wish to exceed this limit, you must update the `sampleRay()` function in `pathtracer.cu`. Additionally, you will need to update the `Makefile` to reference the proper locations for the `libcudart.so` and `libcurand.so` libraries.

If you have any suggestions for improving this path tracer or questions about it, let me know.

Download this project: pathtracer.tar.bz2

Additional information:

]]>The `cMatrix::householderBidiagonalization()` method:

void cMatrix::householderBidiagonalization(cMatrix& Q, cMatrix& R, cMatrix& S) {
    double mag, alpha;
    cMatrix u(m, 1), v(m, 1), u_(n, 1), v_(n, 1);
    cMatrix P(m, m), I(m, m), P_(n, n), I_(n, n);
    Q = cMatrix(m, m);
    R = *this;
    S = cMatrix(n, n);
    for (int i = 0; i < n; i++) {
        // left Householder transformation: zero out column i below the diagonal
        u.zero(); v.zero();
        mag = 0.0;
        for (int j = i; j < m; j++) {
            u.A[j] = R.A[j * n + i];
            mag += u.A[j] * u.A[j];
        }
        mag = sqrt(mag);
        alpha = u.A[i] < 0 ? mag : -mag;
        mag = 0.0;
        for (int j = i; j < m; j++) {
            v.A[j] = j == i ? u.A[j] + alpha : u.A[j];
            mag += v.A[j] * v.A[j];
        }
        mag = sqrt(mag);
        if (mag > 0.0000000001) {
            for (int j = i; j < m; j++) v.A[j] /= mag;
            P = I - (v * v.transpose()) * 2.0;
            R = P * R;
            Q = Q * P;
        }

        // right Householder transformation: zero out row i beyond the superdiagonal
        u_.zero(); v_.zero();
        mag = 0.0;
        for (int j = i + 1; j < n; j++) {
            u_.A[j] = R.A[i * n + j];
            mag += u_.A[j] * u_.A[j];
        }
        mag = sqrt(mag);
        alpha = u_.A[i + 1] < 0 ? mag : -mag;
        mag = 0.0;
        for (int j = i + 1; j < n; j++) {
            v_.A[j] = j == i + 1 ? u_.A[j] + alpha : u_.A[j];
            mag += v_.A[j] * v_.A[j];
        }
        mag = sqrt(mag);
        if (mag > 0.0000000001) {
            for (int j = i + 1; j < n; j++) v_.A[j] /= mag;
            P_ = I_ - (v_ * v_.transpose()) * 2.0;
            R = R * P_;
            S = P_ * S;
        }
    }
}

Download the source: qr_householder_bidiagonalization.cc.bz2

]]>**Theorem.** A real matrix, \(\mathbf{A} \in \mathbf{M}_{m,n}\), can be decomposed as \(\mathbf{A}=\mathbf{Q}\mathbf{R}\), where \(\mathbf{Q} \in \mathbf{M}_{m,m}\) is an orthogonal matrix and \(\mathbf{R} \in \mathbf{M}_{m,n}\) is an upper triangular matrix.

The proof of this theorem has been omitted but could be constructed using Householder transformations. We'll discuss the Householder transformation and see how it can be applied to perform the QR decomposition. We'll review the decomposition algorithm and, lastly, have a look at some C++ code.

A Householder transformation is a linear transformation given by the matrix, \(\mathbf{P}\),

\begin{align}

\mathbf{P} &= \mathbf{I} - 2\hat{v}\hat{v}^T\\

\end{align}

where \(\mathbf{I}\) is the identity matrix and \(\hat{v}\) is a unit vector. \(\mathbf{P}\vec{x}\) is the reflection of \(\vec{x}\) about the hyperplane passing through the origin with normal vector, \(\hat{v}\).

\begin{align}

\mathbf{P}\vec{x} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{x}\\

&= \vec{x} - 2\hat{v}\hat{v}^T\vec{x}\\

\end{align}

Since \(\hat{v}\hat{v}^T\vec{x}\) is the projection of \(\vec{x}\) onto \(\hat{v}\), \(\mathbf{P}\vec{x}\) reflects \(\vec{x}\) about the hyperplane with normal, \(\hat{v}\). See the diagram below.

Some properties of \(\mathbf{P}\) follow.

\begin{align}

\mathbf{P}^T &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)^T\\

&= \mathbf{I} - 2\left(\hat{v}\hat{v}^T\right)^T\\

&= \mathbf{I} - 2\hat{v}\hat{v}^T\\

&= \mathbf{P}

\end{align}

\(\mathbf{P}^T = \mathbf{P}\), thus, \(\mathbf{P}\) is symmetric.

\begin{align}

\mathbf{P}^T\mathbf{P} &= \mathbf{P}\mathbf{P}\\

&= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\\

&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\hat{v}^T\hat{v}\hat{v}^T\\

&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\left(\hat{v}^T\hat{v}\right)\hat{v}^T\\

&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\hat{v}^T\\

&= \mathbf{I}

\end{align}

\(\mathbf{P}^T\mathbf{P} = \mathbf{I} \Leftrightarrow \mathbf{P}^T = \mathbf{P}^{-1}\), thus, \(\mathbf{P}\) is orthogonal.

\(\mathbf{P}^T\mathbf{P} = \mathbf{P}^2 = \mathbf{I}\), thus, \(\mathbf{P}\) is an involution.
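These reflection and involution properties are easy to check numerically. The two-dimensional helper below is purely illustrative; it applies \(\mathbf{P} = \mathbf{I} - 2\hat{v}\hat{v}^T\) to a vector with the matrix product written out by hand.

```cpp
// Apply P = I - 2 v v^T to x in R^2, where v is a unit vector.
// P x is the reflection of x about the line through the origin with
// normal v; applying P twice returns the original vector (involution).
void householderApply(const double v[2], const double x[2], double out[2]) {
    double vdotx = v[0] * x[0] + v[1] * x[1];
    out[0] = x[0] - 2.0 * vdotx * v[0];
    out[1] = x[1] - 2.0 * vdotx * v[1];
}
```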

Additionally,

\begin{align}

\mathbf{P}\vec{v} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{v}\\

&= \vec{v} - 2\hat{v}\hat{v}^T\hat{v}\|\vec{v}\|\\

&= \vec{v} - 2\vec{v}\\

&= -\vec{v}\\

\end{align}

thus, \(-1\) is an eigenvalue of \(\mathbf{P}\) with multiplicity one.

For a vector, \(\vec{u}\), orthogonal to \(\vec{v}\), we have,

\begin{align}

\mathbf{P}\vec{u} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{u}\\

&= \vec{u} - 2\hat{v}\left(\hat{v}^T\vec{u}\right)\\

&= \vec{u}\\

\end{align}

thus, \(1\) is an eigenvalue of \(\mathbf{P}\) with multiplicity \(m-1\), since there are \(m-1\) linearly independent vectors orthogonal to \(\vec{v}\).

Now that we have touched on the Householder transformation, we'll see how the transformation can be applied to decompose a matrix into the product of an orthogonal matrix and an upper triangular matrix. The basic idea is to apply multiple transformations to successively zap the entries below the main diagonal column by column. We will then have something of the form,

\begin{align}

\mathbf{A} &= \left(\mathbf{Q}_1\mathbf{Q}_2\ldots\mathbf{Q}_n\right)\left(\mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}\right)\\

\end{align}

Clearly, from the properties above, \(\mathbf{Q}_i\mathbf{Q}_i = \mathbf{I}\), so this is a valid statement. Additionally, \(\mathbf{Q}_1\mathbf{Q}_2\ldots\mathbf{Q}_n\) is the product of orthogonal matrices and is itself orthogonal. What is maybe not so clear at this point is that the product, \(\mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}\), is an upper triangular matrix.

Consider \(\mathbf{A}\) as,

\begin{align}

\mathbf{A} &= \begin{pmatrix}

\vec{a_1} & \vec{a_2} & \cdots & \vec{a_n}\\

\end{pmatrix}\\

\end{align}

The first step is to find the Householder transformation, \(\mathbf{Q}_1 = \mathbf{I} - 2\hat{v_1}\hat{v_1}^T\), that reflects the vector, \(\vec{a_1}\), to the vector, \(\|\vec{a_1}\|\hat{e_1} = \begin{pmatrix}\|\vec{a_1}\| & 0 & \cdots & 0\end{pmatrix}^T\), which amounts to finding the appropriate unit vector, \(\hat{v_1}\). Note that we could also reflect the vector, \(\vec{a_1}\), to the vector, \(-\|\vec{a_1}\|\hat{e_1} = \begin{pmatrix}-\|\vec{a_1}\| & 0 & \cdots & 0\end{pmatrix}^T\). In our implementation we will choose the sign to improve numerical stability. For the vector, \(\vec{a_1}\), we will choose the sign to be \(-\text{sgn}(\hat{e_1}^T\vec{a_1})\).

If we choose \(\vec{v_1}\) such that,

\begin{align}

\vec{v_1} &= \frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\\

\end{align}

where \(\hat{e_1}\) is a standard basis vector, then we can show that \(\mathbf{Q}_1\vec{a_1} = \|\vec{a_1}\|\hat{e_1}\) as follows. Working with the denominator,

\begin{align}

\mathbf{Q}_1\vec{a_1} &= \left[\mathbf{I} - 2\left(\frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\right)\left(\frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\right)^T\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)^T\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\left(\vec{a_1}^T - \|\vec{a_1}\|\hat{e_1}^T\right)\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\vec{a_1}^T\vec{a_1} - 2\|\vec{a_1}\|\hat{e_1}^T\vec{a_1} + \|\vec{a_1}\|^2\hat{e_1}^T\hat{e_1}}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\|\vec{a_1}\|^2 - 2\|\vec{a_1}\|\hat{e_1}^T\vec{a_1}}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\|\vec{a_1}\|^2-\|\vec{a_1}\|\hat{e_1}^T\vec{a_1}\right)}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\vec{a_1}^T\vec{a_1} - \|\vec{a_1}\|\hat{e_1}^T\vec{a_1}\right)}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\vec{a_1}^T - \|\vec{a_1}\|\hat{e_1}^T\right)\vec{a_1}}\right]\vec{a_1}\\

&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\right]\vec{a_1}\\

\end{align}

Finishing up with the numerator,

\begin{align}

\mathbf{Q}_1\vec{a_1} &= \vec{a_1} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T\vec{a_1}}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\\

&= \vec{a_1} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\\

&= \vec{a_1} - \left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\\

&= \|\vec{a_1}\|\hat{e_1}\\

\end{align}

Multiplying \(\mathbf{A}\) on the left by \(\mathbf{Q}_1\) yields a matrix of the following form.

\begin{align}

\mathbf{Q}_1\mathbf{A} &= \begin{pmatrix}\|\vec{a_1}\| & \star & \cdots & \star\\0 \\\vdots & & \mathbf{A_1}\\0\end{pmatrix}\\

\end{align}

Our job now is to find the matrix, \(\mathbf{Q}_2\), to deal with the sub-matrix \(\mathbf{A_1}\). Note that \(\mathbf{Q}_2\) will have the following form.

\begin{align}

\mathbf{Q}_2 &= \begin{pmatrix}

1 & 0 & \cdots & 0\\

0\\

\vdots & & \mathbf{I} - 2\hat{v_2}\hat{v_2}^T \\

0\\

\end{pmatrix}\\

\end{align}

We proceed in this fashion until all entries below the main diagonal are zero. Thus, \(\mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}\) will be an upper triangular matrix.

We can now write out an algorithm for performing the QR decomposition using Householder transformations. This algorithm is rather straightforward but naive in the sense that a more efficient algorithm could be written.

input:  A (m x n matrix)
output: Q (m x m matrix), R (m x n matrix)

u = m-dimensional vector
v = m-dimensional vector
I = identity matrix
P = I
Q = I
R = A

for i = 1 to n
    u.zero()
    v.zero()
    for j = i to m
        u[j] = R[j][i]
    alpha = u[i] < 0 ? u.length() : -u.length()
    for j = 1 to m
        v[j] = j == i ? u[j] + alpha : u[j]
    if (v.length() < epsilon) continue
    v.normalize()
    P = I - 2 * v * v.transpose()
    R = P * R
    Q = Q * P
end for

Below we have a C++ implementation of the decomposition algorithm above. The `householderDecomposition` method is a member of the `cMatrix` object, a rather simple class that performs a few matrix operations. It has a few operator overloads and the ability to transpose a matrix. Download the project at the end of this post to see the details. Here is the Householder decomposition.

void cMatrix::householderDecomposition(cMatrix& Q, cMatrix& R) {

double mag, alpha;

cMatrix u(m, 1), v(m, 1);

cMatrix P(m, m), I(m, m);

Q = cMatrix(m, m);

R = *this;

    for (int i = 0; i < n; i++) {
        u.zero(); v.zero();
        mag = 0.0;
        for (int j = i; j < m; j++) {
            u.A[j] = R.A[j * n + i];
            mag += u.A[j] * u.A[j];
        }
        mag = sqrt(mag);
        alpha = u.A[i] < 0 ? mag : -mag;
        mag = 0.0;
        for (int j = i; j < m; j++) {
            v.A[j] = j == i ? u.A[j] + alpha : u.A[j];
            mag += v.A[j] * v.A[j];
        }
        mag = sqrt(mag);
        if (mag < 0.0000000001) continue;
        for (int j = i; j < m; j++) v.A[j] /= mag;
        P = I - (v * v.transpose()) * 2.0;
        R = P * R;
        Q = Q * P;
    }
}

Download this project: qr_householder.cc.bz2

]]>The Disjoint Set data structure allows us to track elements partitioned into disjoint subsets. Two sets are disjoint provided their intersection is the empty set, i.e. no element belongs to both sets. In addition to tracking the subsets, the structure identifies a representative element (in the second pass of the Connected Component Labeling method we will replace all elements with the representative element of the subset to which they belong).

The data structure should implement three methods: `MakeSet()` for generating new sets, `Find()` for locating the representative element, and `Union()` for joining two subsets into one equivalence set. These methods are available in the declaration below. Additionally, in anticipation of the Labeling method, we've included two methods: `Reduce()` for reducing the labels of the representative elements (`node.i`) to the sequence 0,1,...,n-1 and returning the number of subsets, n, and `Reset()` for setting the element count to zero.

#include <vector>

struct node {
    node *parent;
    int i, rank;
};

class cDisjointSet {
private:
    std::vector<node *> nodes;
    int elements, sets;
protected:
public:
    cDisjointSet();
    ~cDisjointSet();
    node* MakeSet(int i);
    node* Find(node* a);
    void Union(node* a0, node* a1);
    int ElementCount();
    int SetCount();
    int Reduce();
    void Reset();
};

In the definition of `cDisjointSet` below, the `MakeSet()` method simply allocates a node if necessary, points the node to itself, sets the label, and sets its rank to zero (for implementing union by rank). The `Find()` method seeks the representative element. Here we've implemented this method with path compression to connect child nodes directly to their representative element. This will reduce the seek time by flattening out the tree. The third method, `Union()`, joins two sets into one. We've implemented this method with union by rank. By comparing rank we can improve the balance of the tree by joining the smaller tree to the larger. As mentioned, the `Reduce()` method seeks out representative elements and assigns each a unique label in 0,1,...,n-1.

#include "disjointset.h"

cDisjointSet::cDisjointSet() : elements(0), sets(0) { }

cDisjointSet::~cDisjointSet() {
    for (int i = 0; i < nodes.size(); i++) delete nodes[i];
    nodes.clear();
}

node* cDisjointSet::MakeSet(int i) {
    if (elements + 1 > nodes.size()) nodes.push_back(new node);
    nodes[elements]->parent = nodes[elements];
    nodes[elements]->i = i;
    nodes[elements]->rank = 0;
    elements++;
    sets++;
    return nodes[elements-1];
}

node* cDisjointSet::Find(node* a) { // with path compression
    if (a->parent == a) return a;
    else {
        a->parent = Find(a->parent);
        return a->parent;
    }
}

void cDisjointSet::Union(node* a0, node* a1) { // union by rank
    if (a0 == a1) return;
    node *a2 = Find(a0);
    node *a3 = Find(a1);
    if (a2 == a3) return;
    if (a2->rank < a3->rank) a2->parent = a3;
    else if (a3->rank < a2->rank) a3->parent = a2;
    else {
        a2->parent = a3;
        a3->rank++;
    }
    sets--;
}

int cDisjointSet::ElementCount() { return elements; }

int cDisjointSet::SetCount() { return sets; }

int cDisjointSet::Reduce() {
    int j = 0;
    for (int i = 0; i < elements; i++)
        if (nodes[i]->parent == nodes[i]) nodes[i]->i = j++;
    return j;
}

void cDisjointSet::Reset() { elements = sets = 0; }
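To make the semantics concrete, here is a compact, index-based sketch of the same structure (path compression plus union by rank); it trades the pointer-based nodes above for plain integers and is meant only as an illustration.

```cpp
#include <vector>

// Minimal index-based union-find mirroring the semantics of
// cDisjointSet: makeSet/find/unite with path compression and
// union by rank.
class MiniDisjointSet {
    std::vector<int> parent, rank_;
public:
    int makeSet() {
        parent.push_back((int)parent.size());
        rank_.push_back(0);
        return (int)parent.size() - 1;
    }
    int find(int a) {
        if (parent[a] != a) parent[a] = find(parent[a]); // path compression
        return parent[a];
    }
    void unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (rank_[a] < rank_[b]) parent[a] = b;          // union by rank
        else if (rank_[b] < rank_[a]) parent[b] = a;
        else { parent[a] = b; rank_[b]++; }
    }
};
```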

Below we declare an object, `cTracker2`, which differs slightly from `cTracker` in our previous projects. Here we are not using cvBlobsLib, so we've removed those properties relevant to cvBlobsLib and added a method, `extractBlobs()`, to take its place.

#ifndef TRACKER2_H
#define TRACKER2_H

#include <opencv/cv.h>

#include "blob.h"
#include "disjointset.h"

class cTracker2 {
private:
    double min_area, max_radius;
    node **labels;
    unsigned int width, height;

    // storage of the current blobs and the blobs from the previous frame
    vector<cBlob> blobs, blobs_previous;

    cDisjointSet ds;
protected:
public:
    cTracker2(double min_area, double max_radius);
    ~cTracker2();
    void extractBlobs(cv::Mat &mat);
    void trackBlobs(cv::Mat &mat, bool history);
    void scaleBlobs();
    vector<cBlob>& getBlobs();
};

#endif

Here we will only discuss the `extractBlobs()` method; the rest is identical to our previous project. As its name implies, the Connected Component Labeling method steps through our threshold image labeling pixels. Those pixels that belong to the same connected component will be joined into one equivalence set. Here we've implemented the 4-connected model, i.e. the values of the pixels above, below, and to either side of each pixel are evaluated to determine if they belong to the same equivalence set. As we are stepping through the image left to right, top to bottom, we need only evaluate the pixel values above and to the left. These pixels will be joined with the pixels further on by the data structure, provided they belong to the same connected component.

I've optimized this algorithm slightly in the code below. To prevent the need for checking bounds, we've separated the algorithm into three stages. The top left pixel has no need to check any neighbors, so we generate a new set for this pixel. We then process the remainder of the first row. These pixels only need to check the value of the pixel to the left. If the values are the same we assign the current pixel the label of its neighbor, otherwise we generate a new set. In the third stage we need to check the value of the pixels above and to the left. If the value of the pixel to the left of the current pixel is the same we assign the current pixel the same label and check the value of the pixel above. If the value of the pixel above is the same as that of to the left, we join them into the same equivalence set. If the value of the pixel to the left is not the same as the current pixel but the value of the pixel above is, we assign the current pixel the same label as the pixel above. Lastly, if neither condition applies, we generate a new set.
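The first pass described above can be sketched on a plain binary image. The helper below is an illustration, not the project's code: it uses a simple array-based union-find in place of `cDisjointSet`, and it guards row boundaries explicitly when checking the pixel to the left.

```cpp
#include <vector>

// Iterative find with path halving for the array-based union-find.
static int findRoot(std::vector<int>& parent, int a) {
    while (parent[a] != a) { parent[a] = parent[parent[a]]; a = parent[a]; }
    return a;
}

// Count 4-connected components in a binary image (row-major, w x h),
// following the first pass described above: the top-left pixel, then
// the rest of the first row, then all remaining rows.
int countComponents(const std::vector<int>& img, int w, int h) {
    std::vector<int> parent(w * h);
    std::vector<int> label(w * h);
    int next = 0;
    auto makeSet = [&]() { parent[next] = next; return next++; };
    label[0] = makeSet();
    for (int i = 1; i < w; i++)
        label[i] = (img[i] == img[i - 1]) ? label[i - 1] : makeSet();
    for (int j = w; j < w * h; j++) {
        bool left = (j % w != 0) && (img[j] == img[j - 1]);
        bool up = (img[j] == img[j - w]);
        if (left) {
            label[j] = label[j - 1];
            if (up) { // left and up belong to the same equivalence set
                int a = findRoot(parent, label[j - 1]);
                int b = findRoot(parent, label[j - w]);
                if (a != b) parent[a] = b;
            }
        } else if (up) label[j] = label[j - w];
        else label[j] = makeSet();
    }
    // count representative elements (both foreground and background
    // regions receive labels, as in the project's first pass)
    int count = 0;
    for (int i = 0; i < next; i++)
        if (findRoot(parent, i) == i) count++;
    return count;
}
```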

Once this first pass is complete, we use our helper method, `Reduce()`, to generate a sequence, 0,1,...,n-1, for the representative elements and return the number of connected components, n. We then push n temporary blobs onto the vector and start our second pass. In our second pass we simply search for the axis-aligned bounding box of each blob. Afterward, we apply our blob filter based on minimum area and evaluate the blob centers based on the axis-aligned bounding boxes.

#include "tracker2.h"

cTracker2::cTracker2(double min_area, double max_radius) : min_area(min_area), max_radius(max_radius), labels(NULL) { }

cTracker2::~cTracker2() {
    if (labels) delete [] labels;
}

void cTracker2::extractBlobs(cv::Mat &mat) {
    // mat.cols, mat.rows -- allocate vectors
    if (mat.cols != width || mat.rows != height) {
        width = mat.cols;
        height = mat.rows;
        if (labels) delete [] labels;
        labels = new node*[width*height];
    }

    // reset our data structure for reuse
    ds.Reset();

    int index;

    // generate equivalence sets -- connected component labeling (4-connected)
    labels[0] = ds.MakeSet(0);
    for (int j = 1; j < mat.cols; j++)
        labels[j] = mat.data[j] != mat.data[j-1] ? ds.MakeSet(0) : labels[j-1];
    for (int j = mat.cols; j < mat.rows*mat.cols; j++) {
        if (mat.data[j] == mat.data[j-1]) {
            labels[j] = labels[j-1];
            if (mat.data[j-1] == mat.data[j-mat.cols]) ds.Union(labels[j-1], labels[j-mat.cols]);
        }
        else if (mat.data[j] == mat.data[j-mat.cols]) labels[j] = labels[j-mat.cols];
        else labels[j] = ds.MakeSet(0);
    }

    // the representative elements in our disjoint set data struct are associated with indices
    // we reduce those indices to 0,1,...,n and allocate our blobs
    cBlob temp;
    temp.event = BLOB_NULL;
    blobs.clear();
    for (int i = 0; i < ds.Reduce(); i++) blobs.push_back(temp);

    // populate our blob vector
    for (int j = 0; j < mat.rows; j++) {
        for (int i = 0; i < mat.cols; i++) {
            index = ds.Find(labels[j*mat.cols+i])->i;
            if (blobs[index].event == BLOB_NULL) {
                blobs[index].min.x = blobs[index].max.x = i;
                blobs[index].min.y = blobs[index].max.y = j;
                blobs[index].event = BLOB_DOWN;
                blobs[index].height = 0;
            } else {
                if (blobs[index].min.x > i) blobs[index].min.x = i;
                else if (blobs[index].max.x < i) blobs[index].max.x = i;
                blobs[index].max.y = j;
            }
        }
    }

    // apply blob filter
    for (int i = 0; i < blobs.size(); i++) {
        if ((blobs[i].max.x-blobs[i].min.x)*(blobs[i].max.y-blobs[i].min.y) < min_area) {
            blobs.erase(blobs.begin()+i);
            i--;
        }
    }

    // find blob centers
    for (int i = 0; i < blobs.size(); i++) {
        blobs[i].location.x = blobs[i].origin.x = (blobs[i].max.x + blobs[i].min.x) / 2.0;
        blobs[i].location.y = blobs[i].origin.y = (blobs[i].max.y + blobs[i].min.y) / 2.0;
    }
}

void cTracker2::trackBlobs(cv::Mat &mat, bool history) {
    // clear the blobs from two frames ago
    blobs_previous.clear();

    // before we populate the blobs vector with the current frame,
    // we need to store the live blobs in blobs_previous
    for (int i = 0; i < blobs.size(); i++)
        if (blobs[i].event != BLOB_UP) blobs_previous.push_back(blobs[i]);

    extractBlobs(mat);

    // initialize previous blobs to untracked
    for (int i = 0; i < blobs_previous.size(); i++) blobs_previous[i].tracked = false;

    // main tracking loop -- O(n^2) -- simply looks for a blob in the previous frame within a specified radius
    for (int i = 0; i < blobs.size(); i++) {
        for (int j = 0; j < blobs_previous.size(); j++) {
            if (blobs_previous[j].tracked) continue;
            if (sqrt(pow(blobs[i].location.x - blobs_previous[j].location.x, 2.0) + pow(blobs[i].location.y - blobs_previous[j].location.y, 2.0)) < max_radius) {
                blobs_previous[j].tracked = true;
                blobs[i].event = BLOB_MOVE;
                blobs[i].origin.x = history ? blobs_previous[j].origin.x : blobs_previous[j].location.x;
                blobs[i].origin.y = history ? blobs_previous[j].origin.y : blobs_previous[j].location.y;
            }
        }
    }

    // add any blobs from the previous frame that weren't tracked as having been removed
    for (int i = 0; i < blobs_previous.size(); i++) {
        if (!blobs_previous[i].tracked) {
            blobs_previous[i].event = BLOB_UP;
            blobs.push_back(blobs_previous[i]);
        }
    }
}

vector<cBlob>& cTracker2::getBlobs() {
    return blobs;
}

In the `main.cc` file I've added a key event. Pressing `e` toggles the blob extraction method from cvBlobsLib to the method discussed here.

I am currently working on an event system for handling blob and fiducial events. My next post will likely focus on an event queue which parses out events to registered widgets.

Download this project: tracker2.tar.bz2

]]>