The code is available below.

Download this project: path_tracer_texture_mapping.tar.bz2

Below is a test render of the Sponza model. Some features of the original model were stripped for this render, leaving approximately 150,000 triangles.

Below is a render of the Dragon model available from the Stanford 3D Scanning Repository. The rendered model contained 100,000 triangles.

**Hyperplane Separation Theorem**

The hyperplane separation theorem is a theorem about disjoint, convex sets. For our purposes we will be applying the theorem to a combination of an axis-aligned bounding box and a triangle in three dimensions. Since we will consider both the triangle and the axis-aligned bounding box to be compact (and convex), then, provided these two sets are disjoint, we can locate two parallel hyperplanes between them separated by a gap. We need only one separating hyperplane to conclude that the triangle and axis-aligned bounding box do not intersect. Below is an image depicting this theorem in two dimensions. The green line is a separating axis and the black line is a separating line.

In three dimensions, we will have a separating axis and a separating plane. We have a few different contact situations between objects to concern ourselves with: face to face contact, face to edge contact, and edge to edge contact. Thus, our list of potential separating axes comprises the normals to the faces and the cross products of each edge from one object with each edge from the other. The face to edge contacts are handled by the face normals. The axis-aligned bounding box has six faces, but in three sets of two parallel faces, so we have three potential separating axes. The triangle normal is a fourth. The axis-aligned bounding box has 12 edges, but in three sets of four parallel edges, and the triangle has three edges, yielding nine cross products for a total of 13 potential separating axes. Only by completing all thirteen tests without finding a separation can we conclude that the axis-aligned bounding box and the triangle intersect. If any single test yields a separating plane, we need not complete the remaining tests to conclude that the objects are disjoint.

We will first translate the triangle and the axis-aligned bounding box so that the center of the box sits at the origin. Below is some code to detect a separation. We are not concerned with the direction of the projections, only their magnitudes. Because the box is centered at the origin, we define a radius, `r`, based on the half dimensions of the box.

```cpp
bool hyperplaneSeparation(__vector n, __vector p0, __vector p1, __vector p2,
                          double halfWidth, double halfHeight, double halfDepth)
{
    double _p0 = n * p0, _p1 = n * p1, _p2 = n * p2;
    double min = MIN(_p0, MIN(_p1, _p2)), max = MAX(_p0, MAX(_p1, _p2));
    double r = halfWidth * fabs(n.x) + halfHeight * fabs(n.y) + halfDepth * fabs(n.z);
    return -r > max || r < min;
}
```

The `buildTree()` function in the project download uses this method.
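To make the full test concrete, here is a host-side sketch of the complete 13-axis triangle-box overlap test. It uses a standalone `Vec3` type rather than the project's `__vector`, and names like `separatedOn` and `triangleOverlapsBox` are illustrative; `separatedOn` mirrors the `hyperplaneSeparation` logic above.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true if axis n separates the (origin-centered) box from the triangle.
static bool separatedOn(Vec3 n, Vec3 p0, Vec3 p1, Vec3 p2,
                        double hw, double hh, double hd) {
    double q0 = dot(n, p0), q1 = dot(n, p1), q2 = dot(n, p2);
    double lo = std::min(q0, std::min(q1, q2));
    double hi = std::max(q0, std::max(q1, q2));
    double r = hw * std::fabs(n.x) + hh * std::fabs(n.y) + hd * std::fabs(n.z);
    return lo > r || hi < -r;
}

// Full 13-axis test: triangle vertices are given relative to the box center.
bool triangleOverlapsBox(Vec3 p0, Vec3 p1, Vec3 p2,
                         double hw, double hh, double hd) {
    Vec3 e0 = sub(p1, p0), e1 = sub(p2, p1), e2 = sub(p0, p2);
    Vec3 axes[13] = {
        {1, 0, 0}, {0, 1, 0}, {0, 0, 1},   // box face normals
        cross(e0, e1),                      // triangle normal
        cross({1,0,0}, e0), cross({1,0,0}, e1), cross({1,0,0}, e2),
        cross({0,1,0}, e0), cross({0,1,0}, e1), cross({0,1,0}, e2),
        cross({0,0,1}, e0), cross({0,0,1}, e1), cross({0,0,1}, e2),
    };
    for (Vec3& n : axes)
        if (separatedOn(n, p0, p1, p2, hw, hh, hd))
            return false;  // one separating axis is enough
    return true;           // no axis separates them: they overlap
}
```

A degenerate cross product (parallel edges) yields the zero vector, which can never report a separation, so no special-casing is needed.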

**kd tree construction using the surface area heuristic**

kd tree construction using the surface area heuristic is a greedy algorithm. During the build process, we compare the cost of splitting a node with the cost of not splitting it. If the local cost of splitting is less than that of not splitting, we split the node. Otherwise, we convert the current node to a leaf. The cost estimate we will use is given below [2],

\begin{align}

C_V(p) &= K_T + K_I\left( \frac{SA(V_L)}{SA(V)} T_L + \frac{SA(V_R)}{SA(V)} T_R \right) \\

C_{NS} &= K_IT \\

\end{align}

where \(K_T\) is the cost of a traversal, \(K_I\) is the triangle intersection cost, \(SA(V_L), SA(V_R), SA(V)\) are the surface areas of the left node, right node, and current node, respectively, \(T_L, T_R, T\) are the number of triangles in the left node, right node, and current node, respectively, \(C_V(p)\) is the cost of splitting the current node, and \(C_{NS}\) is the cost of not splitting the current node.
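As a small illustration, the cost comparison can be written directly from these equations. The \(K_T\) and \(K_I\) values below are placeholder estimates, not the project's actual constants.

```cpp
// Hypothetical tunable estimates for traversal and intersection cost.
const double K_T = 15.0;  // cost of one traversal step
const double K_I = 20.0;  // cost of one triangle intersection

// SAH cost of splitting: C_V(p) = K_T + K_I * (SA_L/SA * T_L + SA_R/SA * T_R)
double splitCost(double saL, double saR, double sa, int tL, int tR) {
    return K_T + K_I * (saL / sa * tL + saR / sa * tR);
}

// Cost of making the current node a leaf: C_NS = K_I * T
double leafCost(int t) { return K_I * t; }

// Split only when it is locally cheaper than not splitting.
bool shouldSplit(double saL, double saR, double sa, int tL, int tR, int t) {
    return splitCost(saL, saR, sa, tL, tR) < leafCost(t);
}
```

Note how a node with a single triangle never splits: the traversal overhead \(K_T\) alone exceeds any savings.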

We have \(6T\) potential split positions: for each of the three axes, the minimum and maximum value along that axis from each triangle. The algorithm presented here is similar to the \(O(n \cdot \log^2n)\) algorithm described in [2]. For each axis we push the minimum triangle coordinate to a list with a `PRIMITIVE_START` event, and the maximum coordinate with a `PRIMITIVE_END` event. The lists are then sorted by coordinate value, \(O(n \cdot \log n)\). At each split position, a triangle touching the plane is considered to reside in both nodes, so for the first split position we have \(T_L=1\) and \(T_R=T\). As we progress to the next split position, if that event is a `PRIMITIVE_START`, we increment \(T_L\). If the event is a `PRIMITIVE_END`, we decrement \(T_R\) *on the following pass*, since that event corresponds to a triangle that we are including in both nodes (a vertex lies on the split plane). With \(T_L\) and \(T_R\) in hand for each split position, we can evaluate the surface areas based on the split position and plug in estimates for \(K_T\) and \(K_I\). On each pass we evaluate \(C_V(p)\) and retain the best cost and split position, \(p\). Once we have processed all potential split positions, we compare the best cost with \(C_{NS}\) and split the node if \(C_V(p) \lt C_{NS}\).
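A sketch of the sweep over one axis's sorted event list might look like the following. Names like `sweepCounts` are illustrative, not from the project; the deferred decrement of \(T_R\) implements the "on the following pass" rule.

```cpp
#include <algorithm>
#include <vector>

enum EventType { PRIMITIVE_END = 0, PRIMITIVE_START = 1 };

struct Event { double position; EventType type; };

struct SplitCount { double position; int tLeft, tRight; };

// Sweep the sorted event list for one axis, producing T_L and T_R at each
// candidate split. A triangle whose vertex lies on the split plane is counted
// in both children, hence the deferred decrement of T_R.
std::vector<SplitCount> sweepCounts(std::vector<Event> events, int totalTriangles) {
    std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
        return a.position < b.position;
    });
    std::vector<SplitCount> counts;
    int tLeft = 0, tRight = totalTriangles, pendingEnds = 0;
    for (const Event& e : events) {
        tRight -= pendingEnds;   // decrement deferred from the previous pass
        pendingEnds = 0;
        if (e.type == PRIMITIVE_START) tLeft++;
        else pendingEnds++;      // defer: this triangle still straddles the plane
        counts.push_back({e.position, tLeft, tRight});
    }
    return counts;
}
```

For two triangles spanning \([0,1]\) and \([0.5,2]\) on an axis, the first split position yields \(T_L=1, T_R=2\), and only the last yields \(T_R=1\).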

The project download generates the kd tree recursively host-side, transfers it to a structure of arrays, and passes it to the device. The `buildKdTree()` function in the download accepts a type parameter: `KD_EVEN` splits each node at its center, resulting in an even binary space partition; `KD_MEDIAN` splits each node at the object median; and `KD_SAH` splits the node using the surface area heuristic. Below are three visualizations of the tree structure, one for each type.

**stack-based traversal**

To implement a stack-based traversal of the kd tree, we first created the stack object below. A `__stack_element` contains an `id` referencing a node, along with the \(t_{min}\) and \(t_{max}\) values for a ray, \(\vec{r} = \vec{o} + \hat{d}t\), passing through that node.

```cpp
struct __stack_element {
    int id;
    double tmin, tmax;
};

class __stack {
public:
    __stack_element stack[32];
    int count;

    __device__ __stack();
    __device__ void push(int id, double tmin, double tmax);
    __device__ __stack_element pop();
    __device__ bool empty();
};

__device__ __stack::__stack() : count(0) {}

__device__ void __stack::push(int id, double tmin, double tmax)
{
    this->stack[count].id = id;
    this->stack[count].tmin = tmin;
    this->stack[count].tmax = tmax;
    count++;
}

__device__ __stack_element __stack::pop()
{
    count--;
    __stack_element se;
    se.id = this->stack[count].id;
    se.tmin = this->stack[count].tmin;
    se.tmax = this->stack[count].tmax;
    return se;
}

__device__ bool __stack::empty()
{
    return this->count == 0;
}
```

With the stack object, it was fairly straightforward to implement the algorithm below. See [3] and [4] for details. The algorithm descends through the tree, pushing the farther nodes onto the stack. With the nearest nodes evaluated first, we can break early upon finding an intersection within the bounds.

```
intersection = none;
if (ray intersects root node) {
    stack.push(root node, tmin, tmax);
    while (!stack.empty() && !intersection) {
        (node, tmin, tmax) = stack.pop();
        while (!node.isLeaf()) {
            tsplit = (node.split - ray.origin[node.axis]) / ray.direction[node.axis];
            if (node.split - ray.origin[node.axis] >= 0) {
                first = node.left;
                second = node.right;
            } else {
                first = node.right;
                second = node.left;
            }
            if (tsplit >= tmax || tsplit < 0)
                node = first;
            else if (tsplit <= tmin)
                node = second;
            else {
                stack.push(second, tsplit, tmax);
                node = first;
                tmax = tsplit;
            }
        }
        foreach (triangle in node)
            if (ray intersects triangle)
                intersection = nearest intersection;
        if (nearest intersection > tmax)
            intersection = none;
    }
}
```

Download the project and have a look at the code. Let me know if you have any thoughts.

Download this project: path_tracer.tar.bz2

References:

1. Akenine-Möller, Tomas. Fast 3D Triangle-Box Overlap Testing. *In ACM SIGGRAPH 2005 Courses*, ACM. Los Angeles, California. 2005.

2. Wald, Ingo, and Havran, Vlastimil. On Building Fast kd-Trees for Ray Tracing, and on Doing That in O(N log N). *In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing*. 2006.

3. Wald, Ingo. 2004. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Saarland University.

4. Horn, Daniel Reiter, Sugerman, Jeremy, Houston, Mike, and Hanrahan, Pat. 2007. Interactive k-d tree GPU raytracing. *In Proceedings of the 2007 symposium on Interactive 3D graphics and games*, ACM. Seattle, Washington.

5. Havran, Vlastimil. 2000. Heuristic Ray Shooting Algorithms. Ph.D. Thesis, Czech Technical University in Prague.

**Thin lens**

We first reworked the camera model using the thin lens equation. Below, \(f\) is the focal length, \(d\) is the distance to the focal plane, and \(i\) is the distance to the image plane.

\begin{align}

\frac{1}{f} &= \frac{1}{d} + \frac{1}{i} \\

i &= \frac{1}{\frac{1}{f} - \frac{1}{d}} = \frac{fd}{d-f} \\

\end{align}

For a 50mm lens focused at 10m, the image plane is located at approximately 50.25mm. For a lens set to f/8, this yields a radius, \(r\), of the entrance pupil of,

\begin{align}

r &= \frac{1}{2} \cdot \frac{f}{8} \\

&= \frac{1}{2} \cdot \frac{50\,\text{mm}}{8} = 3.125\,\text{mm} \\

\end{align}

In the code we specify the focal length, aperture, and distance to the focal plane. From these we evaluate the distance to the image plane and the aperture size. The kernel simulates a 36mm-wide sensor with a height computed from the aspect ratio. We fire rays from a location on the sensor through the origin to a point, \(\vec{p}\), on the focal plane. We then jitter the origin within the disc defined by the aperture radius. If the new offset is \(\vec{o}\), the ray direction is \(\vec{r}=\vec{p}-\vec{o}\), and we sample the ray \(\vec{o} + t\vec{r}\).
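As a quick sanity check of the numbers, the two formulas can be evaluated directly (all units in millimeters; function names are illustrative):

```cpp
// Thin lens: distance to the image plane from focal length f and
// focus distance d, i = f*d / (d - f).
double imagePlaneDistance(double f, double d) { return f * d / (d - f); }

// Entrance pupil radius from focal length and f-number N: r = (f / N) / 2.
double apertureRadius(double f, double N) { return 0.5 * f / N; }
```

For `imagePlaneDistance(50, 10000)` this gives roughly 50.25mm, and `apertureRadius(50, 8)` gives 3.125mm, matching the worked example above.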

**Fresnel reflection**

Next, we added support for Fresnel reflection. This was a straightforward modification to our refractive material. We simply find the reflection coefficient, \(R\), for unpolarized light given below,

\begin{align}

R &= \frac{R_s+R_p}{2} \\

R_s &= \left( \frac{-n_1 \hat{r} \cdot \hat {n} - n_2 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]}}{-n_1 \hat{r} \cdot \hat {n} + n_2 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]}} \right)^2\\

R_p &= \left( \frac{n_1 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} - -n_2 \hat{r} \cdot \hat {n}}{n_1 \sqrt{1 - \frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} + -n_2 \hat{r} \cdot \hat {n}} \right)^2\\

\end{align}

Note that all vectors above are unit vectors. We next generate a uniform random variable on the interval \([0,1]\) and reflect the ray if this number is less than \(R\). We refract and transmit otherwise.
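A direct host-side translation of these equations (with \(\cos\theta_i = -\hat{n}\cdot\hat{r}\)) might look like the sketch below; returning 1 on a negative radicand folds total internal reflection into the same function. The function name is illustrative, not the project's.

```cpp
#include <cmath>

// Unpolarized Fresnel reflection coefficient. cosI = -(n.r) is the cosine of
// the incidence angle; n1, n2 are the refractive indices. Returns 1.0 on
// total internal reflection (radicand negative), i.e. always reflect.
double fresnelReflectance(double n1, double n2, double cosI) {
    double radicand = 1.0 - (n1 * n1) / (n2 * n2) * (1.0 - cosI * cosI);
    if (radicand < 0.0) return 1.0;  // total internal reflection
    double cosT = std::sqrt(radicand);
    double rs = (n1 * cosI - n2 * cosT) / (n1 * cosI + n2 * cosT);
    double rp = (n1 * cosT - n2 * cosI) / (n1 * cosT + n2 * cosI);
    return 0.5 * (rs * rs + rp * rp);
}
```

At normal incidence from air into glass (\(n_1=1, n_2=1.5\)) this reproduces the familiar 4% reflectance.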

**Smooth shading**

In this post we discussed triangle intersections, so we have \(s\) and \(t\) for our point of intersection, \(\vec{p}\),

\begin{align}

\vec{p} &= \vec{p}_0 + s(\vec{p}_1 - \vec{p}_0) + t(\vec{p}_2 - \vec{p}_0) \\

\end{align}

Provided we have a normal for each vertex, we can exploit the \(s\) and \(t\) evaluations and use them for interpolating the normals,

\begin{align}

\vec{n} &= \vec{n}_0 + s(\vec{n}_1 - \vec{n}_0) + t(\vec{n}_2 - \vec{n}_0) \\

\hat{n} &= \frac{\vec{n}}{\left|\left|\vec{n}\right|\right|}

\end{align}
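A minimal sketch of this interpolation, using a standalone `Vec3` rather than the project's `__vector`:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Interpolate per-vertex normals with the barycentric (s, t) of the hit point,
// then renormalize; mirrors the position interpolation above.
Vec3 smoothNormal(Vec3 n0, Vec3 n1, Vec3 n2, double s, double t) {
    Vec3 n = { n0.x + s * (n1.x - n0.x) + t * (n2.x - n0.x),
               n0.y + s * (n1.y - n0.y) + t * (n2.y - n0.y),
               n0.z + s * (n1.z - n0.z) + t * (n2.z - n0.z) };
    double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```

At \(s=t=0\) the result is \(\hat{n}_0\); at \(s=1, t=0\) it is \(\hat{n}_1\), as expected.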

**Texture mapping the plane primitive**

The last addition this time around was to add texture mapping support for the plane primitive. The general idea was to define two linearly-independent vectors that span the plane. With those two vectors and a point on the plane, we can find the \(s\) and \(t\) coordinates for our point of intersection as we do for the triangle primitive. Since the texture is repeating we find the coordinates, \(s'\) and \(t'\),

\begin{align}

s' &= s - \lfloor s \rfloor \\

t' &= t - \lfloor t \rfloor \\

\end{align}

We now have appropriate texture coordinates, \(s'\) and \(t'\), that both belong to the interval \([0,1]\). These are used as offsets into our texture.
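The wrap is a one-liner per coordinate; note that \(s - \lfloor s \rfloor\) lands in \([0,1)\) even for negative \(s\). The names below are illustrative:

```cpp
#include <cmath>

struct TexCoord { double s, t; };

// Wrap plane-space coordinates into [0, 1) for a repeating texture:
// s' = s - floor(s), t' = t - floor(t). Correct for negative inputs too,
// since floor rounds toward negative infinity.
TexCoord wrapTexCoords(double s, double t) {
    return { s - std::floor(s), t - std::floor(t) };
}
```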

Download this project: pathtracer_dof_triangles_fresnel_texture_smooth.tar.bz2

We will continue with the project we left off with in this post. We will add triangles to our list of primitives. Once we are able to render triangles, this opens the door to rendering full-scale models. However, because models can contain many thousands of triangles, we need to organize those primitives effectively for intersection tests. For this we have implemented a rudimentary binary space partitioning. We will discuss towards the end what could be done to improve efficiency. Below are two renders.

In the post, A calibration method based on barycentric coordinates for multi-touch systems, we discussed barycentric coordinates. That concept will be used here for our triangle intersection tests. Our first job is to locate the point, \(\vec{p}\), where the ray intersects the plane in which the triangle lies (this was discussed a bit here). If a triangle is defined by the vertices, \(\vec{p}_0\), \(\vec{p}_1\), and \(\vec{p}_2\), the triangle normal can be given as \(\vec{n} = (\vec{p}_1-\vec{p}_0)\times(\vec{p}_2-\vec{p}_0)\). Once we have found the point, \(\vec{p}\), we evaluate the barycentric coordinates, \(s\) and \(t\), of the point relative to the triangle. These equations are given below, where \(\vec{v}_0 = \vec{p}_1-\vec{p}_0\) and \(\vec{v}_1 = \vec{p}_2 - \vec{p}_0\).

\begin{align}

s &= \frac{[(\vec p - \vec p_0) \cdot \vec v_0](\vec v_1 \cdot \vec v_1)-[(\vec p - \vec p_0) \cdot \vec v_1](\vec v_0 \cdot \vec v_1)}{(\vec v_0 \cdot \vec v_0)(\vec v_1 \cdot \vec v_1)-(\vec v_0 \cdot \vec v_1)^2}\\

t &= \frac{[(\vec p - \vec p_0) \cdot \vec v_1](\vec v_0 \cdot \vec v_0)-[(\vec p - \vec p_0) \cdot \vec v_0](\vec v_0 \cdot \vec v_1)}{(\vec v_0 \cdot \vec v_0)(\vec v_1 \cdot \vec v_1)-(\vec v_0 \cdot \vec v_1)^2}

\end{align}

Provided \(s\geq0\), \(t\geq0\), and \(s+t\leq1\), we can conclude that the point, \(\vec{p}\), lies inside the triangle. We can then reflect or transmit the ray appropriately depending on the material type.

This addendum to the path tracer project was relatively straightforward, but it does not scale well. For each ray we must find the nearest intersection, and for \(n\) primitives this amounts to \(n\) intersection tests on each ray bounce. We cannot afford to check each primitive in models containing thousands of primitives, so we have added a basic binary space partitioning.

The partitioning tree is generated host-side and transferred to the device. For this we have elected to represent our tree structure as a structure of arrays. Below is the structure as it stands. `depth` holds the depth of a specific node; `minx`, `miny`, ... `maxz` hold the bounds of the node; `child0` and `child1` hold the array indices of the two child nodes; `parent` holds the index of the parent node; `id` is the index of the node; and `leaf_id` is a separate indexing that applies only to the leaf nodes. The `leaf_id` gives us an offset into the `objects` array which, itself, applies only to the leaf nodes. `n_objects` applies to all nodes and holds the number of objects that pass through a node. Lastly, `max_depth` holds the depth of our tree, `size` is the number of nodes, and `leaf_size` is the number of leaf nodes.

```cpp
struct _bounding_box {
    unsigned short *depth, *depth_device;
    double *minx, *miny, *minz, *maxx, *maxy, *maxz,
           *minx_device, *miny_device, *minz_device,
           *maxx_device, *maxy_device, *maxz_device;
    short *child0, *child1, *child0_device, *child1_device;
    short *parent, *parent_device;
    short *id, *id_device;
    short *leaf_id, *leaf_id_device;
    unsigned short *n_objects, *n_objects_device;
    unsigned short *objects, *objects_device;
    unsigned short max_depth;
    unsigned short size, leaf_size;
};
```

If we have a tree with depth 3, then \(\text{size} = 2^{(3+1)}-1 = 15\) and \(\text{leaf\_size} = 2^{3} = 8\). Thus, we would have \(15\) nodes in total and \(8\) leaf nodes.

The idea was to first evaluate (after the camera transformation) the minimum and maximum axes values of the axis-aligned box that bounds every primitive in our scene. These values are passed to our tree-building function, and the tree is generated by splitting along the major axis. If the dimensions of our root node are \((1,2,3)\), we would first split along the \(z\)-axis resulting in two children of size \((1,2,1.5)\). The second splits would occur along the \(y\)-axis resulting in 4 nodes of size \((1,1,1.5)\).

Once we reach a leaf node, we cycle through all of the primitives in our scene seeking those primitives that pass through the leaf node. Once the tree is built and all leaf nodes have been processed, we propagate back the number of objects in each child node to its parent. In the code we have also merged the objects from child nodes to the current node if the number of objects is below a certain threshold. There would be no sense in testing 16 child nodes if they all contain the same primitive.

When testing child nodes for the containment of primitives, we have cheated a bit. For one we have not added any plane primitives to the partitioning. We simply add these primitives to the list of objects we test for intersections. For the sphere primitive we have evaluated the radius of the bounding sphere of the given tree node and compared it with the radius of the primitive. If the distance between the sphere center and the box center is less than the sum of the radii, we include the primitive as passing through the tree node. Consequently, this will include spheres that should not necessarily belong to the node, but it will include all the spheres that should. Lastly, when testing for the containment of triangle primitives in a given tree node, we evaluate the axis-aligned bounding box of the primitive and test for overlap between the two bounding boxes. Again, this will potentially include many more primitives than it should but will capture all that is necessary.

Our ray sampling procedure has been updated to query for intersections with the bounding box. If we find the ray hits the root node, we then query the two child nodes. If the ray hits a child node, we check the children of that child node. We continue like this until we reach a leaf node. Upon reaching a leaf node, we add the objects contained in that leaf node to the list of objects to test against for intersections. Below is the function for testing whether a ray intersects an axis-aligned bounding box. There are a few cases. If the node does not contain any primitives, there is no point in testing any further (no children will contain any primitives either). Additionally, the ray could originate inside the bounding box, and, lastly, we check for intersection with the left, right, bottom, top, rear, and front box faces.

```cpp
__device__ bool rayIntersects_device(_bounding_box& b, unsigned short index, __ray r)
{
    // node contains no objects: nothing to hit here or below
    if (b.n_objects_device[index] < 1) return false;

    // ray origin inside the box
    if (r.origin.x >= b.minx_device[index] && r.origin.x <= b.maxx_device[index] &&
        r.origin.y >= b.miny_device[index] && r.origin.y <= b.maxy_device[index] &&
        r.origin.z >= b.minz_device[index] && r.origin.z <= b.maxz_device[index])
        return true;

    // intersection tests against the six faces
    if (r.origin.x < b.minx_device[index] && r.direction.x > 0) {
        // check left face intersection
        double t = (-b.minx_device[index] + r.origin.x) / -r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (y >= b.miny_device[index] && y <= b.maxy_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index])
            return true;
    }
    if (r.origin.x > b.maxx_device[index] && r.direction.x < 0) {
        // check right face intersection
        double t = (b.maxx_device[index] - r.origin.x) / r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        double z = r.origin.z + t * r.direction.z;
        if (y >= b.miny_device[index] && y <= b.maxy_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index])
            return true;
    }
    if (r.origin.y < b.miny_device[index] && r.direction.y > 0) {
        // check bottom face intersection
        double t = (-b.miny_device[index] + r.origin.y) / -r.direction.y;
        double x = r.origin.x + t * r.direction.x;
        double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index])
            return true;
    }
    if (r.origin.y > b.maxy_device[index] && r.direction.y < 0) {
        // check top face intersection
        double t = (b.maxy_device[index] - r.origin.y) / r.direction.y;
        double x = r.origin.x + t * r.direction.x;
        double z = r.origin.z + t * r.direction.z;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            z >= b.minz_device[index] && z <= b.maxz_device[index])
            return true;
    }
    if (r.origin.z < b.minz_device[index] && r.direction.z > 0) {
        // check rear face intersection
        double t = (-b.minz_device[index] + r.origin.z) / -r.direction.z;
        double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            y >= b.miny_device[index] && y <= b.maxy_device[index])
            return true;
    }
    if (r.origin.z > b.maxz_device[index] && r.direction.z < 0) {
        // check front face intersection
        double t = (b.maxz_device[index] - r.origin.z) / r.direction.z;
        double x = r.origin.x + t * r.direction.x;
        double y = r.origin.y + t * r.direction.y;
        if (x >= b.minx_device[index] && x <= b.maxx_device[index] &&
            y >= b.miny_device[index] && y <= b.maxy_device[index])
            return true;
    }

    // no intersection
    return false;
}
```

Below is the function that adds primitives to the hit list. These are primitives we must check directly for intersections. It was an attempt to avoid recursion and is fairly crude. It starts by testing the root node and continues to add indices on a bounding box hit. When a leaf node is reached, we add only those primitives that have not already been added.

```cpp
__device__ short intersects_device(_bounding_box& b, int i, __ray r, short hit_list[])
{
    int index = 0, count = 1, indices[30000];
    indices[index] = 0;
    short hit_count = 0;
    bool found = false;
    while (index < count && index < 30000) {
        i = indices[index++];
        if (rayIntersects_device(b, i, r)) {
            if (b.depth_device[i] == b.max_depth) {
                // leaf node: add its primitives, skipping duplicates
                for (int j = 0; j < b.n_objects_device[i]; j++) {
                    short hit = b.objects_device[b.leaf_id_device[i] * 10000 + j];
                    found = false;
                    for (int l = 0; l < hit_count; l++) {
                        if (hit_list[l] == hit) { found = true; break; }
                    }
                    if (!found) hit_list[hit_count++] = hit;
                }
            } else {
                indices[count++] = b.child0_device[i];
                indices[count++] = b.child1_device[i];
            }
        }
    }
    return hit_count;
}
```

The `sampleRay` function has been updated to use the `intersects_device` method. It now loops over only those primitives that must be tested directly. Since we are handling planes directly, the project expects those planes to be added to the objects list first. `sampleRay` has a second loop for handling planes; once a primitive other than a plane is found, it breaks from the loop.

Occasionally during testing, the kernel would time out. The number of rays each kernel call handles has been reduced to help prevent this from occurring. A kernel call now handles a 2-by-2 grid of blocks sized 16 by 16; thus, at the moment the kernel handles only 1024 pixels on each pass. We send an offset in both the \(x\) and \(y\)-directions to update the entire image over successive loops.

Blender was used to export 3D models in OBJ format. The project expects triangles and normals to be present in the OBJ file. When exporting do not forget to check "Include Normals" and "Triangulate Faces".

This project is fairly crude. Below is a list of some ideas that could be implemented to improve the efficiency of the project.

- kd-tree
- improved intersection testing
- ray-triangle intersections
- containment testing for spheres in nodes
- containment testing for triangles in nodes
- shared memory
- generating tree structure on device

Have a look at the project, and let me know if you have any questions or suggestions.

Download this project: pathtracer_dof_triangles.tar.bz2

Essentially, we will define the distance to the focal plane and a blur radius. For each primary ray we find its intersection with the focal plane, \(\vec{p}\), and jitter the ray origin by an amount, \(\vec{d}\). We then define the new ray direction as \(\vec{r}=\vec{p}-\vec{d}\). Consequently, objects on the focal plane will appear in focus. Below is the addendum to the `kernel()` function.

```cpp
__vector dir = __vector(x - width / 2, -y + height / 2, 0 + width) + offset;
__ray ray = { __vector(0, 0, 0), dir.unit() };
u1 = rand_device[i * width * height * 3 + index + 1];
u2 = rand_device[i * width * height * 3 + index + 2];
r1 = 2 * M_PI * u1;
r2 = u2;
offset = __vector(cos(r1) * r2, sin(r1) * r2, 0.0) * blur_radius;
__vector p = ray.origin + dir * (focal_distance / width);
ray.origin = ray.origin + offset;
ray.direction = (p - ray.origin).unit();
```

Again, don't forget to update the `Makefile` to reference the proper locations for the `libcudart.so` and `libcurand.so` libraries.

Download the updated project: pathtracer_dof.tar.bz2

Below are two screen captures of this project in action.

This path tracer is basic, fairly crude, and inefficient. I'll provide a brief overview of the code before we delve into some of the mathematics. The host code defines an abstract base class, `cObject`, from which the `cPlane` and `cSphere` classes are derived. The base class includes the material type, color, emission color, and type (plane or sphere) properties. The `applyCamera()` virtual function is defined in the derived classes and transforms the respective object into camera space.

The objects in camera space are passed to the device where the environment is rendered. The `runPathTracer()` function in `pathtracer.cu` generates some random numbers, executes the kernel, and retrieves the current frame. This frame is rendered to a texture during program execution and saved to a PPM file upon program termination.

The kernel function runs through our buffer, and for each buffer location four rays are shot out, one in each of the four quadrants surrounding the buffer location, using cosine-weighted sampling. These four samples are averaged and added to the accumulation. The device function, `sampleRay()`, is called on each ray. A maximum loop size is defined (e.g. 5 bounces), and sampling begins for the current ray.

The ray sampler loops over the maximum number of bounces. Within this loop, we loop over our objects seeking an intersection using the equations outlined below for spheres and planes. If an intersection is found (the nearest intersection), we set the values in our emission and color arrays and bounce the ray according to the material type (diffuse, specular, or refractive). Lastly, we apply the emission and color arrays to our final sample. If our final sample is \(\vec{s}_1\) and the emission and color values are \(\vec{e}_n\) and \(\vec{c}_n\), respectively, for \(n \in \{1,2,\ldots,m\}\), where \(m\) is the bounce limit, the result would be,

\begin{align}

\vec{s}_{m} &= \vec{e}_m\\

\vec{s}_{n} &= \vec{e}_{n} + \vec{c}_{n} \circ \vec{s}_{n+1}\\

\end{align}

Below we will discuss some of the mathematics involved in the process before we mention interaction and conclude with a few notes.

**Sphere intersection**

Our path tracer will include support for spheres and planes. Below we have the equation for a sphere and a ray, followed by the evaluation of the point of intersection, \(\vec{p}\). We have a point of intersection provided the discriminant of the quadratic equation is nonnegative. Lastly, we evaluate the surface normal by subtracting the sphere center from the point of intersection. Note that when we evaluate the roots of the quadratic, we will select the lesser of the two roots (the nearest point of intersection).

\begin{align}

(\vec{p} - \vec{c}) \cdot (\vec{p} - \vec{c}) &= r^2\\

\vec{r}(t) &= \vec{o} + \vec{r}t\\

(\vec{o} + \vec{r}t - \vec{c}) \cdot (\vec{o} + \vec{r}t - \vec{c}) &= r^2\\

(\vec{r}\cdot\vec{r})t^2 + 2\left[\vec{r} \cdot (\vec{o} - \vec{c})\right] t + (\vec{o} - \vec{c}) \cdot (\vec{o} - \vec{c}) - r^2 &= 0\\

\vec{n} &= \vec{p} - \vec{c}\\

\end{align}
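A host-side sketch of this intersection, selecting the lesser positive root and falling back to the greater one when the ray origin is inside the sphere (standalone types, not the project's `__vector`):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Solve the quadratic above for the ray o + r*t against a sphere (c, radius).
// Returns the nearest positive t, or -1.0 when there is no hit.
double intersectSphere(Vec3 o, Vec3 r, Vec3 c, double radius) {
    Vec3 oc = sub(o, c);
    double a = dot(r, r);
    double b = 2.0 * dot(r, oc);
    double k = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * k;     // discriminant
    if (disc < 0.0) return -1.0;           // ray misses the sphere
    double sq = std::sqrt(disc);
    double t = (-b - sq) / (2.0 * a);      // lesser root: nearest hit
    if (t > 0.0) return t;
    t = (-b + sq) / (2.0 * a);             // origin may be inside the sphere
    return t > 0.0 ? t : -1.0;
}
```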

**Plane intersection**

Below we have the equation for a plane followed by an evaluation of the point of intersection. Note that if the ray is parallel to the plane, we have either no intersection or an unlimited number of intersections (the line lies in the plane). Here we do not need to evaluate the normal; it is an inherent property of the plane.

\begin{align}

(\vec{p} - \vec{p}_0) \cdot \hat{n} &= 0\\

\vec{r}(t) &= \vec{o} + \vec{r}t\\

(\vec{o} + \vec{r}t - \vec{p}_0) \cdot \hat{n} &= 0\\

\end{align}
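A corresponding sketch for the plane, guarding against the parallel case (illustrative names and types):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Intersect the ray o + r*t with the plane through p0 with normal n.
// Returns t, or -1.0 when the ray is parallel to the plane or the hit
// lies behind the origin.
double intersectPlane(Vec3 o, Vec3 r, Vec3 p0, Vec3 n) {
    double denom = dot(r, n);
    if (std::fabs(denom) < 1e-12) return -1.0;  // parallel: no unique hit
    double t = dot(sub(p0, o), n) / denom;
    return t > 0.0 ? t : -1.0;
}
```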

**Specular reflection**

The simplest of the three lighting models we will implement in this project, specular reflection gives objects a mirror-like quality. Incoming rays are reflected off the surface of an object in a direction uniquely defined by the incoming ray, \(\vec{r}\), and the unit vector normal to the surface at the point of intersection, \(\hat{n}\).

\begin{align}

\vec{t} &= 2(\hat{n}\cdot\vec{r})\hat{n} - \vec{r}\\

\end{align}

**Diffuse reflection**

To implement diffuse reflections we will use cosine-weighted sampling. More information on cosine-weighted sampling can be found here. Below \(u_1\) and \(u_2\) are uniform random variables. Ultimately, we will reorient the resultant vector based on the surface normal (we are sampling from the unit hemisphere defined by the surface normal at the point of intersection).

\begin{align}

u_1 &\sim U(0,1)\\

u_2 &\sim U(0,1)\\

r &= \sqrt{1-u_1}\\

\theta &= 2\pi u_2\\

\vec{v} &=

\begin{pmatrix}

r \cos(\theta)\\

r \sin(\theta)\\

\sqrt{u_1}\\

\end{pmatrix}

\end{align}
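This mapping can be sketched directly; the resulting direction lives in the local frame where the surface normal is \(+z\) and always has unit length (the function name is illustrative):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Draw a cosine-weighted direction in the local frame where the surface
// normal is +z, using the (u1, u2) mapping above: r = sqrt(1 - u1),
// theta = 2*pi*u2, z = sqrt(u1).
Vec3 cosineSample(double u1, double u2) {
    double r = std::sqrt(1.0 - u1);
    double theta = 2.0 * M_PI * u2;
    return { r * std::cos(theta), r * std::sin(theta), std::sqrt(u1) };
}
```

Since \(r^2 + z^2 = (1-u_1) + u_1 = 1\), no renormalization is needed before reorienting the sample about the surface normal.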

**Refraction**

Refraction gives the appearance of light traveling through a barrier, such as from air to glass. Below we have the equation for the transmission vector, \(\vec{t}\), based on Snell's equations. \(n_1\) and \(n_2\) are the indices of refraction of the two media. Obviously, this equation is only valid if the quantity under the radical is nonnegative. If this quantity is negative, we use the reflection equation above. Such a situation is known as total internal reflection. In our code we will initialize \(n_1\) and \(n_2\) by evaluating the inner product of the ray with the surface normal. If this product is less than zero, we are entering the medium. We also flip the normal when exiting the medium. It should be relatively straightforward to add the Fresnel equations. Kevin Beason did so here.

\begin{align}

\vec{t} &= \frac{n_1}{n_2}\hat{r} - \left( \frac{n_1}{n_2} \hat{n}\cdot\hat{r} + \sqrt{1-\frac{n_1^2}{n_2^2} \left[1 - (\hat{n} \cdot \hat{r})^2 \right]} \right) \hat{n}\\

\end{align}
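A sketch of the transmission computation, signalling total internal reflection to the caller so it can fall back to the reflection equation (illustrative host-side code; \(\hat{r}\) and \(\hat{n}\) are unit vectors with \(\hat{n}\cdot\hat{r} < 0\) when entering):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Refraction { bool ok; Vec3 t; };

// Transmission direction from Snell's law, matching the equation above.
// Returns ok = false on total internal reflection.
Refraction refract(Vec3 r, Vec3 n, double n1, double n2) {
    double eta = n1 / n2;
    double cosI = dot(n, r);                          // negative when entering
    double radicand = 1.0 - eta * eta * (1.0 - cosI * cosI);
    if (radicand < 0.0) return { false, {0, 0, 0} };  // total internal reflection
    double k = eta * cosI + std::sqrt(radicand);
    return { true, { eta * r.x - k * n.x,
                     eta * r.y - k * n.y,
                     eta * r.z - k * n.z } };
}
```

At normal incidence the ray passes straight through regardless of the indices, which is a handy sanity check.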

**A spice of interaction**

We have attempted to add some interaction to this project by including the keyboard handler available here. The premise behind this procedure is to reset the accumulated path values when the camera position or orientation changes. The path tracer begins to progressively refine the scene when the view remains static. Improvements to the project's efficiency would yield a better interactive experience.

**Some notes**

The larger the surface area of your light sources, the faster your scene will appear to converge (less noise), because the rays hit a light source with greater probability. The project currently has a limit of 10 bounces. If you wish to exceed this limit, you must update the `sampleRay()` function in `pathtracer.cu`. Additionally, you will need to update the `Makefile` to reference the proper locations for the `libcudart.so` and `libcurand.so` libraries.

If you have any suggestions for improving this path tracer or questions about it, let me know.

Download this project: pathtracer.tar.bz2
