QR decomposition using Householder transformations

It's been a while since my last post. A project I have in the works requires some matrix decompositions, so I thought this would be a good opportunity to get a post out about QR decompositions using Householder transformations. For the moment we will focus on the field of real numbers, though we can extend these concepts to the complex field if necessary.

Theorem. A real matrix, $\mathbf{A}$, can be decomposed as $\mathbf{A} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an orthogonal matrix and $\mathbf{R}$ is an upper triangular matrix.

The proof of this theorem has been omitted but could be constructed using Householder transformations. We'll discuss the Householder transformation and see how it can be applied to perform the QR decomposition. We'll review the decomposition algorithm and, lastly, have a look at some C++ code.

A Householder transformation is a linear transformation given by the matrix, $\mathbf{P}$,

\begin{align}
\mathbf{P} &= \mathbf{I} - 2\hat{v}\hat{v}^T\\
\end{align}

where $\mathbf{I}$ is the identity matrix and $\hat{v}$ is a unit vector. $\mathbf{P}\vec{x}$ is the reflection of $\vec{x}$ about the hyperplane passing through the origin with normal vector, $\hat{v}$.

\begin{align}
\mathbf{P}\vec{x} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{x}\\
&= \vec{x} - 2\hat{v}\hat{v}^T\vec{x}\\
\end{align}

If $\left(\hat{v}^T\vec{x}\right)\hat{v}$ is the projection of $\vec{x}$ onto $\hat{v}$, then $\vec{x} - 2\left(\hat{v}^T\vec{x}\right)\hat{v}$ should reflect $\vec{x}$ about the hyperplane with normal, $\hat{v}$. See the diagram below.

Diagram representing the Householder transformation.

Some properties of $\mathbf{P}$ follow.

\begin{align}
\mathbf{P}^T &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)^T\\
&= \mathbf{I} - 2\left(\hat{v}\hat{v}^T\right)^T\\
&= \mathbf{I} - 2\hat{v}\hat{v}^T\\
&= \mathbf{P}
\end{align}

$\mathbf{P}^T = \mathbf{P}$, thus, $\mathbf{P}$ is symmetric.

\begin{align}
\mathbf{P}^T\mathbf{P} &= \mathbf{P}\mathbf{P}\\
&= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\\
&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\hat{v}^T\hat{v}\hat{v}^T\\
&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\left(\hat{v}^T\hat{v}\right)\hat{v}^T\\
&= \mathbf{I} - 4\hat{v}\hat{v}^T + 4\hat{v}\hat{v}^T\\
&= \mathbf{I}
\end{align}

$\mathbf{P}^T\mathbf{P} = \mathbf{I}$, thus, $\mathbf{P}$ is orthogonal.

$\mathbf{P}\mathbf{P} = \mathbf{P}^T\mathbf{P} = \mathbf{I}$, thus, $\mathbf{P}$ is an involution.

\begin{align}
\mathbf{P}\vec{v} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{v}\\
&= \vec{v} - 2\hat{v}\hat{v}^T\hat{v}\|\vec{v}\|\\
&= \vec{v} - 2\hat{v}\|\vec{v}\|\\
&= \vec{v} - 2\vec{v}\\
&= -\vec{v}\\
\end{align}

thus, $-1$ is an eigenvalue of $\mathbf{P}$ with multiplicity one.

For a vector, $\vec{u}$, orthogonal to $\hat{v}$, we have,

\begin{align}
\mathbf{P}\vec{u} &= \left(\mathbf{I} - 2\hat{v}\hat{v}^T\right)\vec{u}\\
&= \vec{u} - 2\hat{v}\left(\hat{v}^T\vec{u}\right)\\
&= \vec{u} - \vec{0}\\
&= \vec{u}\\
\end{align}

thus, $1$ is an eigenvalue of $\mathbf{P}$ with multiplicity $n-1$, since there are $n-1$ linearly independent vectors orthogonal to $\hat{v}$.

Now that we have touched on the Householder transformation, we'll see how the transformation can be applied to decompose a matrix into the product of an orthogonal matrix and an upper triangular matrix. The basic idea is to apply multiple transformations to successively zap the entries below the main diagonal column by column. We will then have something of the form,

\begin{align}
\mathbf{A} &= \left(\mathbf{Q}_1\mathbf{Q}_2\ldots\mathbf{Q}_n\right)\left(\mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}\right)\\
\end{align}

Clearly, from the properties above, $\mathbf{Q}_i\mathbf{Q}_i = \mathbf{I}$, so this is a valid statement. Additionally, $\mathbf{Q} = \mathbf{Q}_1\mathbf{Q}_2\ldots\mathbf{Q}_n$ is the product of orthogonal matrices and is itself orthogonal. What is maybe not so clear at this point is that the product, $\mathbf{R} = \mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}$, is an upper triangular matrix.
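To see why the factorization telescopes, note that each transformation is its own inverse, so the innermost pair cancels, then the next, and so on:

\begin{align}
\left(\mathbf{Q}_1\ldots\mathbf{Q}_{n-1}\mathbf{Q}_n\right)\left(\mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\right) &= \mathbf{Q}_1\ldots\mathbf{Q}_{n-1}\left(\mathbf{Q}_n\mathbf{Q}_n\right)\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\\
&= \mathbf{Q}_1\ldots\mathbf{Q}_{n-1}\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\\
&= \mathbf{I}
\end{align}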

Consider $\mathbf{A}$ as,

\begin{align}
\mathbf{A} &= \begin{pmatrix}
\vec{a_1} & \vec{a_2} & \cdots & \vec{a_n}\\
\end{pmatrix}\\
\end{align}

The first step is to find the Householder transformation, $\mathbf{Q}_1$, that reflects the vector, $\vec{a_1}$, to the vector, $\|\vec{a_1}\|\hat{e_1}$, which amounts to finding the appropriate unit vector, $\hat{v_1}$. Note that we could also reflect the vector, $\vec{a_1}$, to the vector, $-\|\vec{a_1}\|\hat{e_1}$. In our implementation we will choose between the two to improve numerical stability: for the vector, $\vec{a_1}$, we will choose the sign of the target to be $-\operatorname{sign}(a_{11})$, so that no cancellation occurs when forming $\vec{v_1}$.

If we choose $\vec{v_1}$ such that,

\begin{align}
\vec{v_1} &= \frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\\
\end{align}

where $\hat{e_1}$ is a standard basis vector, then we can show that $\mathbf{Q}_1\vec{a_1} = \|\vec{a_1}\|\hat{e_1}$ as follows. Working with the denominator,

\begin{align}
\mathbf{Q}_1\vec{a_1} &= \left[\mathbf{I} - 2\left(\frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\right)\left(\frac{\vec{a_1}-\|\vec{a_1}\|\hat{e_1}}{\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|}\right)^T\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)^T\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\left(\vec{a_1}^T - \|\vec{a_1}\|\hat{e_1}^T\right)\left(\vec{a_1} - \|\vec{a_1}\|\hat{e_1}\right)}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{\vec{a_1}^T\vec{a_1} - 2\|\vec{a_1}\|\hat{e_1}^T\vec{a_1} + \|\vec{a_1}\|^2\hat{e_1}^T\hat{e_1}}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\|\vec{a_1}\|^2 - 2\|\vec{a_1}\|\hat{e_1}^T\vec{a_1}}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\|\vec{a_1}\|^2-\|\vec{a_1}\|\hat{e_1}^T\vec{a_1}\right)}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\vec{a_1}^T\vec{a_1} - \|\vec{a_1}\|\hat{e_1}^T\vec{a_1}\right)}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\left(\vec{a_1}^T - \|\vec{a_1}\|\hat{e_1}^T\right)\vec{a_1}}\right]\vec{a_1}\\
&= \left[\mathbf{I} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\right]\vec{a_1}\\
\end{align}

Finishing up with the numerator,

\begin{align}
\mathbf{Q}_1\vec{a_1} &= \vec{a_1} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)^T\vec{a_1}}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\\
&= \vec{a_1} - 2\frac{\left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}{2\|\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\|\vec{v_1}^T\vec{a_1}}\\
&= \vec{a_1} - \left(\vec{a_1}-\|\vec{a_1}\|\hat{e_1}\right)\\
&= \|\vec{a_1}\|\hat{e_1}\\
\end{align}

Multiplying $\mathbf{A}$ by $\mathbf{Q}_1$ yields a matrix of the following form.

\begin{align}
\mathbf{Q}_1\mathbf{A} &= \begin{pmatrix}\|\vec{a_1}\| & \star & \cdots & \star\\0 \\\vdots & & \mathbf{A_1}\\0\end{pmatrix}\\
\end{align}

Our job now is to find the matrix, $\mathbf{Q}_2$, to deal with the sub-matrix $\mathbf{A_1}$. Note that $\mathbf{Q}_2$ will have the following form.

\begin{align}
\mathbf{Q}_2 &= \begin{pmatrix}
1 & 0 & \cdots & 0\\
0\\
\vdots & & \mathbf{I} - 2\hat{v_2}\hat{v_2}^T \\
0\\
\end{pmatrix}\\
\end{align}

We proceed in this fashion until all entries below the main diagonal are zero. Thus, $\mathbf{R} = \mathbf{Q}_n\mathbf{Q}_{n-1}\ldots\mathbf{Q}_1\mathbf{A}$ will be an upper triangular matrix.

We can now write out an algorithm for performing the QR decomposition using Householder transformations. This algorithm is rather straightforward but naive in the sense that a more efficient algorithm could be written.

input:  A (m x n matrix)
output: Q (m x m matrix), R (m x n matrix)

u = m-dimensional vector
v = m-dimensional vector

I = m x m identity matrix
Q = I
R = A

for i = 1 to n
    u.zero()
    v.zero()

    for j = i to m
        u[j] = R[j][i]

    alpha = u[i] < 0 ? -u.length() : u.length()

    for j = i to m
        v[j] = j == i ? u[j] + alpha : u[j]

    if (v.length() < epsilon) continue

    v.normalize()

    P = I - 2 * v * v.transpose()

    R = P * R
    Q = Q * P
end for


Below we have a C++ implementation of the decomposition algorithm above. The householderDecomposition method is a member of the cMatrix class, a rather simple class for performing a few matrix operations; it has a few operator overloads and the ability to transpose a matrix. Download the project at the end of this post to see the details. Here is the Householder decomposition.

void cMatrix::householderDecomposition(cMatrix& Q, cMatrix& R) {
    double mag, alpha;
    cMatrix u(m, 1), v(m, 1);
    cMatrix P(m, m), I(m, m);  // I must hold the m x m identity matrix

    Q = cMatrix(m, m);         // Q must start as the m x m identity matrix
    R = *this;

    for (int i = 0; i < n; i++) {
        u.zero(); v.zero();

        // Copy the i-th column of R, from the diagonal down, into u.
        mag = 0.0;
        for (int j = i; j < m; j++) {
            u.A[j] = R.A[j * n + i];
            mag += u.A[j] * u.A[j];
        }
        mag = sqrt(mag);

        // Choose the sign of alpha so that u[i] + alpha cannot cancel.
        alpha = u.A[i] < 0 ? -mag : mag;

        // v = u + alpha * e_i, accumulating the length of v as we go.
        mag = 0.0;
        for (int j = i; j < m; j++) {
            v.A[j] = j == i ? u.A[j] + alpha : u.A[j];
            mag += v.A[j] * v.A[j];
        }
        mag = sqrt(mag);

        // The column is already in triangular form; skip the reflection.
        if (mag < 1e-10) continue;

        for (int j = i; j < m; j++) v.A[j] /= mag;

        P = I - (v * v.transpose()) * 2.0;

        R = P * R;
        Q = Q * P;
    }
}