# 4 Inner Product Spaces

\(\newcommand{\vlist}[2]{#1_1,#1_2,\ldots,#1_#2}\) \(\newcommand{\vectortwo}[2]{\begin{bmatrix} #1 \\ #2\end{bmatrix}}\) \(\newcommand{\vectorthree}[3]{\begin{bmatrix} #1 \\ #2 \\ #3\end{bmatrix}}\) \(\newcommand{\vectorfour}[4]{\begin{bmatrix} #1 \\ #2 \\ #3 \\ #4\end{bmatrix}}\) \(\newcommand{\vectorfive}[5]{\begin{bmatrix} #1 \\ #2 \\ #3 \\ #4 \\ #5 \end{bmatrix}}\) \(\newcommand{\lincomb}[3]{#1_1 \vec{#2}_1+#1_2 \vec{#2}_2+\cdots + #1_m \vec{#2}_#3}\) \(\newcommand{\norm}[1]{\left|\left |#1\right|\right |}\) \(\newcommand{\ip}[1]{\left \langle #1\right \rangle}\) \(\newcommand{\plim}[2]{\lim_{\footnotesize\begin{array}{c} \\[-10pt] #1 \\[0pt] #2 \end{array}}}\)

The book *Orthogonality (A Lively Introduction with Proofs)* covers orthonormal bases, orthogonal projections, the Gram-Schmidt process, QR factorization, orthogonal transformations, and orthogonal matrices. It also includes inner products.

Orthogonality is a vital topic in linear algebra and in mathematics as a whole. It is usually studied in the context of inner product spaces, where it plays a key role in many important results. This book gives a lively introduction to orthogonality, with an emphasis on proofs and intuition, and also discusses inner products and their relationship to orthogonality. It is aimed at students of linear algebra and related subjects who want to deepen their understanding of orthogonality and its many applications.

Orthonormal bases and orthogonal projections are two of the most important concepts in linear algebra. An orthonormal basis is a basis made up of unit vectors that are pairwise perpendicular, and the orthogonal projection of a vector \(\vec x\) onto a subspace \(V\) is the vector in \(V\) whose difference from \(\vec x\) is perpendicular to \(V\).

In more detail, an orthonormal basis of a vector space consists of mutually orthogonal vectors that are normalized, meaning that each has length 1. Orthogonal projection splits a vector into two pieces: a component lying in the subspace and a component perpendicular to the subspace. Orthonormal bases and orthogonal projections are important in many areas of mathematics, including linear algebra and geometry.

Together, these concepts allow us to write any vector as a sum of simple, mutually orthogonal components, one along each vector of an orthonormal basis. This is useful for many purposes, including data compression, error correction, and signal processing. In this book, we discuss these concepts in detail and show how they can be used in practice.

Orthonormal bases and orthogonal projections are closely related concepts, and understanding both can be helpful in a variety of contexts.

The Gram-Schmidt process is a method for orthogonalizing a set of vectors. It is commonly used in numerical analysis and, in particular, in computing the QR factorization of a matrix.

The Gram-Schmidt process begins with a set of linearly independent vectors \(u_1,u_2,...,u_n\) and produces a set of orthonormal vectors \(v_1,v_2,...,v_n\) that span the same space as the original vectors.

The \(v\)’s are constructed iteratively as follows: for \(k = 1,2,\ldots,n\), let \[w_k = u_k - \text{proj}_{v_1}(u_k) - \text{proj}_{v_2}(u_k) - \cdots - \text{proj}_{v_{k-1}}(u_k),\] where \(\text{proj}_{v_j}(u_k)=(v_j\cdot u_k)\,v_j\) is the projection of \(u_k\) onto \(v_j\), and then set \(v_k = w_k/\norm{w_k}\).

Once the \(v\)’s have been computed, they give the QR factorization of any \(m\times n\) matrix \(A\) whose columns are the linearly independent vectors \(u_1,\ldots,u_n\): \(A = QR\), where \(Q\) is the \(m×n\) matrix whose columns are the \(v\)’s and \(R\) is an \(n\times n\) upper triangular matrix.

The Gram-Schmidt process should not be confused with the QR algorithm, which is an iterative method for computing eigenvalues that repeatedly uses QR factorizations. Note also that the Gram-Schmidt process can be used to orthogonalize any linearly independent set of vectors, not just the columns of a matrix.

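In exact arithmetic, the Gram-Schmidt process always produces a QR factorization. In floating-point arithmetic, however, the classical version can gradually lose orthogonality, so it must be used with care; the modified Gram-Schmidt variant or Householder reflections are usually preferred in numerical software.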

In mathematics, an orthogonal transformation is a linear transformation that preserves the inner product of vectors. Consequently, it preserves both lengths and angles, so every orthogonal transformation is a (linear) isometry. Rotations and reflections are familiar examples.

An orthogonal matrix is a square matrix whose columns (and also whose rows) are orthonormal vectors; equivalently, its transpose is equal to its inverse. Every orthogonal matrix \(A\) has determinant \(+1\) or \(-1\), since \(\det(A)^2=\det(A^T)\det(A)=\det(A^TA)=\det(I)=1\).

Orthogonal transformations arise naturally in many areas of mathematics and physics. For example, they describe rotations in space, and orthogonal matrices can be used to diagonalize real symmetric matrices. They are also used in many numerical algorithms, such as QR factorization, which is a standard tool for solving linear least-squares problems.

Orthogonal matrices have many useful properties: they preserve lengths and angles, all of their eigenvalues have absolute value 1, and their inverse is simply their transpose. This makes them particularly useful in applications such as signal processing and image compression.

An inner product is a mathematical operation that takes two vectors and produces a scalar. The most common inner product is the dot product, which multiplies the corresponding components of two vectors and sums the results.

The dot product can be used to calculate the length (norm) of a vector, as well as the angle between two vectors. More generally, an inner product on a real vector space \(V\) is a symmetric bilinear map \(V\times V\to\mathbb{R}\) that is positive definite, meaning \(\ip{\vec v, \vec v}>0\) for every nonzero \(\vec v\). (For complex vector spaces, symmetry is replaced by conjugate symmetry.)

Inner products are often used in machine learning and statistics, where they measure similarity between data points.

In physics, inner products are used to define Hermitian (self-adjoint) operators, which are important in quantum mechanics.

Orthogonality is a powerful mathematical tool with many applications in physics, engineering, and numerical analysis. In this book, you will learn how the Gram-Schmidt process orthogonalizes a set of vectors, how orthogonal matrices describe rotations and reflections, and how inner products measure similarity between vectors and define important operators in physics. With this book as your guide, you will gain a solid understanding of the basics of orthogonality and its many uses.

## 4.1 Orthonormal Bases and Orthogonal Projections

The **norm** of a vector \(\vec v\) in \(\mathbb{R}^n\) is \(\norm{\vec v}=\sqrt{\vec v \cdot \vec v}\). A vector \(\vec u\) in \(\mathbb{R}^n\) is called a **unit vector** if \(\norm{\vec u}=1\).

**Example 4.1** If \(\vec v\in \mathbb{R}^n\) and \(k\) is a scalar, then \(\norm{k \vec v}=|k| \norm{\vec v}\), and if \(\vec v\) is nonzero then \(\vec u=\frac{1}{\norm{\vec v}} \vec v\) is a unit vector. Indeed, since \(\norm{k \vec v}^2=(k \vec v)\cdot (k \vec v)=k^2(\vec v \cdot \vec v)=k^2\norm{\vec v}^2\), taking square roots gives \(\norm{k \vec v}=|k| \norm{\vec v}\). If \(\vec v\) is nonzero, then \(\frac{1}{\norm{\vec v}}\) is defined, and the first part with \(k=\frac{1}{\norm{\vec v}}\) gives \(\norm{\vec u}= \frac{1}{\norm{\vec v}} \norm{\vec v}=1\).

Two vectors \(\vec v\) and \(\vec w\) in \(\mathbb{R}^n\) are called **perpendicular** or **orthogonal** if \(\vec v \cdot \vec w=0\). The vectors \(\vec u_1,\ldots,\vec u_m\) in \(\mathbb{R}^n\) are called **orthonormal** if they are all unit vectors and orthogonal to one another. A basis of \(\mathbb{R}^n\) consisting only of orthonormal vectors is called an **orthonormal basis**.

**Theorem 4.1 (Orthonormal Vectors)** Orthonormal vectors are linearly independent; thus any \(n\) orthonormal vectors \(\vec u_1,\ldots,\vec u_n\in \mathbb{R}^n\) form a basis for \(\mathbb{R}^n\).

*Proof*. Suppose \(\vec u_1,\ldots,\vec u_m\) are orthonormal vectors in \(\mathbb{R}^n\). To show linear independence, suppose that

\[
c_1 \vec u_1+\cdots + c_m \vec u_m=\vec 0
\] for some scalars \(c_1,\ldots,c_m\) in \(\mathbb{R}\). Taking the dot product of both sides with \(\vec u_i\) gives \[
\left ( c_1 \vec u_1 + \cdots +c_m \vec u_m\right ) \cdot \vec u_i =\vec 0 \cdot \vec u_i=0.
\] Because the dot product is distributive, \(c_1(\vec u_1 \cdot \vec u_i )+\cdots + c_m (\vec u_m \cdot \vec u_i)=0\). We know that \(\vec u_i\cdot \vec u_i=1\) and \(\vec u_j\cdot \vec u_i=0\) for \(j\neq i\), so this equation reduces to \(c_i=0\). Since this holds for all \(i=1,\ldots,m\), it follows that \(c_1=\cdots =c_m=0\); therefore, \(\vec u_1,\ldots,\vec u_m\) are linearly independent. The second part follows since \(n\) linearly independent vectors in \(\mathbb{R}^n\) always form a basis.

**Example 4.2** Here are three examples of orthonormal bases of the subspaces they span. The vectors \(\vec e_1,\ldots, \vec e_m\) in \(\mathbb{R}^n\) (with \(m\leq n\)) form an orthonormal basis of the subspace they span. For any scalar \(\theta\), the vectors \(\vectortwo{\cos \theta}{\sin \theta}\), \(\vectortwo{-\sin\theta}{\cos \theta}\) form an orthonormal basis of \(\mathbb{R}^2\). The vectors \[
\begin{array}{ccc}
\vec u_1=\vectorfour{1/2}{1/2}{1/2}{1/2}, &
\vec u_2=\vectorfour{1/2}{1/2}{-1/2}{-1/2}, &
\vec u_3=\vectorfour{1/2}{-1/2}{1/2}{-1/2}
\end{array}
\] in \(\mathbb{R}^4\) form an orthonormal basis of the subspace they span.

Let \(V\) be a subspace of \(\mathbb{R}^n\). The **orthogonal complement** \(V^\perp\) of \(V\) is the set of those vectors \(\vec x\) in \(\mathbb{R}^n\) that are orthogonal to all vectors in \(V\), namely, \[\begin{equation}
V^{\perp}=\{ \vec x \in \mathbb{R}^n \mid \vec v \cdot \vec x =0, \text{ for all } \vec v \text{ in } V\}.
\end{equation}\] It is easy to verify that \(V^\perp\) is always a subspace and that \((\mathbb{R}^n)^\perp=\{\vec 0\}\) and \(\{\vec 0\}^\perp=\mathbb{R}^n\). Also notice that if \(U_1\subseteq U_2\), then \(U_2^\perp \subseteq U_1^\perp\). If \(\vec x\in V^{\perp}\), then \(\vec x\) is said to be **perpendicular** to \(V\). The vector \(\vec x^{\parallel}\) in the following theorem is called the **orthogonal projection** of \(\vec x\) onto the subspace \(V\) of \(\mathbb{R}^n\) and is denoted by \(\text{proj}_V (\vec x)\).

**Theorem 4.2 (Orthogonal Projection)**

- If \(V\) is a subspace of \(\mathbb{R}^n\) and \(\vec x\in \mathbb{R}^n\), then \(\vec x=\vec x^\parallel + \vec x^\perp\), where \(\vec x^\parallel\) is in \(V\) and \(\vec x^\perp\) is perpendicular to \(V\), and this representation is unique.
- If \(V\) is a subspace of \(\mathbb{R}^n\) with an orthonormal basis \(\vec u_1 ,\ldots, \vec u_m\), then \[\begin{equation} \mathrm{proj}_V (\vec x):=\vec x^\parallel = (\vec u_1 \cdot \vec x) \vec u_1 + \cdots + (\vec u_m \cdot \vec x) \vec u_m \end{equation}\] for all \(\vec x\) in \(\mathbb{R}^n\).
- If \(\vec u_1 ,\ldots, \vec u_n\) is an orthonormal basis of \(\mathbb{R}^n\), then \[\begin{equation} \vec x = (\vec u_1 \cdot \vec x) \vec u_1 + \cdots + (\vec u_n \cdot \vec x) \vec u_n \end{equation}\] for all \(\vec x\) in \(\mathbb{R}^n\).

*Proof*. The proof is left for the reader.

**Example 4.3** Find the orthogonal projection of \(\vectorthree{49}{49}{49}\) onto the subspace \(V\) of \(\mathbb{R}^3\) spanned by \(\vectorthree{2}{3}{6}\) and \(\vectorthree{3}{-6}{2}\). The two given vectors spanning the subspace are orthogonal since \(2(3)+3(-6)+6(2)=0\), but they are not unit vectors since both have length 7. To obtain an orthonormal basis \(\vec u_1, \vec u_2\) of the subspace, we divide each by 7: \(\vec u_1=\frac{1}{7}\vectorthree{2}{3}{6}\) and \(\vec u_2=\frac{1}{7}\vectorthree{3}{-6}{2}\). Now we can use the second part of Theorem 4.2 with \(\vec x=\vectorthree{49}{49}{49}\). Since \(\vec u_1\cdot\vec x=77\) and \(\vec u_2\cdot\vec x=-7\), \[
\text{proj}_V(\vec x)=(\vec u_1 \cdot \vec x)\vec u_1+(\vec u_2\cdot \vec x) \vec u_2=
11 \vectorthree{2}{3}{6}+(-1)\vectorthree{3}{-6}{2}= \vectorthree{19}{39}{64}.
\]
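As a quick numerical check of Example 4.3, here is a minimal sketch in plain Python (standard library only) of the projection formula from the second part of Theorem 4.2. The helper names `dot` and `proj_orthonormal` are our own, and because dividing by 7 is done in floating point, the results agree with the exact answer only up to rounding.

```python
def dot(a, b):
    """Dot product of two vectors stored as equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def proj_orthonormal(x, basis):
    """Return (u_1 . x) u_1 + ... + (u_m . x) u_m; assumes `basis` is orthonormal."""
    result = [0.0] * len(x)
    for u in basis:
        c = dot(u, x)
        result = [r + c * ui for r, ui in zip(result, u)]
    return result


# Example 4.3: divide the orthogonal spanning vectors by their common length 7.
u1 = [c / 7 for c in [2, 3, 6]]
u2 = [c / 7 for c in [3, -6, 2]]
x = [49, 49, 49]

p = proj_orthonormal(x, [u1, u2])
residual = [a - b for a, b in zip(x, p)]  # x_perp = x - proj_V(x)
print([round(c, 9) for c in p])
print(round(dot(residual, u1), 9), round(dot(residual, u2), 9))
```

The first line printed should match \((19, 39, 64)\), and the second line checks that \(\vec x-\text{proj}_V(\vec x)\) is (numerically) orthogonal to both basis vectors, as the first part of Theorem 4.2 requires.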

**Example 4.4** Find the coordinates of the vector \(\vec x=\vectorfour{4}{5}{6}{7}\) with respect to the orthonormal basis \(\mathcal{B}=(\vec u_1, \vec u_2, \vec u_3, \vec u_4)\) of \(\mathbb{R}^4\), where \[
\begin{array}{cccc}
\vec u_1=\vectorfour{1/2}{1/2}{1/2}{1/2}, &
\vec u_2=\vectorfour{1/2}{1/2}{-1/2}{-1/2}, &
\vec u_3=\vectorfour{1/2}{-1/2}{1/2}{-1/2}, &
\vec u_4=\vectorfour{1/2}{-1/2}{-1/2}{1/2}.
\end{array}
\] Normally, to find the coordinates of \(\vec x\), we would solve the system \[
\vectorfour{4}{5}{6}{7}=
c_1 \vectorfour{1/2}{1/2}{1/2}{1/2}+
c_2 \vectorfour{1/2}{1/2}{-1/2}{-1/2}+
c_3 \vectorfour{1/2}{-1/2}{1/2}{-1/2} +
c_4 \vectorfour{1/2}{-1/2}{-1/2}{1/2}
\] for \(c_1, c_2, c_3, c_4\). However, we can use the third part of Theorem 4.2 instead: \[
c_1=\vec u_1\cdot \vec x=\vectorfour{1/2}{1/2}{1/2}{1/2} \cdot \vectorfour{4}{5}{6}{7}=11,
\] \[
c_2=\vec u_2\cdot \vec x=\vectorfour{1/2}{1/2}{-1/2}{-1/2} \cdot \vectorfour{4}{5}{6}{7}=-2,
\] \[
c_3=\vec u_3\cdot \vec x=\vectorfour{1/2}{-1/2}{1/2}{-1/2} \cdot \vectorfour{4}{5}{6}{7}=-1,
\] \[
c_4=\vec u_4\cdot \vec x=\vectorfour{1/2}{-1/2}{-1/2}{1/2} \cdot \vectorfour{4}{5}{6}{7}=0.
\] Therefore the \(\mathcal{B}\)-coordinate vector of \(\vec x\) is \(\vectorfour{11}{-2}{-1}{0}\).
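The shortcut in Example 4.4 can be checked exactly with Python's standard `fractions.Fraction` type, since every basis entry is \(\pm 1/2\). This is a small sketch of our own: it computes \(c_i=\vec u_i\cdot\vec x\) and then rebuilds \(\vec x\) from the coordinates, with no linear system solved.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Orthonormal basis B of R^4 from Example 4.4 (every entry is exactly +-1/2).
basis = [
    [half, half, half, half],
    [half, half, -half, -half],
    [half, -half, half, -half],
    [half, -half, -half, half],
]
x = [4, 5, 6, 7]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Third part of Theorem 4.2: the B-coordinates are c_i = u_i . x.
coords = [dot(u, x) for u in basis]
print([int(c) for c in coords])  # [11, -2, -1, 0]

# Rebuild x = c_1 u_1 + ... + c_4 u_4 as a check.
rebuilt = [sum(c * u[k] for c, u in zip(coords, basis)) for k in range(4)]
print(rebuilt == x)  # True
```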

**Theorem 4.3 (Properties of the Orthogonal Complement)** Let \(V\) be a subspace of \(\mathbb{R}^n\). Then

- \(\mathrm{proj}_V(\vec x)\) is a linear transformation \(\mathbb{R}^n\to V\) with kernel \(V^\perp\),
- \(V \cap V^\perp = \{\vec 0\}\),
- \(\text{dim} V +\text{dim} V^\perp=n\), and
- \((V^\perp)^\perp = V\).

*Proof*. The proof of each part follows.

- To prove the linearity of \(T(\vec x):=\mathrm{proj}_V(\vec x)\), we use the characterization of the projection from Theorem 4.2: \(T(\vec x)\) is the unique vector in \(V\) such that \(\vec x-T(\vec x)\) is in \(V^\perp\). To show \(T(\vec x+\vec y)=T(\vec x)+T(\vec y)\), notice that \(T(\vec x)+T(\vec y)\) is in \(V\) (since \(V\) is a subspace), and \(\vec x+\vec y-(T(\vec x)+T(\vec y))=(\vec x-T(\vec x))+(\vec y-T(\vec y))\) is in \(V^\perp\) (since \(V^\perp\) is a subspace); by uniqueness, \(T(\vec x+\vec y)=T(\vec x)+T(\vec y)\). To show that \(T(k \vec x)=k T(\vec x)\), note that \(k T(\vec x)\) is in \(V\) (since \(V\) is a subspace) and \(k \vec x-k T(\vec x)=k(\vec x-T(\vec x))\) is in \(V^\perp\) (since \(V^\perp\) is a subspace), so again uniqueness gives \(T(k\vec x)=kT(\vec x)\). Finally, \(T(\vec x)=\vec 0\) if and only if \(\vec x=\vec 0+\vec x\) with \(\vec x\in V^\perp\), so \(\ker T=V^\perp\).
- Since \(\vec 0\) lies in both \(V\) and \(V^\perp\), we have \(\{\vec 0\} \subseteq V\cap V^\perp\). If a vector \(\vec x\) is in \(V\) as well as in \(V^\perp\), then \(\vec x\) is orthogonal to itself: \(\vec x \cdot \vec x =\norm{\vec x}^2=0\), so \(\vec x\) must equal \(\vec 0\), which shows \(V\cap V^\perp \subseteq \{\vec 0\}\). Therefore, \(V\cap V^\perp=\{\vec 0\}\).
- Apply the Rank-Nullity Theorem to the linear transformation \(T(\vec x)=\text{proj}_V(\vec x)\), whose image is \(V\) (since \(T(\vec v)=\vec v\) for every \(\vec v\) in \(V\)) and whose kernel is \(V^\perp\), yielding \(n=\text{dim}\, \mathbb{R}^n =\text{dim}\,(\text{image}\, T)+\text{dim}\,(\ker T)=\text{dim}\, V+\text{dim}\, V^\perp\).
- Let \(\vec v\in V\). Then \(\vec v\cdot \vec x=0\) for all \(\vec x\) in \(V^\perp\). Since \((V^\perp)^\perp\) consists of all vectors \(\vec y\) such that \(\vec y \cdot \vec x =0\) for every \(\vec x\) in \(V^\perp\), \(\vec v\) is in \((V^\perp)^\perp\). So \(V\) is a subspace of \((V^\perp)^\perp\). Applying the third part to \(V\) gives \(n=\text{dim}\, V+\text{dim}\, V^\perp\), and applying it to \(V^\perp\) gives \(n=\text{dim}\, V^\perp+\text{dim}\, (V^\perp)^\perp\); hence \(\text{dim}\, V=\text{dim}\, (V^\perp)^\perp\). Since \(V\) is a subspace of \((V^\perp)^\perp\) of the same dimension, it follows that \(V=(V^\perp)^\perp\).

**Example 4.5** Find a basis for \(W^\perp\), where \(W=\text{span} \left( \vectorfour{1}{2}{3}{4}, \vectorfour{5}{6}{7}{8} \right)\). The orthogonal complement \(W^\perp\) of \(W\) consists of the vectors \(\vec x\) in \(\mathbb{R}^4\) such that \[
\vectorfour{x_1}{x_2}{x_3}{x_4} \cdot \vectorfour{1}{2}{3}{4}
=0 \hspace{1cm}\text{and}
\vectorfour{x_1}{x_2}{x_3}{x_4} \cdot \vectorfour{5}{6}{7}{8}=0.
\] Finding these vectors amounts to solving the system \[
\begin{cases}
x_1+2x_2+3x_3+4x_4&=0
\\ 5x_1+6x_2+7x_3+8x_4 & =0
\end{cases}
\] Taking \(x_3=s\) and \(x_4=t\) as free variables, the solutions are of the form \[
\vectorfour{x_1}{x_2}{x_3}{x_4}=\vectorfour{s+2t}{-2s-3t}{s}{t}=s\vectorfour{1}{-2}{1}{0}+t\vectorfour{2}{-3}{0}{1}.
\] The two vectors on the right form a basis of \(W^{\perp}\).
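Finding \(W^\perp\) amounts to computing the null space of the matrix whose rows span \(W\). The sketch below does this by exact row reduction to reduced row echelon form with `fractions.Fraction`; the function name `null_space` is our own, and the free-variable convention (set one free variable to 1, the rest to 0) is the same one used in Example 4.5.

```python
from fractions import Fraction


def null_space(rows):
    """Basis of {x : r . x = 0 for every row r}, computed by exact row reduction (RREF)."""
    A = [[Fraction(v) for v in r] for r in rows]
    m, n = len(A), len(A[0])
    pivots = []  # pivot column of each nonzero row of the RREF
    r = 0
    for c in range(n):
        if r == m:
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c, so x_c is a free variable
        A[r], A[p] = A[p], A[r]
        pivot = A[r][c]
        A[r] = [v / pivot for v in A[r]]  # scale the pivot entry to 1
        for i in range(m):  # clear column c in every other row
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free_col in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free_col] = Fraction(1)  # one free variable set to 1, the others to 0
        for i, pc in enumerate(pivots):
            x[pc] = -A[i][free_col]  # solve row i of the RREF for its pivot variable
        basis.append(x)
    return basis


rows = [[1, 2, 3, 4], [5, 6, 7, 8]]  # spanning vectors of W from Example 4.5
perp_basis = null_space(rows)
print([[int(v) for v in b] for b in perp_basis])  # [[1, -2, 1, 0], [2, -3, 0, 1]]
```

The two vectors returned match the basis found by hand, and \(\text{dim}\, W+\text{dim}\, W^\perp=2+2=4\), as the third part of Theorem 4.3 predicts.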

**Theorem 4.4** Let \(\vec x, \vec y \in \mathbb{R}^n\). Then

- (**Pythagorean Theorem**) \[ \norm{\vec x + \vec y}^2 = \norm{\vec x}^2 + \norm{\vec y}^2 \] holds if and only if \(\vec x \perp \vec y\),
- (**Cauchy-Schwarz Inequality**) \[ | \vec x \cdot \vec y | \leq \norm{\vec x} \, \norm{\vec y} \] where equality holds if and only if \(\vec x\) and \(\vec y\) are parallel,
- (**Law of Cosines**) if \(\vec x\) and \(\vec y\) are nonzero, the angle \(\theta\) between them, defined as \[ \theta = \arccos \frac{\vec x \cdot \vec y}{\norm{\vec x} \, \norm{\vec y}}, \] is well defined, and
- (**Triangle Inequality**) \[ \norm{\vec x+\vec y}\leq \norm{\vec x}+\norm{\vec y}. \]

*Proof*. The proof of each part follows.

- The verification is straightforward: \[\begin{align*} \norm{\vec x+\vec y }^2 & =(\vec x + \vec y)\cdot (\vec x + \vec y) =\vec x \cdot \vec x+2(\vec x \cdot \vec y) +\vec y \cdot \vec y \\ & =\norm{\vec x}^2+2(\vec x\cdot \vec y)+\norm{\vec y}^2 =\norm{\vec x}^2+\norm{\vec y}^2 \end{align*}\] where the last equality holds if and only if \(\vec x\cdot \vec y=0\).
- If \(\vec y=\vec 0\), both sides are zero, so assume \(\vec y\neq\vec 0\). Let \(V\) be the one-dimensional subspace of \(\mathbb{R}^n\) spanned by \(\vec y\), and let \(\vec u=\frac{1}{\norm{\vec y}} \vec y\). Then \[ \norm{\vec x} \geq \norm{\text{proj}_V(\vec x)} =\norm{(\vec x\cdot \vec u)\vec u} =|\vec x\cdot \vec u| =\left| \vec x \cdot \left( \frac{1}{\norm{\vec y}}\vec y \right)\right| = \frac{1}{\norm{\vec y}}|\vec x \cdot \vec y|, \] and multiplying by \(\norm{\vec y}\) yields \(| \vec x \cdot \vec y | \leq \norm{\vec x} \, \norm{\vec y}\). The inequality \(\norm{\vec x} \geq \norm{\text{proj}_V(\vec x)}\) holds by applying the Pythagorean theorem to \(\vec x=\vec x^\parallel +\vec x^\perp\): since \(\vec x^\perp \cdot \vec x^\parallel =0\), we get \(\norm{\vec x}^2=\norm{\text{proj}_V(\vec x)}^2+\norm{\vec x^\perp}^2\), which leads to \(\norm{\text{proj}_V (\vec x)} \leq \norm{\vec x}\).
- We have to make sure that the angle \(\theta\) is defined, that is, that the argument of \(\arccos\) lies between \(-1\) and \(1\), or equivalently, \[ \left | \frac{\vec x \cdot \vec y }{\norm{\vec x} \, \norm{\vec y}} \right | \leq 1. \] But this follows from the Cauchy-Schwarz inequality.
- Using the Cauchy-Schwarz inequality, the verification is straightforward, \[ \norm{\vec x+\vec y}^2=(\vec x + \vec y)\cdot (\vec x + \vec y)=\norm{\vec x}^2+\norm{\vec y}^2+2(\vec x\cdot \vec y) \] \[ \leq \norm{\vec x}^2+\norm{\vec y}^2+2\norm{\vec x}\norm{\vec y}=(\norm{\vec x}+\norm{\vec y})^2 \] Taking the square root of both sides yields \(\norm{\vec x+\vec y}\leq \norm{\vec x}+\norm{\vec y}\).

**Example 4.6** Use the Pythagorean Theorem to determine whether the angle between the vectors \(\vec u=\vectorthree{2}{3}{4}\) and \(\vec v=\vectorthree{2}{-8}{5}\) is a right angle. We have \(\norm{\vec u}=\sqrt{2^2+3^2+4^2}=\sqrt{29}\) and \(\norm{\vec v}=\sqrt{2^2+(-8)^2+5^2}=\sqrt{93}\), so \[
\norm{\vec u+\vec v}^2=\left|\left| \, \, \vectorthree{4}{-5}{9} \, \,
\right| \right|^2 =122 = 29+93= || \vec u|| ^2 + || \vec v || ^2
\] shows \(\vec u \perp \vec v\).
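The same check can be done with integer arithmetic. This small sketch compares \(\norm{\vec u+\vec v}^2\) with \(\norm{\vec u}^2+\norm{\vec v}^2\) and, for comparison, computes \(\vec u\cdot\vec v\) directly; by the Pythagorean Theorem the two tests must agree.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


u = [2, 3, 4]
v = [2, -8, 5]
s = [p + q for p, q in zip(u, v)]  # u + v = [4, -5, 9]
lhs = dot(s, s)                    # ||u + v||^2
rhs = dot(u, u) + dot(v, v)        # ||u||^2 + ||v||^2
print(lhs, rhs, dot(u, v))         # 122 122 0
```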

**Example 4.7** Consider the vectors \(\vec u =\vectorfour{1}{1}{\vdots}{1}\) and \(\vec v=\vectorfour{1}{0}{\vdots}{0}\) in \(\mathbb{R}^n\). For \(n=2,3,4\), find the angle \(\theta_n\) between \(\vec u\) and \(\vec v\). Then find the limit of \(\theta_n\) as \(n\) approaches infinity. Since \(\vec u\cdot\vec v=1\), \(\norm{\vec u}=\sqrt{n}\), and \(\norm{\vec v}=1\), for every \(n\geq 2\) we have \[
\theta_n=\arccos \frac{\vec u \cdot \vec v}{\norm{\vec u}\norm{\vec v}}=\arccos \frac{1}{\sqrt{n}}.
\] Then \[
\theta_2=\arccos \frac{1}{\sqrt{2}}=\frac{\pi}{4}, \qquad
\theta_3=\arccos \frac{1}{\sqrt{3}}\approx 0.955 \text{ rad}, \qquad
\theta_4=\arccos \frac{1}{\sqrt{4}}=\frac{\pi}{3}.
\] Since \(y=\arccos(x)\) is a continuous function, \[
\lim_{n\to \infty} \theta_n
= \lim_{n\to \infty} \arccos\left( \frac{1}{\sqrt{n}} \right)
= \arccos \left( \lim_{n\to \infty} \frac{1}{\sqrt{n}} \right)
= \arccos(0)
=\frac{\pi}{2}.
\]
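To see the limit in Example 4.7 numerically, here is a minimal sketch that evaluates the angle formula from Theorem 4.4 for growing \(n\). The function name `angle` is our own; clamping the ratio to \([-1,1]\) is a common floating-point safeguard, not part of the mathematics.

```python
import math


def angle(x, y):
    """Angle between nonzero vectors x and y: arccos(x . y / (||x|| ||y||))."""
    d = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    # Clamp to [-1, 1]: rounding can push the ratio slightly outside acos's domain.
    return math.acos(max(-1.0, min(1.0, d / (nx * ny))))


for n in (2, 3, 4, 100, 10_000):
    u = [1.0] * n                # (1, 1, ..., 1)
    v = [1.0] + [0.0] * (n - 1)  # (1, 0, ..., 0)
    print(n, round(angle(u, v), 4))
print("pi/2 =", round(math.pi / 2, 4))
```

The printed angles start at \(\pi/4\approx 0.7854\) for \(n=2\) and creep toward \(\pi/2\approx 1.5708\) as \(n\) grows.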

## 4.2 Gram-Schmidt Process and QR Factorization

The Gram-Schmidt process represents a change of basis from a basis \(\mathcal{B}=(\vec v_1, \vec v_2, \ldots,\vec v_m)\) of a subspace \(V\) of \(\mathbb{R}^n\) to an orthonormal basis \(\mathcal{U}=(\vec u_1, \vec u_2, \ldots,\vec u_m)\) of \(V\); it is most conveniently described in terms of the change of basis matrix \(R\) from \(\mathcal{B}\) to \(\mathcal{U}\) via \[\begin{equation} M:=\begin{bmatrix} \vec v_1 & \vec v_2 & \cdots & \vec v_m \end{bmatrix} =\begin{bmatrix} \vec u_1 &\vec u_2 & \cdots & \vec u_m \end{bmatrix}R=:QR \end{equation}\]

The \(QR\) factorization is an effective way to organize and record the work performed in the Gram-Schmidt process; it is also useful for many computational and theoretical purposes.

**Theorem 4.5 (Gram-Schmidt Process)** Let \(\vec v_1, \vec v_2, \ldots, \vec v_m\) be a basis of a subspace \(V\) of \(\mathbb{R}^n\). Then \[\begin{equation}
\begin{array}{cccc}
\vec u_1 = \frac{1}{||\vec v_1 || } \vec v_1, &
\vec u_2 = \frac{1}{||\vec v_2^\perp || } \vec v_2^\perp, & \ldots, &
\vec u_m = \frac{1}{||\vec v_m^\perp || } \vec v_m^\perp
\end{array}
\end{equation}\] is an orthonormal basis of \(V\) where \[\begin{equation}
\vec v_j ^\perp
= \vec v_j - \vec v_j^\parallel
= \vec v_j - (\vec u_1 \cdot \vec v_j) \vec u_1 - (\vec u_2 \cdot \vec v_j) \vec u_2 - \cdots - (\vec u_{j-1} \cdot \vec v_j) \vec u_{j-1}.
\end{equation}\]

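*Proof*. For each \(j\), we resolve the vector \(\vec v_j\) into its components parallel and perpendicular to the span of the preceding vectors \(\vec v_1, \vec v_2, \ldots, \vec v_{j-1}\): \[\vec v_j = \vec v_j^\parallel + \vec v_j^\perp \hspace{1cm} \text{with respect to } \text{span}(\vec v_1, \vec v_2, \ldots, \vec v_{j-1}).\] By induction, \(\vec u_1,\ldots,\vec u_{j-1}\) is an orthonormal basis of \(\text{span}(\vec v_1,\ldots,\vec v_{j-1})\), so the second part of Theorem 4.2 gives the stated formula for \(\vec v_j^\perp\). Since \(\vec v_j\) is not in that span, \(\vec v_j^\perp\neq\vec 0\), and normalizing it gives a unit vector \(\vec u_j\) that is orthogonal to \(\vec u_1,\ldots,\vec u_{j-1}\).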

**Example 4.8** Perform the Gram-Schmidt process on the vectors \[
\vec v_1=\vectorthree{4}{0}{3}, \qquad
\vec v_2=\vectorthree{25}{0}{-25}, \qquad
\vec v_3=\vectorthree{0}{-2}{0}.
\] By Theorem 4.5, we determine \(\vec u_1, \vec u_2, \vec u_3\) as follows: \[
\vec u_1= \vectorthree{4/5}{0}{3/5},
\hspace{1cm}
\vec u_2=\frac{\vec v_2^\perp}{\norm{\vec v_2^\perp}}
=\frac{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right) \vec u_1}{\norm{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right)\vec u_1}}
=\vectorthree{3/5}{0}{-4/5},
\] and \[
\vec u_3=\frac{\vec v_3^\perp}{\norm{\vec v_3^\perp}}
=\frac{\vec v_3-\left(\vec u_1 \cdot \vec v_3\right) \vec u_1-\left(\vec u_2 \cdot \vec v_3\right) \vec u_2}{\norm{\vec v_3-\left(\vec u_1 \cdot \vec v_3\right) \vec u_1-\left(\vec u_2 \cdot \vec v_3\right) \vec u_2}}
=\vectorthree{0}{-1}{0}.
\] Therefore \[
\left( \vectorthree{4/5}{0}{3/5}, \vectorthree{3/5}{0}{-4/5}, \vectorthree{0}{-1}{0}\right)
\] is an orthonormal basis for \(\text{span} (\vec v_1, \vec v_2, \vec v_3)\).
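Theorem 4.5 translates almost line by line into code. The following is a minimal sketch of the classical Gram-Schmidt process in plain Python (the name `gram_schmidt` is our own), applied to the vectors of Example 4.8. It assumes the input vectors are linearly independent and raises an error if some \(\vec v_j^\perp\) turns out to be exactly zero.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt (Theorem 4.5): return orthonormal u_1, ..., u_m.

    Assumes the input vectors are linearly independent.
    """
    us = []
    for v in vectors:
        # v_perp = v - (u_1 . v) u_1 - ... - (u_{j-1} . v) u_{j-1}
        perp = list(v)
        for u in us:
            c = dot(u, v)
            perp = [p - c * q for p, q in zip(perp, u)]
        norm = math.sqrt(dot(perp, perp))
        if norm == 0:
            raise ValueError("vectors are linearly dependent")
        us.append([p / norm for p in perp])  # u_j = v_perp / ||v_perp||
    return us


vs = [[4, 0, 3], [25, 0, -25], [0, -2, 0]]  # v_1, v_2, v_3 from Example 4.8
for u in gram_schmidt(vs):
    print([round(c, 10) for c in u])
```

Up to rounding, this should reproduce \((4/5, 0, 3/5)\), \((3/5, 0, -4/5)\), and \((0, -1, 0)\). As noted in the overview, the classical variant can lose orthogonality on ill-conditioned inputs in floating point; this sketch makes no attempt to address that.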

**Theorem (QR Factorization)** Let \(M\) be an \(n \times m\) matrix with linearly independent columns \(\vec v_1 , \vec v_2, \ldots,\vec v_m\). Then there exist an \(n\times m\) matrix \(Q\) whose columns \(\vec u_1, \vec u_2, \ldots, \vec u_m\) are orthonormal and an \(m\times m\) upper triangular matrix \(R\) with positive diagonal entries such that \(M=Q R\), and this representation is unique. Furthermore, the entries \(r_{ij}\) of \(R\) are given by

- \(r_{11}=\norm{\vec v_1}\),
- \(r_{jj}=\norm{\vec v_j^\perp}\) (for \(j=2,\ldots,m\)), and
- \(r_{ij}=\vec u_i \cdot \vec v_j\) (for \(i<j\)).

*Proof*. The proof is left for the reader.

**Example 4.9** Find the \(QR\) factorization of the matrix \[
M=\begin{bmatrix} 4 & 25 & 0 \\ 0 & 0 & -2 \\ 3 & -25 & 0\end{bmatrix}
\] Let \[
\begin{array}{ccc}
\vec v_1=\vectorthree{4}{0}{3}, &
\vec v_2=\vectorthree{25}{0}{-25}, &
\vec v_3=\vectorthree{0}{-2}{0}
\end{array}
\] Then, as computed in Example 4.8, an orthonormal basis for the column space of \(M\) is

\[\left(\vec u_1, \vec u_2, \vec u_3\right)=\left( \vectorthree{4/5}{0}{3/5}, \vectorthree{3/5}{0}{-4/5}, \vectorthree{0}{-1}{0}\right).\] Determining the entries of \(R\) (also as determined above): \[
\begin{array}{ccc}
r_{11}=\norm{\vec v_1}=5, &
r_{22}=\norm{\vec v_2^\perp}=35, &
r_{33}=\norm{\vec v_3^\perp}=2
\end{array}
\] \[
\begin{array}{ccc}
r_{12}=\vec u_1 \cdot \vec v_2=5, &
r_{13}=\vec u_1 \cdot \vec v_3=0, &
r_{23}=\vec u_2 \cdot \vec v_3=0
\end{array}
\] and therefore, the \(QR\)-factorization of \(M\) is \[
M= \begin{bmatrix} 4 & 25 & 0 \\ 0 & 0 & -2 \\ 3 & -25 & 0\end{bmatrix}
= \begin{bmatrix}4/5 & 3/5 & 0 \\ 0 & 0 & -1 \\ 3/5 & -4/5 & 0 \end{bmatrix}
\begin{bmatrix}5 & 5 & 0 \\ 0 & 35 & 0 \\ 0 & 0 & 2\end{bmatrix}
=QR.
\]
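The QR theorem's formulas for \(r_{ij}\) fall out of the same loop as Gram-Schmidt: each coefficient \(\vec u_i\cdot\vec v_j\) that is subtracted is stored as \(r_{ij}\), and each norm \(\norm{\vec v_j^\perp}\) becomes \(r_{jj}\). Below is a minimal standard-library sketch of our own (the names `qr_gram_schmidt` and `columns_of_product` are hypothetical helpers) applied to the matrix of Example 4.9.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def qr_gram_schmidt(cols):
    """QR factorization of M via classical Gram-Schmidt.

    `cols` lists the columns v_1, ..., v_m of M, assumed linearly independent.
    Returns (Q, R): Q as a list of orthonormal columns u_1, ..., u_m and R as an
    m x m upper triangular matrix (list of rows) with positive diagonal.
    """
    m = len(cols)
    R = [[0.0] * m for _ in range(m)]
    Q = []
    for j, v in enumerate(cols):
        perp = [float(c) for c in v]
        for i, u in enumerate(Q):
            R[i][j] = dot(u, v)  # r_ij = u_i . v_j  (i < j)
            perp = [p - R[i][j] * q for p, q in zip(perp, u)]
        R[j][j] = math.sqrt(dot(perp, perp))   # r_jj = ||v_j_perp||
        Q.append([p / R[j][j] for p in perp])  # u_j = v_j_perp / r_jj
    return Q, R


def columns_of_product(Q, R):
    """Columns of the product QR, with Q given as a list of columns."""
    n, m = len(Q[0]), len(R)
    return [[sum(Q[i][k] * R[i][j] for i in range(m)) for k in range(n)]
            for j in range(m)]


cols = [[4, 0, 3], [25, 0, -25], [0, -2, 0]]  # columns of M in Example 4.9
Q, R = qr_gram_schmidt(cols)
for row in R:
    print([round(r, 9) for r in row])
```

Up to rounding, the printed rows should be those of \(R\) above. Library routines such as NumPy's `numpy.linalg.qr` use Householder reflections instead and do not enforce a positive diagonal, so their \(Q\) and \(R\) may differ from the unique factorization in the theorem by the signs of matching columns and rows.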

**Example 4.10**

- Perform the Gram-Schmidt process on the vectors \[ \vec v_1=\vectorfour{1}{7}{1}{7}, \qquad \vec v_2= \vectorfour{0}{7}{2}{7}, \qquad \vec v_3 = \vectorfour{1}{8}{1}{6}. \]
- Find the \(QR\) factorization of the matrix \(M=\begin{bmatrix} \vec v_1 & \vec v_2 & \vec v_3\end{bmatrix}.\)

By Theorem 4.5, we determine \(\vec u_1, \vec u_2, \vec u_3\) as follows. Here \(\norm{\vec v_1}=10\) and \(\vec u_1\cdot\vec v_2=10\), so \(\vec v_2^\perp=\vec v_2-10\vec u_1=\vectorfour{-1}{0}{1}{0}\); then \(\vec u_1\cdot\vec v_3=10\) and \(\vec u_2\cdot\vec v_3=0\), so \(\vec v_3^\perp=\vec v_3-10\vec u_1=\vectorfour{0}{1}{0}{-1}\). Hence \[ \vec u_1= \vectorfour{1/10}{7/10}{1/10}{7/10}, \qquad \vec u_2=\frac{\vec v_2^\perp}{\norm{\vec v_2^\perp}} =\frac{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right) \vec u_1}{\norm{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right)\vec u_1}} =\vectorfour{-1/\sqrt{2}}{0}{1/\sqrt{2}}{0}, \] and \[ \vec u_3=\frac{\vec v_3^\perp}{\norm{\vec v_3^\perp}} =\frac{\vec v_3-\left(\vec u_1 \cdot \vec v_3\right) \vec u_1-\left(\vec u_2 \cdot \vec v_3\right) \vec u_2}{\norm{\vec v_3-\left(\vec u_1 \cdot \vec v_3\right) \vec u_1-\left(\vec u_2 \cdot \vec v_3\right) \vec u_2}} =\vectorfour{0}{1/\sqrt{2}}{0}{-1/\sqrt{2}}. \] Therefore an orthonormal basis for \(\text{span} (\vec v_1, \vec v_2, \vec v_3)\) is \[ \left( \vectorfour{1/10}{7/10}{1/10}{7/10}, \vectorfour{-1/\sqrt{2}}{0}{1/\sqrt{2}}{0}, \vectorfour{0}{1/\sqrt{2}}{0}{-1/\sqrt{2}}\right). \] Reading off the entries of \(R\) from these computations: \[ \begin{array}{ccc} r_{11}=\norm{\vec v_1}=10, & r_{22}=\norm{\vec v_2^\perp}=\sqrt{2}, & r_{33}=\norm{\vec v_3^\perp}=\sqrt{2} \end{array} \] \[ \begin{array}{ccc} r_{12}=\vec u_1 \cdot \vec v_2=10, & r_{13}=\vec u_1 \cdot \vec v_3=10, & r_{23}=\vec u_2 \cdot \vec v_3=0 \end{array} \] and therefore, the \(QR\)-factorization of \(M\) is \[ M=\begin{bmatrix} 1 & 0 & 1 \\ 7 & 7 & 8 \\ 1 & 2 & 1 \\ 7 & 7 & 6 \end{bmatrix} = \begin{bmatrix}1/10 & -1/\sqrt{2} & 0 \\ 7/10 & 0 & 1/\sqrt{2} \\ 1/10 & 1/\sqrt{2} & 0 \\ 7/10 & 0 & -1/\sqrt{2}\end{bmatrix} \begin{bmatrix}10 & 10 & 10 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2}\end{bmatrix} =QR. \]
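A hand-computed factorization is easy to double-check. This sketch takes the \(Q\) and \(R\) found in Example 4.10 and verifies, in floating point, the two defining properties: \(Q^TQ=I_3\) (orthonormal columns) and \(QR=M\). The helpers `matmul`, `transpose`, and `close` are our own.

```python
import math

s = 1 / math.sqrt(2)
# Hand-computed factors from Example 4.10 (Q is 4x3, R is 3x3), stored row by row.
Q = [[0.1, -s, 0.0],
     [0.7, 0.0, s],
     [0.1, s, 0.0],
     [0.7, 0.0, -s]]
R = [[10.0, 10.0, 10.0],
     [0.0, math.sqrt(2), 0.0],
     [0.0, 0.0, math.sqrt(2)]]
M = [[1, 0, 1], [7, 7, 8], [1, 2, 1], [7, 7, 6]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to a floating-point tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


QtQ = matmul(transpose(Q), Q)  # should be the 3x3 identity
QR = matmul(Q, R)              # should reproduce M
print(close(QtQ, [[1, 0, 0], [0, 1, 0], [0, 0, 1]]), close(QR, M))  # True True
```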

## 4.3 Orthogonal Transformations and Orthogonal Matrices

A linear transformation \(T\) from \(\mathbb{R}^n\) to \(\mathbb{R}^n\) is called an **orthogonal transformation** if it preserves the length of vectors: \(\norm{T(\vec x)}=\norm{\vec x}\) for all \(\vec x\in \mathbb{R}^n\). If \(T(\vec x)=A\vec x\) is an orthogonal transformation, we say that \(A\) is an **orthogonal matrix**.

**Lemma 4.1** Let \(T\) be an orthogonal transformation from \(\mathbb{R}^n\) to \(\mathbb{R}^n\). If \(\vec v, \vec w \in \mathbb{R}^n\) are orthogonal, then \(T(\vec v), T(\vec w) \in \mathbb{R}^n\) are orthogonal.

*Proof*. We want to show that \(T(\vec v)\) and \(T(\vec w)\) are orthogonal; by the Pythagorean theorem, it suffices to show \[
\norm{T(\vec v)+T(\vec w)}^2=\norm{T(\vec v)}^2+\norm{T(\vec w)}^2.
\] This equality follows from \[\begin{align*}
\norm{T(\vec v)+T(\vec w)}^2
& =\norm{T(\vec v+\vec w)}^2
=\norm{\vec v+\vec w}^2 \\
& =\norm{\vec v}^2+\norm{\vec w}^2
=\norm{T(\vec v)}^2+\norm{T(\vec w)}^2
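\end{align*}\] where the equalities use, in order, the linearity of \(T\), the orthogonality of \(T\), the Pythagorean theorem for the orthogonal vectors \(\vec v\) and \(\vec w\), and the orthogonality of \(T\) again.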

**Theorem 4.6** A linear transformation \(T\) from \(\mathbb{R}^n\) to \(\mathbb{R}^n\) is orthogonal if and only if the vectors \(T(\vec e_1), \ldots, T(\vec e_n)\) form an orthonormal basis of \(\mathbb{R}^n\).

*Proof*. If \(T\) is an orthogonal transformation, then by definition the \(T(\vec e_i)\) are unit vectors, and since the \(\vec e_i\) are orthogonal, Lemma 4.1 shows that the \(T(\vec e_i)\) are orthogonal. Therefore, by Theorem 4.1, \(T(\vec e_1), \ldots, T(\vec e_n)\) form an orthonormal basis.

Conversely, suppose \(T(\vec e_1), \ldots, T(\vec e_n)\) form an orthonormal basis. Consider a vector \(\vec x=x_1 \vec e_1+\cdots +x_n \vec e_n\). Then, applying the Pythagorean theorem repeatedly to the mutually orthogonal vectors \(x_iT(\vec e_i)\), \[\begin{align*} \norm{T(\vec x)}^2 &=\norm{T(x_1 \vec e_1+\cdots + x_n \vec e_n)}^2 =\norm{x_1 T(\vec e_1)+\cdots + x_n T(\vec e_n)}^2 \\ &=\norm{x_1T(\vec e_1)}^2+\cdots + \norm{x_nT(\vec e_n)}^2 = x_1^2+\cdots + x_n^2 =\norm{\vec x}^2. \end{align*}\] Taking the square root of both sides shows that \(T\) preserves lengths, and therefore \(T\) is an orthogonal transformation.
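Theorem 4.6 gives a finite test for orthogonality: check that the columns \(A\vec e_1,\ldots,A\vec e_n\) are orthonormal. The sketch below, in plain Python, applies that test to a rotation matrix (its columns are the orthonormal basis from Example 4.2, with the arbitrary choice \(\theta=0.7\)) and to a shear matrix chosen by us as a non-example, and compares each with how it changes the length of \(\vec x=(3,4)\).

```python
import math


def apply(A, x):
    """Matrix-vector product A x, with A stored row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(c * c for c in x))


def columns_orthonormal(A, tol=1e-12):
    """Theorem 4.6 test: are the columns A e_1, ..., A e_n orthonormal?"""
    cols = list(zip(*A))
    for i, ci in enumerate(cols):
        for j, cj in enumerate(cols):
            target = 1.0 if i == j else 0.0
            if abs(sum(p * q for p, q in zip(ci, cj)) - target) > tol:
                return False
    return True


theta = 0.7  # arbitrary angle for illustration
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
shear = [[1.0, 1.0],
         [0.0, 1.0]]

x = [3.0, 4.0]  # ||x|| = 5
for name, A in (("rotation", rotation), ("shear", shear)):
    print(name, columns_orthonormal(A), round(norm(apply(A, x)), 6))
```

The rotation passes the column test and keeps \(\norm{\vec x}=5\); the shear fails the test and stretches \(\vec x\) to length \(\sqrt{65}\approx 8.062258\), consistent with the theorem.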

**Corollary 4.1** An \(n \times n\) matrix \(A\) is orthogonal if and only if its columns form an orthonormal basis.

*Proof*. The proof is left for the reader.

The **transpose** \(A^T\) of an \(n\times m\) matrix \(A\) is the \(m\times n\) matrix whose \(ij\)-th entry is the \(ji\)-th entry of \(A\). We say that a square matrix \(A\) is **symmetric** if \(A^T=A\), and \(A\) is called **skew-symmetric** if \(A^T=-A\).

**Theorem 4.7 (Orthogonal and Transpose Properties)**

- The product of two orthogonal \(n\times n\) matrices is orthogonal.
- The inverse of an orthogonal matrix is orthogonal.
- If the product \(AB\) is defined, then \((AB)^T=B^TA^T\).
- If \(A\) is invertible then so is \(A^T\), and \((A^T)^{-1}=(A^{-1})^T\).
- For any matrix \(A\), \(\text{rank}\,(A) = \text{rank} \,(A^T)\).
- If \(\vec v\) and \(\vec w\) are two column vectors in \(\mathbb{R}^n\), then \(\vec v \cdot \vec w = \vec v^T \vec w\).
- The \(n \times n\) matrix \(A\) is orthogonal if and only if \(A^{-1}=A^T\).

*Proof*. The proof of each part follows.

- Suppose \(A\) and \(B\) are orthogonal matrices, then \(AB\) is an orthogonal matrix since \(T(\vec x)=AB \vec x\) preserves length because \(\norm{T(\vec x)}=\norm{AB \vec x}=\norm{A(B \vec x)}=\norm{B \vec x}=\norm{\vec x}.\)
- Suppose \(A\) is an orthogonal matrix. Then \(A\) is invertible, because \(A\vec x=\vec 0\) implies \(\norm{\vec x}=\norm{A\vec x}=0\). Moreover, \(A^{-1}\) is an orthogonal matrix, since \(T(\vec x)=A^{-1} \vec x\) preserves length: \(\norm{A^{-1}\vec x}=\norm{A(A^{-1}\vec x)}=\norm{\vec x}\).
- We compare entries of the matrices \((AB)^T\) and \(B^T A^T\): \[ \begin{array}{rl} i j \text{-th entry of }(AB)^T &= ji \text{-th entry of }AB\\ & = (j \text{-th row of } A) \cdot (i \text{-th column of } B)\\ \\ i j \text{-th entry of }B^TA^T &=(i \text{-th row of } B^T) \cdot (j \text{-th column of } A^T)\\ & = (i \text{-th column of } B) \cdot (j \text{-th row of } A)\\ & = (j \text{-th row of } A) \cdot (i \text{-th column of } B). \end{array} \] Therefore, the \(ij\)-th entry of \((AB)^T\) is the same as the \(ij\)-th entry of \(B^T A^T\).
- Suppose \(A\) is invertible, so \(A A^{-1}=I_n\). Taking the transpose of both sides and using the third part gives \((A A^{-1})^T=(A^{-1})^T A^T=I_n^T=I_n\). Thus \(A^T\) is invertible, and since inverses are unique, \((A^T)^{-1}=(A^{-1})^T\).
- Exercise.
- If \(\vec v=\vectorthree{a_1}{\vdots}{a_n}\) and \(\vec w=\vectorthree{b_1}{\vdots}{b_n}\), then \[ \vec v \cdot \vec w=\vectorthree{a_1}{\vdots}{a_n}\cdot \vectorthree{b_1}{\vdots}{b_n}=a_1b_1+\cdots +a_n b_n =\begin{bmatrix}a_1 & \cdots & a_n\end{bmatrix} \vectorthree{b_1}{\vdots}{b_n} =\vectorthree{a_1}{\vdots}{a_n}^T \vec w=\vec v^T \vec w. \]
- Write \(A\) in terms of its columns: \(A=\begin{bmatrix} \vec v_1 & \cdots & \vec v_n \end{bmatrix}\). Then, by the previous part, \[\begin{equation} \label{ata} A^T A= \begin{bmatrix} \vec v_1^T \\ \vdots \\ \vec v_n^T \end{bmatrix} \begin{bmatrix} \vec v_1 & \cdots & \vec v_n \end{bmatrix} =\begin{bmatrix}\vec v_1 \cdot \vec v_1 & \cdots & \vec v_1 \cdot \vec v_n \\ \vdots & \ddots & \vdots \\ \vec v_n \cdot \vec v_1 & \cdots & \vec v_n \cdot \vec v_n\end{bmatrix}. \end{equation}\] By Corollary 4.1, \(A\) is orthogonal if and only if its columns are orthonormal, which by the computation above happens if and only if \(A^TA=I_n\). Since \(A\) is square, \(A^TA=I_n\) is equivalent to \(A^{-1}=A^T\). Therefore, \(A\) is orthogonal if and only if \(A^{-1}=A^T\).
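Several parts of Theorem 4.7 can be checked exactly with rational arithmetic. For this sketch we build a \(3\times 3\) matrix \(A\) whose columns are \(\frac17(2,3,6)\) and \(\frac17(3,-6,2)\) from Example 4.3 together with \(\frac17(6,2,-3)\), a third unit vector orthogonal to both (our own choice, used only for illustration). With `fractions.Fraction` the identities \(A^TA=I\), \(AA^T=I\), and the orthogonality of a product hold exactly rather than up to rounding.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


# Columns: (2,3,6)/7 and (3,-6,2)/7 from Example 4.3, plus (6,2,-3)/7, orthogonal to both.
A = [[Fraction(v, 7) for v in row] for row in [[2, 6, 3], [3, 2, -6], [6, -3, 2]]]

I3 = identity(3)
print(matmul(transpose(A), A) == I3)  # A^T A = I, so A is orthogonal
print(matmul(A, transpose(A)) == I3)  # A A^T = I as well, so A^{-1} = A^T
B = matmul(A, A)
print(matmul(transpose(B), B) == I3)  # a product of orthogonal matrices is orthogonal
```

All three lines should print `True`.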

**Theorem 4.8 (Orthogonal Projection Matrix)**

- Let \(V\) be a subspace of \(\mathbb{R}^n\) with orthonormal basis \(\vec u_1, \ldots, \vec u_m\). The matrix of the orthogonal projection onto \(V\) is \(Q Q^T\), where \(Q= \begin{bmatrix} \vec u_1 & \cdots & \vec u_m \end{bmatrix}.\)
- Let \(V\) be a subspace of \(\mathbb{R}^n\) with basis \(\vec v_1,\ldots,\vec v_m\) and let \(A=\begin{bmatrix}\vec v_1 & \cdots & \vec v_m \end{bmatrix}\). Then the matrix of the orthogonal projection onto \(V\) is \(A(A^T A)^{-1}A^T\).

*Proof*. The proof of each part follows.

Since \(\vec u_1,\ldots,\vec u_m\) is an orthonormal basis of \(V\) we can, by \(\ref{Orthogonal Projection}\), write \[\begin{align*} \text{proj}_V (\vec x) & =(\vec u_1 \cdot \vec x) \vec u_1 + \cdots + (\vec u_m \cdot \vec x) \vec u_m =\vec u_1 \vec u_1^T \vec x + \cdots +\vec u_m \vec u_m^T \vec x & \\ &=(\vec u_1 \vec u_1^T + \cdots +\vec u_m \vec u_m^T) \vec x = \begin{bmatrix} \vec u_1 & \cdots & \vec u_m \end{bmatrix} \begin{bmatrix} \vec u_1^T \\ \vdots \\ \vec u_m^T \end{bmatrix} \vec x =QQ^T\vec x. \end{align*}\]

Since \(\vec v_1,\ldots,\vec v_m\) form a basis of \(V\), there exist unique scalars \(c_1,\ldots,c_m\) such that \(\text{proj}_V(\vec x)=c_1 \vec v_1+\cdots +c_m \vec v_m\). Since \(A=\begin{bmatrix}\vec v_1 & \cdots & \vec v_m \end{bmatrix}\) we can write \(\text{proj}_V(\vec x)=A \vec c\). Consider the system \(A^TA\vec c =A^T \vec x\), where \(A^TA\) is the coefficient matrix and \(\vec c\) is the unknown. The coordinate vector \(\vec c\) of \(\text{proj}_V(\vec x)\) with respect to the basis \((\vec v_1,\ldots,\vec v_m)\) solves this system. Indeed, the system is equivalent to \(A^T(\vec x-A \vec c)=\vec 0\), and \[ A^T(\vec x -A \vec c)=A^T(\vec x-c_1 \vec v_1-\cdots - c_m \vec v_m) \] is the vector whose \(i\)th component is \[ (\vec v_i)^T(\vec x-c_1 \vec v_1-\cdots -c_m \vec v_m)=\vec v_i\cdot(\vec x-c_1\vec v_1-\cdots -c_m \vec v_m), \] which is zero since \(\vec x-\text{proj}_V(\vec x)\) is orthogonal to \(V\). Moreover, \(A^TA\) is invertible: if \(A^TA\vec y=\vec 0\), then \(\norm{A\vec y}^2=\vec y^TA^TA\vec y=0\), so \(A\vec y=\vec 0\), and hence \(\vec y=\vec 0\) because the columns of \(A\) are linearly independent. Thus \(\vec c=(A^T A)^{-1}A^T\vec x\), and therefore \(\text{proj}_V(\vec x)=A \vec c =A (A^T A)^{-1}A^T\vec x\), as desired.

**Example 4.11 **Is there an orthogonal transformation \(T\) from \(\mathbb{R}^3\) to \(\mathbb{R}^3\) such that \[
T\vectorthree{2}{3}{0}=\vectorthree{3}{0}{2} \hspace{1cm} \text{and} \hspace{1cm} T\vectorthree{-3}{2}{0}=\vectorthree{2}{-3}{0}?
\] No. The vectors \(\vectorthree{2}{3}{0}\) and \(\vectorthree{-3}{2}{0}\) are orthogonal, whereas \(\vectorthree{3}{0}{2}\) and \(\vectorthree{2}{-3}{0}\) are not (their dot product is \(6\)). Since an orthogonal transformation preserves orthogonality, by \(\ref{orthogonal transformation}\), no such \(T\) exists.

**Example 4.12 **Find an orthogonal transformation \(T\) from \(\mathbb{R}^3\) to \(\mathbb{R}^3\) such that \[
T\vectorthree{2/3}{2/3}{1/3}=\vectorthree{0}{0}{1}.
\] Let’s think about the inverse of \(T\) first. The inverse of \(T\), if it exists, must satisfy \(T^{-1}(\vec e_3)=\vectorthree{2/3}{2/3}{1/3}=\vec v_3\). Furthermore, the vectors \(\vec v_1, \vec v_2, \vec v_3\) must form an orthonormal basis of \(\mathbb{R}^3\) where \(T^{-1}\vec x=\begin{bmatrix}\vec v_1 & \vec v_2 & \vec v_3\end{bmatrix} \vec x\). We require a vector \(\vec v_1\) with \(\vec v_1\cdot \vec v_3=0\) and \(\norm{\vec v_1}=1\). By inspection, we find \(\vec v_1=\vectorthree{-2/3}{1/3}{2/3}\). Then \[
\vec v_2=\vec v_1\times \vec v_3=\vectorthree{-2/3}{1/3}{2/3} \times \vectorthree{2/3}{2/3}{1/3}=\vectorthree{1/9-4/9}{-(-2/9-4/9)}{-4/9-2/9}=\vectorthree{-1/3}{2/3}{-2/3}
\] does the job since \(\norm{\vec v_1}=\norm{\vec v_2}=\norm{\vec v_3}=1\) and \(\vec v_1\cdot \vec v_2=\vec v_1\cdot \vec v_3=\vec v_2\cdot \vec v_3=0\). In summary \[
T^{-1}\vec x=\begin{bmatrix}-2/3 & -1/3 & 2/3 \\ 1/3 & 2/3 & 2/3 \\ 2/3 & -2/3 & 1/3\end{bmatrix}\vec x.
\] By \(\ref{Orthogonal and Transpose Properties}\) the matrix of \(T^{-1}\) is orthogonal, so the matrix of \(T=(T^{-1})^{-1}\) is the transpose of the matrix of \(T^{-1}\). Therefore, it suffices to use \[
T\vec x=\begin{bmatrix}-2/3 & -1/3 & 2/3 \\ 1/3 & 2/3 & 2/3 \\ 2/3 & -2/3 & 1/3\end{bmatrix}^T\vec x=\begin{bmatrix}-2/3 & 1/3 & 2/3 \\ -1/3 & 2/3 & -2/3 \\ 2/3 & 2/3 & 1/3 \end{bmatrix} \vec x.
\]
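As a quick sanity check of this example, the following Python sketch (our own illustration; the helper names `dot`, `cross`, and `apply` are hypothetical) uses exact rational arithmetic from the standard `fractions` module to confirm that \(\vec v_2=\vec v_1\times\vec v_3\), that the columns \(\vec v_1,\vec v_2,\vec v_3\) are orthonormal, and that the matrix of \(T\) sends \(\vec v_3\) to \(\vec e_3\).

```python
from fractions import Fraction as F

# Columns of the matrix of T^{-1} found above
v1 = [F(-2, 3), F(1, 3), F(2, 3)]
v2 = [F(-1, 3), F(2, 3), F(-2, 3)]
v3 = [F(2, 3), F(2, 3), F(1, 3)]

def dot(x, y):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def cross(a, b):
    """Cross product in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def apply(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [dot(row, x) for row in M]

# The matrix of T is the transpose of [v1 v2 v3], so its rows are v1, v2, v3.
T = [v1, v2, v3]

# Gram matrix of the columns: it is the identity exactly when they are orthonormal.
gram = [[dot(a, b) for b in (v1, v2, v3)] for a in (v1, v2, v3)]

print(cross(v1, v3) == v2)                          # True
print(gram == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])    # True
print(apply(T, v3) == [0, 0, 1])                    # True
```

Exact fractions are used so that the equality tests are not affected by floating-point rounding.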

**Example 4.13 **Show that a matrix with orthogonal columns need not be an orthogonal matrix. For example, the columns of \(A=\begin{bmatrix}4 & -3 \\ 3 & 4 \end{bmatrix}\) are orthogonal, but \(A\) is not an orthogonal matrix because \(T\vec x=A\vec x\) does not preserve length: for \(\vec x=\vectortwo{-3}{4}\) we have \(\norm{\vec x}=5\), whereas \(T\vec x=\vectortwo{-24}{7}\) has \(\norm{T\vec x}=25\).

**Example 4.14 **Find all orthogonal \(2\times 2\) matrices. Write \(A=\begin{bmatrix}\vec v_1 & \vec v_2\end{bmatrix}\). The unit vector \(\vec v_1\) can be expressed as \(\vec v_1=\vectortwo{\cos \theta}{\sin \theta}\), for some \(\theta\). Then \(\vec v_2\) will be one of the two unit vectors orthogonal to \(\vec v_1\), namely \(\vec v_2=\vectortwo{-\sin \theta}{\cos \theta}\) or \(\vec v_2=\vectortwo{\sin \theta}{-\cos \theta}\). Therefore, an orthogonal \(2\times 2\) matrix is either of the form \[
A=\begin{bmatrix}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}\hspace{1cm} \text{or} \hspace{1cm} A=\begin{bmatrix}\cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{bmatrix}
\] representing a rotation or a reflection, respectively.
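The two families can be checked numerically. The sketch below, in plain Python with the standard `math` module (the function names are our own), verifies that both matrices have orthonormal columns for a few angles. As an extra check that this section does not use, it also computes determinants: \(1\) for the rotation form and \(-1\) for the reflection form.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def is_orthogonal(A, tol=1e-12):
    """True if the columns of the 2x2 matrix A are orthonormal, i.e. A^T A = I."""
    cols = [[A[0][j], A[1][j]] for j in range(2)]
    return all(
        abs(sum(x * y for x, y in zip(cols[i], cols[j])) - (1.0 if i == j else 0.0)) < tol
        for i in range(2) for j in range(2)
    )

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

for theta in (0.0, 0.3, 1.0, 2.5):
    R, S = rotation(theta), reflection(theta)
    assert is_orthogonal(R) and is_orthogonal(S)
    print(round(det(R), 12), round(det(S), 12))   # 1.0 -1.0 for every theta
```

The tolerance `tol` is an assumption needed because `math.cos` and `math.sin` are computed in floating point.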

**Example 4.15 **Given \(n\times n\) matrices \(A\) and \(B\) which of the following must be symmetric?

- \(B B^T\)
- \(A^T B^TB A\)
- \(B(A+A^T)B^T\)

The solution to each part follows.

- By \(\ref{Orthogonal and Transpose Properties}\), \(B B^T\) is symmetric because \[ (B B^T)^T=(B^T)^TB^T=B B^T. \]
- By \(\ref{Orthogonal and Transpose Properties}\), \(A^T B^TB A\) is symmetric because \[ (A^TB^TBA)^T=A^TB^T(B^T)^T(A^T)^T=A^TB^TBA. \]
- By \(\ref{Orthogonal and Transpose Properties}\), \(B(A+A^T)B^T\) is symmetric because \[ (B(A+A^T)B^T)^T=(B^T)^T(A+A^T)^TB^T=B(A^T+(A^T)^T)B^T=B(A^T+A)B^T=B(A+A^T)B^T. \]

**Example 4.16 **If the \(n\times n\) matrices \(A\) and \(B\) are symmetric which of the following must be symmetric as well?

- \(2I_n+3A-4 A^2\),
- \(A B^2 A\).

The solution to each part follows.

- First note that \((A^2)^T=(A^T)^2=A^2\) for a symmetric matrix \(A\). Now we can use the linearity of the transpose, \[ (2I_n+3A-4 A^2)^T=2I_n^T+3A^T-4 (A^2)^T=2I_n+3A-4 A^2 \] showing that the matrix \(2I_n+3A-4 A^2\) is symmetric.
- The matrix \(A B^2 A\) is symmetric since, \[ (AB^2A)^T=(ABBA)^T=(BA)^T(AB)^T=A^TB^TB^TA^T=AB^2A. \]

**Example 4.17 **Use \(\ref{Orthogonal Projection Matrix}\) to find the matrix \(A\) of the orthogonal projection onto \[
W=\text{span} \left(\vectorfour{1}{1}{1}{1},\vectorfour{1}{9}{-5}{3}\right).
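\] Then find the matrix of the orthogonal projection onto the subspace of \(\mathbb{R}^4\) spanned by the vectors \(\vectorfour{1}{1}{1}{1}\) and \(\vectorfour{1}{2}{3}{4}\). First we apply \(\ref{Gram-Schmidt Process}\), the Gram-Schmidt process, to \(W=\text{span}(\vec v_1, \vec v_2)\), where \(\vec v_1=\vectorfour{1}{1}{1}{1}\) and \(\vec v_2=\vectorfour{1}{9}{-5}{3}\). Since \(\vec u_1\cdot \vec v_2=4\) and \(\vec v_2-4\vec u_1=\vectorfour{-1}{7}{-7}{1}\) has norm \(10\), we find that the vectors \[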
\vec u_1=\frac{\vec v_1}{\norm{\vec v_1}}
=\vectorfour{1/2}{1/2}{1/2}{1/2}, \hspace{1cm}\vec u_2
=\frac{\vec v_2^\perp}{\norm{\vec v_2^\perp}}
=\frac{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right) \vec u_1}{\norm{\vec v_2-\left(\vec u_1 \cdot \vec v_2\right)\vec u_1}}
=\vectorfour{-1/10}{7/10}{-7/10}{1/10}
\] form an orthonormal basis of \(W\). By \(\ref{Orthogonal Projection Matrix}\), the matrix of the projection onto \(W\) is \(A=Q Q^T\) where \(Q=\begin{bmatrix}\vec u_1 & \vec u_2\end{bmatrix}\). Therefore the orthogonal projection onto \(W\) is \[
A=
\begin{bmatrix} 1/2 & -1/10 \\ 1/2 & 7/10 \\ 1/2 & -7/10 \\ 1/2 & 1/10 \end{bmatrix}
\begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ -1/10 & 7/10 & -7/10 & 1/10 \end{bmatrix}
=\frac{1}{100}
\begin{bmatrix}
26 & 18 & 32 & 24 \\
18 & 74 & -24 & 32 \\
32 & -24 & 74 & 18 \\
24 & 32 & 18 & 26
\end{bmatrix}.
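\] For the second subspace, let \(B=\begin{bmatrix}1 & 1 \\ 1 & 2 \\1 & 3 \\1 & 4 \end{bmatrix}\). Then \(B^TB=\begin{bmatrix}4 & 10 \\ 10 & 30\end{bmatrix}\) and \((B^TB)^{-1}=\frac{1}{20}\begin{bmatrix}30 & -10 \\ -10 & 4\end{bmatrix}\), so by the second part of \(\ref{Orthogonal Projection Matrix}\) the orthogonal projection matrix is \[
B(B^TB)^{-1}B^T
=\frac{1}{10}\begin{bmatrix}7 & 4 & 1 & -2 \\4 & 3 & 2 & 1 \\ 1 & 2 & 3 & 4 \\ -2 & 1 & 4 & 7 \end{bmatrix}.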
\]
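Both parts of Theorem 4.8 can be checked against this example with a short Python sketch. We assume exact rational arithmetic via the standard `fractions` module; the helpers `matmul`, `transpose`, `inv2`, and `proj_from_basis` are hypothetical names for this illustration, and `inv2` only handles the \(2\times 2\) case needed here. The sketch recomputes \(QQ^T\) for \(W\), checks that the basis formula \(A(A^TA)^{-1}A^T\) gives the same matrix for \(W\), and recomputes the projection onto the span of \((1,1,1,1)\) and \((1,2,3,4)\).

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def proj_from_basis(A):
    """Second part of Theorem 4.8: A (A^T A)^{-1} A^T, for A with two independent columns."""
    At = transpose(A)
    return matmul(matmul(A, inv2(matmul(At, A))), At)

# Orthonormal basis of W from the Gram-Schmidt step, as the columns of Q
Q = [[F(1, 2), F(-1, 10)], [F(1, 2), F(7, 10)], [F(1, 2), F(-7, 10)], [F(1, 2), F(1, 10)]]
P_W = matmul(Q, transpose(Q))
print([[int(100 * x) for x in row] for row in P_W])
# [[26, 18, 32, 24], [18, 74, -24, 32], [32, -24, 74, 18], [24, 32, 18, 26]]

# The original (non-orthonormal) spanning vectors of W give the same matrix.
A_W = [[F(1), F(1)], [F(1), F(9)], [F(1), F(-5)], [F(1), F(3)]]
assert proj_from_basis(A_W) == P_W

A_2 = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)], [F(1), F(4)]]
print([[int(10 * x) for x in row] for row in proj_from_basis(A_2)])
# [[7, 4, 1, -2], [4, 3, 2, 1], [1, 2, 3, 4], [-2, 1, 4, 7]]
```

The scaled entries are printed as integers only because every entry here is an exact multiple of \(1/100\) (respectively \(1/10\)).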

## 4.4 Inner Products

Recall that the norm of \(x\in \mathbb{R}^n\), defined by \(\norm{x}=\sqrt{x_1^2+\cdots+x_n^2}\), is not linear. To bring linearity into the discussion we introduce the dot product: for \(x,y\in \mathbb{R}^n\) the **dot product** of \(x\) and \(y\) is defined as \(x \cdot y=x_1 y_1+\cdots +x_n y_n\). Clearly \(x \cdot x=\norm{x}^2\). Because the dot product is so useful, we generalize it to an inner product on a vector space \(V\).

An **inner product** on \(V\) is a function that takes each ordered pair \((u,v)\) of elements of \(V\) to a number \(\ip{u,v} \in\mathbb{F}\) and has the following properties:

- **positivity**: \(\ip{ v,v }\geq 0\) for all \(v\in V\);
- **definiteness**: \(\ip{ v,v }=0\) if and only if \(v=0\);
- **additivity in the first slot**: \(\ip{ u+v,w } = \ip{u,w} +\ip{ v,w }\) for all \(u,v,w\in V\);
- **homogeneity in the first slot**: \(\ip{av,w} =a\ip{v,w}\) for all \(a\in \mathbb{F}\) and all \(v,w\in V\);
- **conjugate symmetry**: \(\ip{v,w} = \overline{\ip{w,v} }\) for all \(v,w\in V\).

Recall that for \(z\in \mathbb{C}^n\), we define the norm of \(z\) by \[ \norm{z}=\sqrt{|z_1|^2+\cdots + |z_n|^2}, \] where the absolute values are needed because we want \(\norm{z}\) to be a non-negative number. Then \[\begin{equation}\label{cp} \norm{z}^2=z_1 \overline{z_1}+\cdots + z_n \overline{z_n} \end{equation}\] because every \(\lambda\in\mathbb{C}\) satisfies \(|\lambda|^2 =\lambda \overline{\lambda}\). Since \(\norm{z}^2\) should be the inner product of \(z\) with itself, as in \(\mathbb{R}^n\), Equation \(\eqref{cp}\) suggests that the inner product of \(w\in \mathbb{C}^n\) with \(z\) should equal \[ w_1 \overline{z_1}+\cdots+w_n \overline{z_n}. \] Interchanging \(w\) and \(z\) in this expression produces its complex conjugate, so we should expect the inner product of \(w\) with \(z\) to equal the complex conjugate of the inner product of \(z\) with \(w\); this motivates the definition of conjugate symmetry.

An **inner-product space** is a vector space \(V\) along with an inner product on \(V\). The inner product defined on \(\mathbb{F}^n\) by \[
\ip{ (w_1,\ldots,w_n),(z_1,\ldots,z_n) } = w_1 \overline{z_1}+\cdots + w_n \overline{z_n}
\] is called the **Euclidean inner product**.

Continue to let \(V\) denote a complex or real vector space. In this section we develop the basic theorems for norms. For \(v\in V\), the **norm** of \(v\) is defined by \(||v||=\sqrt{\ip{v,v} }\). Two vectors \(u,v\in V\) are **orthogonal** if \(\ip{u,v}= 0\).
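To make the Euclidean inner product concrete, here is a minimal Python sketch on \(\mathbb{C}^n\) using Python's built-in complex numbers; the function names `inner` and `norm` are our own. It computes \(\ip{w,z}=w_1\overline{z_1}+\cdots+w_n\overline{z_n}\), checks conjugate symmetry on one example, and computes the norm \(\norm{v}=\sqrt{\ip{v,v}}\).

```python
import math

def inner(w, z):
    """Euclidean inner product on F^n: w_1*conj(z_1) + ... + w_n*conj(z_n)."""
    return sum(wk * zk.conjugate() for wk, zk in zip(w, z))

def norm(v):
    """||v|| = sqrt(<v, v>); <v, v> is real and >= 0, so we take its real part."""
    return math.sqrt(inner(v, v).real)

w = [1 + 2j, 3j]
z = [2 - 1j, 1 + 1j]
print(inner(w, z))                               # (3+8j)
print(inner(z, w) == inner(w, z).conjugate())    # True: conjugate symmetry
print(norm(w))                                   # sqrt(|1+2i|^2 + |3i|^2), i.e. sqrt(14)
```

For real vectors the `conjugate()` calls do nothing, and `inner` reduces to the dot product.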

**Theorem 4.9 **[Pythagorean Theorem] If \(u\) and \(v\) are orthogonal vectors in \(V\), then \[
\norm{u + v}^2 = \norm{u}^2 + \norm{v}^2.
\]

*Proof*. Suppose that \(u,v\) are orthogonal vectors in \(V\). Then \[
\norm{u+v}^2=\ip{u+v,u+v}=\norm{u}^2+\norm{v}^2+\ip{u,v}+\ip{v,u}=\norm{u}^2+\norm{v}^2,
\] where the last equality holds because \(\ip{u,v}=0\) and \(\ip{v,u}=\overline{\ip{u,v}}=0\).

::: {#thm- } [Orthogonal Decomposition] If \(u,v\in V\) and \(v\) is nonzero, then \(u\) can be written as a scalar multiple of \(v\) plus a vector orthogonal to \(v\). :::

*Proof*. Let \(a\in \mathbb{F}\). Then \[
u=a v+(u-av).
\] Thus we need to choose \(a\) so that \(v\) is orthogonal to \((u-a v)\). In other words, we want \[
0=\ip{u-av,v}=\ip{u,v}-a\ip{v,v}=\ip{u,v}-a\norm{v}^2.
\] The equation above shows that we should choose \(a\) to be \(\ip{u,v}/\norm{v}^2\) (assume that \(v\ne 0\) to avoid division by 0). Making this choice of \(a\), we can write \[
u=\frac{\ip{u,v}}{\norm{v}^2} v+\left(u-\frac{\ip{u,v}}{\norm{v}^2}v\right).
\] Thus, if \(v\neq 0\) then the equation above writes \(u\) as a scalar multiple of \(v\) plus a vector orthogonal to \(v\).
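The choice \(a=\ip{u,v}/\norm{v}^2\) translates directly into code. The following Python sketch (our own illustration, using the Euclidean inner product on \(\mathbb{C}^n\) with built-in complex numbers; `decompose` is a hypothetical helper name) returns \(a\) and \(w=u-av\) and checks numerically that \(w\) is orthogonal to \(v\).

```python
def inner(u, v):
    """Euclidean inner product: linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def decompose(u, v):
    """Return (a, w) with u = a*v + w and <w, v> = 0; requires v != 0."""
    vv = inner(v, v)                 # ||v||^2
    if vv == 0:
        raise ValueError("v must be nonzero")
    a = inner(u, v) / vv             # a = <u, v> / ||v||^2
    w = [x - a * y for x, y in zip(u, v)]
    return a, w

u = [2 + 1j, 1 - 1j, 3]
v = [1j, 1, 1 + 1j]
a, w = decompose(u, v)
print(abs(inner(w, v)) < 1e-12)      # True: w is orthogonal to v
```

The comparison with a small tolerance, rather than exact equality, accounts for floating-point rounding in the complex divisions.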

::: {#thm- } [Cauchy-Schwarz] If \(u, v \in V\), then \[ | \ip{u,v} | \leq \norm{u} \, \norm{v} \] where equality holds if and only if one of \(u, v\) is a scalar multiple of the other. :::

*Proof*. Let \(u,v\in V\). If \(v=0\), then both sides equal \(0\) and the desired inequality holds. Thus we can assume that \(v\neq 0\). Consider the orthogonal decomposition \[
u=\frac{\ip{u,v}}{\norm{v}^2}v+w
\] where \(w\) is orthogonal to \(v\). By the Pythagorean theorem, \[\begin{align*}
\norm{u}^2
&=\norm{\frac{\ip{u,v}}{\norm{v}^2} v}^2+\norm{w}^2\\
&=\frac{|\ip{u,v}|^2}{\norm{v}^2}+\norm{w}^2\\
&\geq \frac{|\ip{u,v}|^2}{\norm{v}^2}
\end{align*}\] Multiplying both sides by \(\norm{v}^2\) and then taking square roots gives the Cauchy-Schwarz inequality. Notice that there is equality if and only if \(w=0\), that is, if and only if \(u\) is a multiple of \(v\).

::: {#thm- } [Triangle Inequality] If \(u, v \in V\), then \[ \norm{u+v}\leq \norm{u}+\norm{v} \] where equality holds if and only if one of \(u, v\) is a nonnegative multiple of the other. :::

*Proof*. Let \(u,v\in V\). Then \[\begin{align}
\norm{u+v}^2
&=\ip{u+v,u+v} \notag\\
&=\ip{u,u}+\ip{v,v}+\ip{u,v}+\ip{v,u} \notag \\
&=\ip{u,u}+\ip{v,v}+\ip{u,v}+\overline{\ip{u,v}} \notag \\
&=\norm{u}^2+\norm{v}^2+2\, \text{Re}\ip{u,v} \notag \\
&\leq \norm{u}^2+\norm{v}^2+2 | \ip{u,v} | \label{ti1} \\
&\leq \norm{u}^2+\norm{v}^2+2 \norm{u}\norm{v} \label{ti2}\\
&= \left(\norm{u}+\norm{v}\right)^2 \notag
\end{align}\] and so taking square roots of both sides yields the triangle inequality. This proof shows that the triangle inequality is an equality if and only if we have equality in \(\eqref{ti1}\) and \(\eqref{ti2}\). Thus we have equality in the triangle inequality if and only if \[\begin{equation}\label{ti3}
\ip{u,v}=\norm{u}\norm{v}.
\end{equation}\] If one of \(u,v\) is a nonnegative multiple of the other, then \(\eqref{ti3}\) holds. Conversely, suppose \(\eqref{ti3}\) holds. Then the condition for equality in the Cauchy-Schwarz inequality implies that one of \(u,v\) must be a scalar multiple of the other. Clearly, then \(\eqref{ti3}\) forces the scalar in question to be nonnegative, as desired.

::: {#thm- } [Parallelogram Equality] If \(u, v \in V\), then \[ \norm{u+v}^2+\norm{u-v}^2= 2 \left( \norm{u}^2+\norm{v}^2 \right). \] :::

*Proof*. If \(u, v \in V\), then

\[\begin{align*}
\norm{u+v}^2+\norm{u-v}^2
&= \ip{u+v,u+v}+\ip{u-v,u-v} \\
& = \norm{u}^2+\norm{v}^2+\ip{u,v}+\ip{v,u}+\norm{u}^2+\norm{v}^2-\ip{u,v}-\ip{v,u} \\
& =2 \left( \norm{u}^2+\norm{v}^2 \right )
\end{align*}\] as desired.

A list of vectors is called **orthonormal** if the vectors in it are pairwise orthogonal and each vector has norm 1.

**Theorem 4.10 **If \((e_1,\ldots,e_m)\) is an orthonormal list of vectors in \(V\), then \[
\norm{a_1 e_1+\cdots +a_m e_m}^2=|a_1|^2+\cdots + |a_m|^2
\] for all \(a_1,\ldots,a_m\in\mathbb{F}\).

*Proof*. Because each \(e_j\) has norm 1, this follows easily from repeated application of the Pythagorean theorem.

::: {#thm- } Every orthonormal list of vectors is linearly independent. :::

*Proof*. Suppose \((e_1,\ldots,e_n)\) is an orthonormal list of vectors in \(V\) and \(a_1,\ldots,a_n\in \mathbb{F}\) are such that \(a_1 e_1+\cdots + a_n e_n=0\). Then, by Theorem 4.10, \(|a_1|^2+\cdots + |a_n|^2=\norm{0}^2=0\), which means that all the \(a_j\)’s are 0, as desired.

An **orthonormal basis** of \(V\) is an orthonormal list of vectors in \(V\) that is also a basis of \(V\).

The importance of orthonormal bases stems mainly from the following proposition.

::: {#thm- } Suppose \((e_1,\ldots,e_n)\) is an orthonormal basis of \(V\). Then \[\begin{equation} \label{inim1} v=\ip{ v,e_1 } e_1+\cdots + \ip{ v,e_n } e_n \end{equation}\] and \[\begin{equation}\label{inim2} \norm{v}^2=|\ip{ v,e_1 } |^2+\cdots + |\ip{ v,e_n } |^2 \end{equation}\] for every \(v\in V\). :::

*Proof*. Let \(v\in V\). Because \((e_1,\ldots,e_n)\) is a basis of \(V\), there exist scalars \(a_1,\ldots,a_n\) such that \(v=a_1 e_1+\cdots + a_n e_n\). Take the inner product of both sides of this equation with \(e_j\), getting \(\ip{v,e_j}=a_j\). Thus \(\eqref{inim1}\) holds. Clearly \(\eqref{inim2}\) holds by \(\eqref{inim1}\) and \(\eqref{innorm}\).

::: {#thm- } [Gram-Schmidt] If \((v_1,\ldots,v_m)\) is a linearly independent list of vectors in \(V\), then there exists an orthonormal list \((e_1,\ldots,e_m)\) of vectors in \(V\) such that \[\begin{equation} \label{gmeq} \text{span}(v_1,\ldots,v_j)=\text{span}(e_1,\ldots,e_j) \end{equation}\] for \(j=1,\ldots,m\) :::

*Proof*. Suppose \((v_1,\ldots,v_m)\) is a linearly independent list of vectors in \(V\). To construct the \(e\)’s, start by setting \(e_1=\frac{v_1}{\norm{v_1}}\). This satisfies \(\eqref{gmeq}\) for \(j=1\). We will choose \(e_2,\ldots,e_m\) inductively, as follows. Suppose \(j>1\) and an orthonormal list \((e_1,\ldots,e_{j-1})\) has been chosen so that \[\begin{equation}\label{gspan}
\text{span}(v_1,\ldots,v_{j-1})=\text{span}(e_1,\ldots,e_{j-1}).
\end{equation}\] Let \[\begin{equation}\label{gsproj}
e_j=\frac{v_j-\ip{v_j,e_1}e_1-\cdots - \ip{v_j,e_{j-1}} e_{j-1} }{\norm{v_j-\ip{v_j,e_1}e_1-\cdots - \ip{v_j,e_{j-1}} e_{j-1}}}.
\end{equation}\] Note that \(v_j \not\in \text{span}(v_1,\ldots,v_{j-1})\) (because \((v_1,\ldots,v_m)\) is linearly independent) and thus \(v_j\not \in \text{span}(e_1,\ldots,e_{j-1})\). Hence we are not dividing by 0 in the equation above, and so \(e_j\) is well-defined. Dividing a vector by its norm produces a new vector with norm 1; thus \(\norm{e_j}=1\).

Let \(1\leq k <j\). Then \[\begin{align*} \ip{e_j,e_k} &=\ip{\frac{v_j-\ip{v_j,e_1}e_1-\cdots - \ip{v_j,e_{j-1}} e_{j-1} }{\norm{v_j-\ip{v_j,e_1}e_1-\cdots - \ip{v_j,e_{j-1}} e_{j-1}}},e_k} \\ &= \frac{\ip{v_j,e_k}-\ip{v_j,e_k}\ip{e_k,e_k}}{\norm{v_j-\ip{v_j,e_1}e_1-\cdots - \ip{v_j,e_{j-1}} e_{j-1}}} \\ &=0. \end{align*}\] Thus \((e_1,\ldots,e_j)\) is an orthonormal list.

From \(\eqref{gsproj}\), we see that \(v_j\in \text{span}(e_1,\ldots,e_j)\). Combining this information with \(\eqref{gspan}\) shows that \[ \text{span}(v_1,\ldots,v_j)\subseteq \text{span}(e_1,\ldots,e_{j}). \] Both lists above are linearly independent (the \(v\)’s by hypothesis, the \(e\)’s by orthonormality and \(\eqref{oind}\)). Thus both subspaces above have dimension \(j\), and hence must be equal, completing the proof.
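The inductive construction in this proof is an algorithm. Here is a minimal Python sketch of it for the Euclidean inner product on \(\mathbb{C}^n\) (floating-point arithmetic with built-in complex numbers; the names `inner` and `gram_schmidt` are our own). It follows formula \(\eqref{gsproj}\) literally, computing each coefficient as \(\ip{v_j,e_i}\). In floating point, a common variant (modified Gram-Schmidt) computes each coefficient from the partially reduced vector instead; it agrees in exact arithmetic but tends to lose less orthogonality.

```python
import math

def inner(u, v):
    """Euclidean inner product on F^n (linear in the first slot)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list (v_1, ..., v_m)."""
    es = []
    for v in vectors:
        # numerator of the formula: v_j - <v_j,e_1> e_1 - ... - <v_j,e_{j-1}> e_{j-1}
        r = list(v)
        for e in es:
            c = inner(v, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        n = math.sqrt(inner(r, r).real)   # its norm, the denominator
        if n == 0:
            raise ValueError("the list is not linearly independent")
        es.append([ri / n for ri in r])
    return es

print(gram_schmidt([[1, 1, 1, 1], [1, 9, -5, 3]]))
# [[0.5, 0.5, 0.5, 0.5], [-0.1, 0.7, -0.7, 0.1]]
```

The sample input is the spanning list of \(W\) from Example 4.17, and the output matches the vectors \(\vec u_1,\vec u_2\) found there.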

**Theorem 4.11 **Every finite-dimensional inner-product space has an orthonormal basis.

*Proof*. Choose a basis of \(V\). Apply the Gram-Schmidt procedure to it, producing an orthonormal list. This list is linearly independent and it spans \(V\). Thus it is an orthonormal basis.

**Theorem 4.12 **Every orthonormal list of vectors in \(V\) can be extended to an orthonormal basis of \(V\).

*Proof*. Suppose \((e_1, \ldots ,e_m)\) is an orthonormal list of vectors in \(V\). Then \((e_1, \ldots , e_m)\) is linearly independent and so can be extended to a basis \[
\mathcal{B}=(e_1, \ldots, e_m, v_1, \ldots, v_n)
\] of \(V\). Now apply the Gram-Schmidt procedure to \(\mathcal{B}\), producing an orthonormal list \((e_1,\ldots,e_m,f_1,\ldots,f_n)\); here the Gram-Schmidt procedure leaves the first \(m\) vectors unchanged because they are already orthonormal. This list is an orthonormal basis of \(V\) because it is linearly independent and its span equals \(\text{span}(\mathcal{B})=V\). Hence we have our extension of \((e_1,\ldots,e_m)\) to an orthonormal basis of \(V\).

Recall that if \(V\) is a complex vector space, then for each operator on \(V\) there is a basis with respect to which the matrix of the operator is upper-triangular. For inner-product spaces we would like to know whether such a basis can be chosen to be orthonormal.

::: {#thm- } Suppose \(T\in\mathcal{L}(V)\). If \(T\) has an upper-triangular matrix with respect to some basis of \(V\), then \(T\) has an upper-triangular matrix with respect to some orthonormal basis of \(V\). :::

*Proof*. Suppose \(T\) has an upper-triangular matrix with respect to some basis \((v_1,\ldots,v_n)\) of \(V\). Thus \(\text{span}(v_1,\ldots,v_j)\) is invariant under \(T\) for each \(j=1,\ldots,n\). Apply the Gram-Schmidt procedure to \((v_1,\ldots,v_n)\), producing an orthonormal basis \((e_1,\ldots,e_n)\) of \(V\). Because \[
\text{span}(e_1,\ldots,e_j)=\text{span}(v_1,\ldots,v_j)
\] for each \(j\), we conclude that \(\text{span}(e_1,\ldots,e_j)\) is invariant under \(T\) for each \(j=1,\ldots,n\). Thus, by \(\eqref{utm}\), \(T\) has an upper-triangular matrix with respect to the orthonormal basis \((e_1,\ldots,e_n)\).

**Theorem 4.13 **Suppose \(V\) is a complex vector space and \(T\in \mathcal{L}(V)\). Then \(T\) has an upper-triangular matrix with respect to some orthonormal basis of \(V\).

*Proof*. This follows immediately from \(\eqref{cutm}\) and \(\eqref{outm}\).

If \(U\) is a subset of an inner-product space \(V\), then the **orthogonal complement** of \(U\) is defined as \(U^\bot=\{v\in V\, : \, \ip{v,u} =0 \text{ for all } u\in U\}.\)

::: {#thm- } [Orthogonal Decomposition] If \(U\) is a subspace of an inner-product space \(V\), then \(V=U\oplus U^\perp\). :::

*Proof*. Suppose that \(U\) is a subspace of \(V\). First we will show that \[\begin{equation}\label{sumfirst}
V=U + U^\perp.
\end{equation}\] To do this, suppose \(v\in V\). Let \((e_1,\ldots,e_m)\) be an orthonormal basis of \(U\). Obviously, \[
v=\underbrace{ \ip{v,e_1}e_1+\cdots +\ip{v,e_m}e_m}_u+\underbrace{v-\ip{v,e_1}e_1-\cdots -\ip{v,e_m}e_m}_w.
\] Clearly, \(u\in U\). Because \((e_1,\ldots,e_m)\) is an orthonormal list, for each \(j\) we have \[
\ip{w,e_j}=\ip{v,e_j}-\ip{v,e_j}=0.
\] Thus \(w\) is orthogonal to every vector in \(\text{span}(e_1,\ldots,e_m)\). In other words, \(w\in U^\perp\), completing the proof of \(\eqref{sumfirst}\).

If \(v\in U\cap U^\perp\), then \(v\) (which is in \(U\)) is orthogonal to every vector in \(U\) (including \(v\) itself), which implies that \(\ip{v,v}=0\), which implies that \(v=0\). Thus \[\begin{equation}\label{orthogonalss} U\cap U^\perp=\{0\}. \end{equation}\] Now \(\eqref{sumfirst}\) and \(\eqref{orthogonalss}\) imply that \(V=U\oplus U^\perp\).

**Theorem 4.14 **If \(U\) is a subspace of an inner-product space \(V\), then \(U=(U^\perp)^\perp\).

*Proof*. Suppose that \(U\) is a subspace of \(V\). First we will show that \[\begin{equation}\label{subsetorth}
U\subseteq (U^\perp)^\perp.
\end{equation}\] To do this, suppose that \(u\in U\). Then \(\ip{u,v}=0\) for every \(v\in U^\perp\) (by definition of \(U^\perp\)). Because \(u\) is orthogonal to every vector in \(U^\perp\), we have \(u\in (U^\perp)^\perp\), completing the proof of \(\eqref{subsetorth}\).

To prove the inclusion in the other direction, suppose \(v\in (U^\perp)^\perp\). By \(\eqref{ip-orthogonal-decomposition}\), we can write \(v=u+w\), where \(u\in U\) and \(w\in U^\perp\). We have \(v-u=w\in U^\perp\). Because \(v\in (U^\perp)^\perp\) and \(u\in(U^\perp)^\perp\) (from \(\eqref{subsetorth}\)), we have \(v-u\in (U^\perp)^\perp\). Thus \(v-u\in U^\perp\cap (U^\perp)^\perp\). A vector in this intersection is orthogonal to itself, so \(v-u=0\), which implies that \(v=u\in U\). Thus \((U^\perp)^\perp \subseteq U\), which along with \(\eqref{subsetorth}\) completes the proof.

Let \(V=U \oplus U ^\bot\) and for \(v\in V\) write \(v=u+w\), where \(u\in U\) and \(w\in U ^\bot\). Then \(u\) is called the **orthogonal projection** of \(v\) onto \(U\) and is denoted by \(P_U v\).

**Theorem 4.15 **Suppose \(U\) is a subspace of an inner-product space \(V\) and \(v\in V\). Then \(\norm{v-P_U v}\leq \norm{v-u}\) for every \(u\in U\). Furthermore, if \(u\in U\) and the inequality above is an equality, then \(u=P_U v\).

*Proof*. Suppose \(u\in U\). Then \[\begin{align}
\norm{v-P_U v}^2
& \leq \norm{v-P_Uv}^2+ \norm{P_U v-u}^2 \label{mp1} \\
& = \norm{v-P_U v+P_Uv-u}^2 = \norm{v-u}^2 \label{mp2}
\end{align}\] where \(\eqref{mp2}\) comes from the Pythagorean theorem, which applies because \(v-P_U v\in U^\perp\) and \(P_U v-u\in U\). Taking the square root gives the desired inequality. The inequality is an equality if and only if \(\eqref{mp1}\) is an equality, which happens if and only if \(\norm{P_U v-u}=0\), which happens if and only if \(u=P_U v\).
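Here is a small exact check of this minimizing property, a Python sketch using the standard `fractions` module (the helper names are our own). For concreteness it borrows the data of Example 4.42 below: \(U\) has the orthogonal basis \((1,1,0,0),(0,0,1,2)\) and \(v=(1,2,3,4)\). It computes \(P_Uv\) from an orthogonal, not necessarily orthonormal, basis via \(P_Uv=\sum_j \frac{\ip{v,b_j}}{\ip{b_j,b_j}}b_j\), which avoids square roots, and compares squared distances to a few other points of \(U\).

```python
from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(v, orthogonal_basis):
    """P_U v = sum_j <v,b_j>/<b_j,b_j> b_j for a pairwise orthogonal basis b_j of U."""
    p = [F(0)] * len(v)
    for b in orthogonal_basis:
        c = F(dot(v, b), dot(b, b))
        p = [pi + c * bi for pi, bi in zip(p, b)]
    return p

def dist2(x, y):
    """Squared distance ||x - y||^2."""
    d = [a - b for a, b in zip(x, y)]
    return dot(d, d)

v = [1, 2, 3, 4]
basis = [[1, 1, 0, 0], [0, 0, 1, 2]]   # orthogonal basis of U (Example 4.42)
p = project(v, basis)
print([str(x) for x in p])             # ['3/2', '3/2', '11/5', '22/5']

# Other points s*(1,1,0,0) + t*(0,0,1,2) of U are never closer to v.
for s, t in [(1, 2), (F(3, 2), 2), (0, 0), (2, F(11, 5))]:
    u = [s * a + t * b for a, b in zip(*basis)]
    assert dist2(v, u) >= dist2(v, p)
```

Comparing squared distances is enough, since taking square roots preserves the inequality.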

**Example 4.18 **Show that if \(c_1,\ldots,c_n\) are positive numbers, then \[
\ip{ (w_1,\ldots,w_n),(z_1,\ldots,z_n) } = c_1 w_1 \overline{z_1}+\cdots + c_n w_n \overline{z_n}
\] defines an inner product on \(\mathbb{F}^n\).

**Example 4.19 **Show that if \(p,q\in \mathcal{P}_m(\mathbb{F})\), then \[
\ip{ p ,q } = \int_0^1 p(x)\overline{q(x)}dx
\] is an inner product on the vector space \(\mathcal{P}_m(\mathbb{F})\).

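**Example 4.20 **Show that every inner product is linear in the first slot and conjugate linear in the second slot, that is, \(\ip{u,av+bw}=\overline{a}\ip{u,v}+\overline{b}\ip{u,w}\) for all \(a,b\in\mathbb{F}\) and all \(u,v,w\in V\); in particular, a real inner product is linear in both slots.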

**Example 4.21 **If \(v\in V\) and \(a\in \mathbb{F}\), then \(\norm{av}=|a| \norm{v}\), and if \(v\) is nonzero then \(u=\frac{1}{\norm{v}} v\) is a unit vector. Since \(\norm{a v}^2=\ip{av,av} =a \overline{a} \ip{v,v} =|a|^2 \norm{v}^2\), taking square roots provides \(\norm{a v}=|a| \norm{v}\).

**Example 4.22 **Prove that if \(x, y\) are nonzero vectors in \(\mathbb{R}^2\), then \[
\ip{x,y} =\norm{x}\norm{y} \cos \theta,
\] where \(\theta\) is the angle between \(x\) and \(y\). The law of cosines gives \[
\norm{x-y}^2=\norm{x}^2+\norm{y}^2-2\norm{x}\norm{y} \cos \theta.
\] The left hand side of this equation is \[
\norm{x-y}^2=(x-y)\cdot (x-y)=\norm{x}^2-2 (x \cdot y) +\norm{y}^2
\] so \[
x\cdot y= \norm{x}\norm{y}\cos \theta.
\]

**Example 4.23 **Suppose \(u,v \in V\). Prove that \(\ip{ u,v } =0\) if and only if \(||u||\leq ||u+a v||\) for all \(a\in \mathbb{F}\). If \(\ip{u,v}=0\), then \(u\) is orthogonal to \(av\) for every \(a\in\mathbb{F}\), so by the Pythagorean theorem \[
\norm{u+a v}^2=\norm{u}^2+\norm{a v}^2\geq \norm{u}^2.
\] Conversely, we will prove the contrapositive, that is we will prove: if \(\ip{u,v}\neq0\) then there exists \(a \in \mathbb{F}\) such that \(\norm{u}>\norm{u+av}\). Suppose \(\ip{u,v}\neq 0\) then \(u\) and \(v\) are both nonzero vectors. By the orthogonal decomposition, we can write \[\begin{equation} \label{ex3}
u=\alpha v+w
\end{equation}\] for some \(\alpha \in \mathbb{F}\) and where \(\ip{w,v}=0\). Notice \(\alpha \neq 0\) since \(\ip{u,v}\neq 0\). Since \(v\) and \(w\) are orthogonal \[
\norm{u}^2=|\alpha|^2\norm{v}^2+\norm{w}^2
\] Let \(a=-\alpha\). Then by equation \(\eqref{ex3}\) \[
\norm{u+a v}^2=\norm{w}^2
\] and so \[
\norm{u}^2=|a|^2\norm{v}^2 +\norm{u+a v}^2 > \norm{u+a v}^2
\] which implies \[
\norm{u}>\norm{u+a v}
\] as desired.

**Example 4.24 **Prove that \[
\left (\sum_{k=1}^{n} a_k b_k \right )^2
\leq \left ( \sum_{k=1}^n k a_k^2 \right ) \left ( \sum_{k=1}^n \frac{b_k^2}{k}\right )
\] for all real numbers \(a_1,\ldots,a_n\) and \(b_1,\ldots,b_n\). This is a simple trick. \[
\left (\sum_{k=1}^{n} a_k b_k \right )^2
= \left ( \sum_{k=1}^n \sqrt{k} a_k \frac{b_k}{\sqrt{k}} \right )^2
\leq \left ( \sum_{k=1}^n k a_k^2 \right ) \left ( \sum_{k=1}^n \frac{b_k^2}{k}\right )
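\] where the inequality is the Cauchy-Schwarz inequality in \(\mathbb{R}^n\), applied to the vectors \((\sqrt{1}\,a_1,\ldots,\sqrt{n}\,a_n)\) and \((b_1/\sqrt{1},\ldots,b_n/\sqrt{n})\).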

**Example 4.25 **Suppose \(u, v\in V\) are such that \(||u||=3\), \(||u+v||=4\), and \(||u-v||=6\). What number must \(||v||\) equal? Using the parallelogram equality \[
\norm{u+v}^2+\norm{u-v}^2=2 \left(\norm{u}^2+\norm{v}^2\right)
\] we get \[
16+36=2(9+\norm{v}^2), \qquad \norm{v}=\sqrt{17}.
\]

**Example 4.26 **Prove or disprove: there is an inner product on \(\mathbb{R}^2\) such that the associated norm is given by \(||(x_1,x_2)||=|x_1|+|x_2|\) for all \((x_1,x_2)\in \mathbb{R}^2\). There is no such inner product. Take for instance, \[
u=(1/4,0), \qquad v=(0,3/4), \qquad u+v=(1/4,3/4).
\] Then we have equality in the triangle inequality: \[
1=\norm{u+v}= \norm{u}+\norm{v}=1/4+3/4.
\] If this norm came from an inner product, the equality condition in the triangle inequality would force \(u=a v\) or \(v=a u\) for some \(a\geq 0\). But clearly no such \(a\) exists, since neither of \(u,v\) is a multiple of the other.

**Example 4.27 **Prove that if \(V\) is a real inner-product space, then \[
\ip{ u,v } =\frac{||u+v||^2-||u-v||^2}{4}
\] for all \(u,v\in V\). Expressing the norms as inner products \[\begin{align*}
\frac{\norm{u+v}^2- \norm{u-v}^2}{4}
&=\frac{\ip{u+v,u+v}-\ip{u-v,u-v} }{4} \\
&=\frac{\ip{u,u}+\ip{v,v}+\ip{u,v}+\ip{v,u} - \ip{u,u}-\ip{v,v}+\ip{u,v}+\ip{v,u}}{4} \\
&=\frac{2\ip{u,v}+2\ip{v,u} }{4} \\
&=\frac{2\ip{u,v}+2\ip{u,v} }{4} \quad \text{(because $V$ is a real inner product space)} \\
&=\ip{u,v}
\end{align*}\] as desired.

**Example 4.28 **Prove that if \(V\) is a complex inner-product space, then \[
\ip{ u,v } =\frac{||u+v||^2-||u-v||^2 + ||u+i v||^2 i - ||u-i v||^2 i}{4}
\] for all \(u,v\in V\).
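This identity can be spot-checked numerically before proving it. The Python sketch below (our own, using built-in complex numbers, the Euclidean inner product on \(\mathbb{C}^3\), and randomly chosen vectors with a fixed seed) compares both sides; a numerical check like this is of course not a proof.

```python
import random

def inner(u, v):
    """Euclidean inner product on C^n, conjugate-linear in the second slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm2(v):
    """||v||^2 = <v, v>, a nonnegative real number."""
    return inner(v, v).real

def polarization(u, v):
    """(||u+v||^2 - ||u-v||^2 + ||u+iv||^2 i - ||u-iv||^2 i) / 4"""
    def shifted(s):  # the vector u + s v
        return [a + s * b for a, b in zip(u, v)]
    return (norm2(shifted(1)) - norm2(shifted(-1))
            + norm2(shifted(1j)) * 1j - norm2(shifted(-1j)) * 1j) / 4

random.seed(0)
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
print(abs(polarization(u, v) - inner(u, v)) < 1e-12)   # True
```

Dropping the two terms with \(i\) gives the real identity of Example 4.27, which holds only for real inner products.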

**Example 4.29 **A norm on a vector space \(U\) is a function \(||\text{ }|| : U\rightarrow [0,\infty)\) such that \(||u||=0\) if and only if \(u=0\), \(||\alpha u ||= |\alpha | ||u||\) for all \(\alpha \in \mathbb{F}\) and all \(u \in U\), and \(||u+v|| \leq ||u||+||v||\) for all \(u,v \in U\). Prove that a norm satisfying the parallelogram equality comes from an inner product (in other words, show that if \(|| \text{ }||\) is a norm on \(U\) satisfying the parallelogram equality, then there is an inner product \(\ip{ \text{ } , \text{ } }\) on \(U\) such that \(||u||=\ip{ u, u } ^{1/2}\) for all \(u \in U\)).

**Example 4.30 **Suppose \(n\) is a positive integer. Prove that \[
\left (
\frac{1}{\sqrt{2\pi}},\frac{\sin x}{\sqrt{\pi}},\frac{\sin 2x}{\sqrt{\pi}},\ldots,\frac{\sin n x}{\sqrt{\pi}}, \frac{\cos x}{\sqrt{\pi}},\frac{\cos 2x}{\sqrt{\pi}},\ldots,\frac{\cos n x}{\sqrt{\pi}}
\right )
\] is an orthonormal list of vectors in \(\mathcal{C}[-\pi,\pi]\), the vector space of continuous real-valued functions on \([-\pi,\pi]\) with inner product \[
\ip{ f,g } = \int_{-\pi}^{\pi} f(x)g(x) \, d x .
\] Computation of these integrals is based on the product-to-sum formulas from trigonometry: \[\begin{align*}
\sin(A)\sin(B)&=\frac{1}{2}\cos(A-B)-\frac{1}{2}\cos(A+B) \\
\cos(A)\cos(B)&=\frac{1}{2}\cos(A-B)+\frac{1}{2}\cos(A+B) \\
\sin(A)\cos(B)&=\frac{1}{2}\sin(A-B)+\frac{1}{2}\sin(A+B).
\end{align*}\] Here is a sample computation, valid for \(m,n=1,2,3,\ldots\) with \(m\neq n\); the last step uses the fact that cosine is an even function. \[\begin{align*}
\ip{\sin(mx),\cos(nx)}
&= \int_{-\pi}^{\pi}\sin(mx)\cos(nx) \, dx \\
&= \int_{-\pi}^{\pi}\left(\frac{1}{2}\sin((m-n)x)+\frac{1}{2}\sin((m+n)x)\right)\, dx \\
&= \left. \frac{-1}{2(m-n)}\cos\left((m-n)x\right)+\frac{-1}{2(m+n)}\cos\left((m+n)x\right) \right|^{\pi}_{-\pi} \\
&= \frac{-1}{2(m-n)}\left[\cos((m-n)\pi)-\cos(-(m-n)\pi)\right]+\frac{-1}{2(m+n)}\left[\cos((m+n)\pi)-\cos(-(m+n)\pi)\right] \\
&=0.
\end{align*}\]
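The remaining integrals can be spot-checked numerically. The Python sketch below approximates every inner product \(\ip{f,g}=\int_{-\pi}^{\pi}f(x)g(x)\,dx\) for the list with \(n=3\) using composite Simpson's rule (our choice of quadrature, with 2000 subintervals; `simpson` and `trig_list` are hypothetical helper names) and reports the largest deviation of the resulting matrix of inner products from the identity. This supports, but does not replace, the exact computation.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, n, 2))
    return s * h / 3

def trig_list(n):
    """1/sqrt(2 pi), sin(kx)/sqrt(pi) for k = 1..n, cos(kx)/sqrt(pi) for k = 1..n."""
    fs = [lambda x: 1 / math.sqrt(2 * math.pi)]
    # k=k binds the current value of k into each lambda
    fs += [lambda x, k=k: math.sin(k * x) / math.sqrt(math.pi) for k in range(1, n + 1)]
    fs += [lambda x, k=k: math.cos(k * x) / math.sqrt(math.pi) for k in range(1, n + 1)]
    return fs

fs = trig_list(3)
G = [[simpson(lambda x: f(x) * g(x), -math.pi, math.pi) for g in fs] for f in fs]
# largest deviation from the identity matrix; tiny if the list is orthonormal
print(max(abs(G[i][j] - (1.0 if i == j else 0.0)) for i in range(len(fs)) for j in range(len(fs))))
```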

**Example 4.31 **On \(\mathcal{P}_2(\mathbb{R})\), consider the inner product given by \[
\ip{ p,q } = \int_0^1 p(x) q(x) \, d x.
\] Apply the Gram-Schmidt procedure to the basis \((1,x,x^2)\) to produce an orthonormal basis of \(\mathcal{P}_2(\mathbb{R})\). Computing \(e_1\): A calculation gives \[
\ip{1,1}=\int_0^1 1 \, dx=1,
\] so \(e_1=1\). Computing \(e_2\): A calculation gives \[
\ip{x,1}=\int_0^1 x \, dx =\frac{1}{2}.
\] Let \[
f_2=x-1/2.
\] Then \[
\norm{f_2}^2=\ip{x-1/2,x-1/2}=\int_0^1\left(x^2-x+\frac{1}{4}\right)\, dx =\frac{1}{12},
\] so \[
e_2=\frac{f_2}{\norm{f_2}}=2\sqrt{3}\left(x-\frac{1}{2}\right).
\] Computing \(e_3\): A calculation gives \[
\ip{x^2,1}=\int_0^1 x^2 \, dx =\frac{1}{3},
\] and \[
\ip{x^2,2\sqrt{3}\left(x-\frac{1}{2}\right)}=2\sqrt{3}\int_0^1\left(x^3-\frac{x^2}{2}\right) \, dx=\frac{1}{2\sqrt{3}}.
\] Let \[
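f_3=x^2-\frac{1}{2\sqrt{3}}\cdot 2\sqrt{3}\left(x-\frac{1}{2}\right)-\frac{1}{3}\cdot 1=x^2-x+\frac{1}{6}.
\] Then \[
\norm{f_3}^2=\int_0^1\left(x^2-x+\frac{1}{6}\right)^2 \, dx=\int_0^1\left(x^4-2x^3+\frac{4}{3}x^2-\frac{1}{3}x+\frac{1}{36}\right) dx=\frac{1}{180},
\] and so \[
e_3=\frac{f_3}{\norm{f_3}}=6\sqrt{5}\left(x^2-x+\frac{1}{6}\right).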
\]
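The computation in this example can be reproduced exactly. In the Python sketch below (our own; polynomials are represented as coefficient lists \([c_0,c_1,c_2]\), and the helpers `integrate01`, `mul`, `ip`, and `orthogonalize` are hypothetical names), we run the Gram-Schmidt steps without normalizing, using rational arithmetic from the standard `fractions` module, so that square roots appear only at the end. It returns \(f_1=1\), \(f_2=x-\frac12\), \(f_3=x^2-x+\frac16\) together with \(\norm{f_1}^2=1\), \(\norm{f_2}^2=\frac1{12}\), \(\norm{f_3}^2=\frac1{180}\); then \(e_j=f_j/\norm{f_j}\).

```python
from fractions import Fraction as F

def integrate01(p):
    """Integral over [0, 1] of the polynomial with coefficients p = [c0, c1, ...]."""
    return sum(F(c) / (k + 1) for k, c in enumerate(p))

def mul(p, q):
    """Coefficient list of the product polynomial p*q."""
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def ip(p, q):
    """<p, q> = integral_0^1 p(x) q(x) dx for real polynomials."""
    return integrate01(mul(p, q))

def orthogonalize(basis):
    """Gram-Schmidt without normalization: f_j = v_j - sum_i <v_j,f_i>/<f_i,f_i> f_i."""
    fs = []
    for v in basis:
        f = [F(c) for c in v]
        for g in fs:
            c = ip(v, g) / ip(g, g)
            f = [a - c * b for a, b in zip(f, g)]
        fs.append(f)
    return fs, [ip(f, f) for f in fs]

# The basis (1, x, x^2), each written with three coefficients
fs, norms2 = orthogonalize([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print([str(c) for c in fs[2]])       # ['1/6', '-1', '1'], i.e. x^2 - x + 1/6
print([str(n) for n in norms2])      # ['1', '1/12', '1/180']
```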

**Example 4.32 **What happens if the Gram-Schmidt procedure is applied to a list of vectors that is not linearly independent? Examining the proof, notice that the numerator in the Gram-Schmidt formula is the difference between \(v_j\) and the orthogonal projection \(P_U v_j\) of \(v_j\) onto the subspace \[
U=\text{span}(v_1,\ldots,v_{j-1})=\text{span}(e_1,\ldots,e_{j-1}).
\] If \(v_j\in U\), then \(v_j-P_U v_j=0\), so the denominator \(\norm{v_j-P_U v_j}\) is 0 and the division is not defined. The algorithm can be adapted to handle this case by testing for 0 in the denominator: if 0 is found, throw \(v_j\) out of the list and continue. Applied to a list \((v_1,\ldots,v_N)\), the result will be an orthonormal basis for \(\text{span}(v_1,\ldots,v_N)\).
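The adaptation just described can be sketched in Python for real vectors with the dot product (our own illustration; `gram_schmidt_drop` and its tolerance parameter `tol` are hypothetical). In floating-point arithmetic the numerator is rarely exactly \(0\), so the sketch treats \(v_j\) as lying in \(U\) when \(\norm{v_j-P_Uv_j}\) is below a small relative tolerance; that threshold is an assumption, not part of the argument above.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_drop(vectors, tol=1e-10):
    """Gram-Schmidt that throws out each v_j lying (numerically) in span(e_1, ..., e_{j-1})."""
    es = []
    for v in vectors:
        r = [float(x) for x in v]
        for e in es:
            c = dot(v, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        n = math.sqrt(dot(r, r))            # ||v_j - P_U v_j||
        if n <= tol * max(1.0, math.sqrt(dot(v, v))):
            continue                        # denominator is (numerically) 0: discard v_j
        es.append([ri / n for ri in r])
    return es

vs = [[1, 1, 0], [2, 2, 0], [1, 0, 1], [3, 1, 2]]
es = gram_schmidt_drop(vs)
print(len(es))   # 2: (2,2,0) and (3,1,2) are discarded
```

Here \((2,2,0)=2(1,1,0)\) and \((3,1,2)=(1,1,0)+2(1,0,1)\), so the span has dimension \(2\).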

**Example 4.33 **Suppose \(V\) is a real inner-product space and \((v_1,\ldots,v_m)\) is a linearly independent list of vectors in \(V\). Prove that there exist exactly \(2^m\) orthonormal lists \((e_1,\ldots,e_m)\) of vectors in \(V\) such that \(\text{span}(v_1,\ldots,v_j)=\text{span}(e_1,\ldots,e_j)\) for all \(j\in \{1,\ldots,m\}\).

**Example 4.34** Suppose \((e_1,\ldots,e_M)\) is an orthonormal list of vectors in \(V\). Let \(v \in V\). Prove that \[
\norm{v}^2=\sum_{n=1}^M \left|\ip{v,e_n}\right|^2
\] if and only if \(v\in\text{span}(e_1,\ldots,e_M)\). Extend \((e_1,\ldots,e_M)\) to an orthonormal basis \((e_1,\ldots,e_N)\) of \(V\). Then \[
v=\sum_{n=1}^N\ip{v,e_n} e_n
\] and \[
\norm{v}^2=\sum_{n=1}^N \left|\ip{v,e_n}\right|^2.
\] If \(v\in \text{span}(e_1,\ldots,e_M)\), then \(\ip{v,e_n}=0\) for \(n>M\), and \[
\norm{v}^2=\sum_{n=1}^M \left|\ip{v,e_n}\right|^2.
\] If \(v \not\in\text{span}(e_1,\ldots,e_M)\), then \(\ip{v,e_n}\neq 0\) for some \(n>M\), since otherwise the expansion of \(v\) above would lie in \(\text{span}(e_1,\ldots,e_M)\). Because every term is nonnegative, this gives \[
\norm{v}^2=\sum_{n=1}^N\left|\ip{v,e_n}\right|^2>\sum_{n=1}^M\left|\ip{v,e_n}\right|^2.
\]
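A small numerical illustration of the statement (not a proof) in \(\mathbb{R}^3\) with the dot product. The orthonormal list \(e_1=(1,1,0)/\sqrt{2}\), \(e_2=(0,0,1)\), so that \(M=2\) and \(N=3\), and the two test vectors are arbitrary choices made for this sketch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
es = [(s, s, 0.0), (0.0, 0.0, 1.0)]  # orthonormal list, M = 2

def compare(v):
    """Return (||v||^2, sum over the list of |<v, e_n>|^2)."""
    return dot(v, v), sum(dot(v, e) ** 2 for e in es)

inside = compare((1.0, 1.0, 5.0))   # (1, 1, 5) is in span(e_1, e_2)
outside = compare((1.0, 0.0, 0.0))  # (1, 0, 0) is not
print(inside)   # both entries approximately 27
print(outside)  # 1.0 versus approximately 0.5
```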

**Example 4.35** Find an orthonormal basis of \(\mathcal{P}_2(\mathbb{R})\) such that the differentiation operator on \(\mathcal{P}_2(\mathbb{R})\) has an upper-triangular matrix with respect to this basis.

**Example 4.36** Suppose \(U\) is a subspace of \(V\). Prove that \(\text{dim} U^{\bot} = \text{dim} V - \text{dim} U\).

**Example 4.37** Suppose \(U\) is a subspace of \(V\). Prove that \(U^{\bot} = \{0\}\) if and only if \(U=V\). If \(U=V\) and \(v\in U^\perp\), then \(v\in U\cap U^\perp\), so \(\ip{v,v}=0\) and hence \(v=0\). Thus \(U^\perp=\{0\}\). Conversely, since \(V=U\oplus U^\perp\), if \(U^\perp=\{0\}\), then \(V=U\oplus\{0\}=U\).

**Example 4.38** Prove that if \(P\in \mathcal{L}(V)\) is such that \(P^2=P\) and every vector in null \(P\) is orthogonal to every vector in range \(P\), then \(P\) is an orthogonal projection.

**Example 4.39** Prove that if \(P\in \mathcal{L}(V)\) is such that \(P^2=P\) and \(|| P v ||\leq ||v||\) for every \(v\in V\), then \(P\) is an orthogonal projection.

**Example 4.40** Suppose \(T\in \mathcal{L}(V)\) and \(U\) is a subspace of \(V\). Prove that \(U\) is invariant under \(T\) if and only if \(P_U T P_U= T P_U\).

**Example 4.41** Suppose \(T\in \mathcal{L}(V)\) and \(U\) is a subspace of \(V\). Prove that \(U\) and \(U^\bot\) are both invariant under \(T\) if and only if \(P_U T= T P_U\).

**Example 4.42** In \(\mathbb{R}^4\), let \(U=\text{span} \left ( (1,1,0,0),(1,1,1,2) \right )\). Find \(u\in U\) such that \(\norm{u-(1,2,3,4)}\) is as small as possible. We want the orthogonal projection \(P_U(1,2,3,4)\). Since \((1,1,1,2)-(1,1,0,0)=(0,0,1,2)\), we have \(U=\text{span}((1,1,0,0),(0,0,1,2))\), and these two vectors are orthogonal. An orthonormal basis for \(U\) is therefore \[
\left( \frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}} ,0,0\right ),\left(0,0,\frac{1}{\sqrt{5}},\frac{2}{\sqrt{5}} \right)
\] Thus the desired vector is \[
P_U(1,2,3,4)=\left(\frac{3}{2},\frac{3}{2},0,0\right)+\left(0,0,\frac{11}{5},\frac{22}{5}\right)=\left(\frac{3}{2},\frac{3}{2},\frac{11}{5},\frac{22}{5}\right).
\]
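The projection can be checked with exact arithmetic. This sketch uses only Python's standard `fractions` module. It projects onto the orthogonal, not yet normalized, basis \((1,1,0,0),(0,0,1,2)\) via \(P_U v=\sum_j \frac{\ip{v,f_j}}{\ip{f_j,f_j}}f_j\), which avoids square roots. The fourth coordinate comes out as \(\frac{22}{5}\), because \(\ip{(1,2,3,4),e_2}e_2=\frac{11}{5}(0,0,1,2)\). The sketch also checks that the residual is orthogonal to \(U\). The helper `project` is a hypothetical name.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, orthogonal_basis):
    """P_U v = sum_j <v, f_j>/<f_j, f_j> f_j for an orthogonal
    (not necessarily normalized) basis f_j of U."""
    p = [Fraction(0)] * len(v)
    for f in orthogonal_basis:
        c = Fraction(dot(v, f), dot(f, f))
        p = [pi + c * fi for pi, fi in zip(p, f)]
    return p

v = (1, 2, 3, 4)
f1, f2 = (1, 1, 0, 0), (0, 0, 1, 2)  # orthogonal basis of U
u = project(v, [f1, f2])
residual = [a - b for a, b in zip(v, u)]
print(u)  # [Fraction(3, 2), Fraction(3, 2), Fraction(11, 5), Fraction(22, 5)]
print(dot(residual, f1), dot(residual, f2))  # 0 0
```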

**Example 4.43** Find a polynomial \(p\in \mathcal{P}_3(\mathbb{R})\) such that \(p(0)=0\), \(p'(0)=0\), and \[
\int_0^1 |2+3x-p(x) |^2 \, dx
\] is as small as possible.
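One way to set this up, offered as a sketch rather than the text's own solution. The conditions \(p(0)=0\) and \(p'(0)=0\) say exactly that \(p=ax^2+bx^3\), so the minimizer is the orthogonal projection of \(2+3x\) onto \(U=\text{span}(x^2,x^3)\) with respect to \(\ip{f,g}=\int_0^1 f g\,dx\). Its coefficients solve the normal equations \(\sum_j \ip{x^j,x^i}c_j=\ip{2+3x,x^i}\) for \(i=2,3\). The code below solves them exactly with Cramer's rule. The names `inner_monomials`, `gram`, and `rhs` are illustrative.

```python
from fractions import Fraction

def inner_monomials(i, j):
    # <x^i, x^j> = integral over [0, 1] of x^(i+j) dx = 1/(i+j+1)
    return Fraction(1, i + j + 1)

# p(0) = 0 and p'(0) = 0 force p = a x^2 + b x^3, so U = span(x^2, x^3).
powers = [2, 3]
target = {0: 2, 1: 3}  # coefficients of 2 + 3x, keyed by power of x

gram = [[inner_monomials(i, j) for j in powers] for i in powers]
rhs = [sum(c * inner_monomials(k, i) for k, c in target.items()) for i in powers]

# Solve the 2x2 normal equations gram @ [a, b] = rhs by Cramer's rule.
det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
a = (rhs[0] * gram[1][1] - gram[0][1] * rhs[1]) / det
b = (gram[0][0] * rhs[1] - rhs[0] * gram[1][0]) / det
print(a, b)  # 24 -203/10
```

Under these assumptions the minimizer would be \(p(x)=24x^2-\frac{203}{10}x^3\).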

**Example 4.44** Find a polynomial \(p\in \mathcal{P}_5(\mathbb{R})\) such that \[
\int_{-\pi}^{\pi} |\sin x - p(x) |^2 \, dx
\] is as small as possible.

**Example 4.45** Find a polynomial \(q\in \mathcal{P}_2(\mathbb{R})\) such that \[
\phi(p)=p \left( \frac{1}{2} \right) = \int_{0}^{1} p(x) \, q(x) \, dx
\] for every \(p\in \mathcal{P}_2(\mathbb{R})\). Here is the direct approach. Every \(q\in\mathcal{P}_2(\mathbb{R})\) can be expressed as \[
\alpha+\beta(x-1/2)+\gamma(x-1/2)^2.
\] The desired polynomial \(q\) must satisfy \[
p(x)=1: \qquad \phi(1)=p(1/2)=1=\int_0^1\left( \alpha+\beta(x-1/2)+\gamma(x-1/2)^2 \right) \, dx = \alpha+\gamma \frac{1}{12}.
\] Moving to \(p(x)=x-1/2\) we find \[\begin{align*}
p(x)&=x-1/2: \qquad \phi(x-1/2)=p(1/2)=0 \\
&=\int_0^1(x-1/2)[\alpha+\beta(x-1/2)+\gamma(x-1/2)^2] \, dx =\beta \frac{1}{12},
\end{align*}\] so \(\beta=0\). Finally \[
p(x)=(x-1/2)^2: \qquad \phi((x-1/2)^2)=p(1/2)=0
\] \[
=\int_0^1(x-1/2)^2[\alpha+\beta(x-1/2)+\gamma(x-1/2)^2] \, dx =\alpha \frac{1}{12}+\gamma\frac{1}{80}.
\] Solving gives \[
\alpha=\frac{27}{12}, \qquad \beta=0, \qquad \gamma=-15.
\] Thus \[
q(x)=\frac{27}{12}-15(x-1/2)^2.
\]
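Both sides of \(\phi(p)=\int_0^1 p\,q\,dx\) are linear in \(p\), so it is enough to check the identity on the basis \(1,x,x^2\). The sketch below does this in exact arithmetic with Python's `fractions` module. It uses the expansion \(q(x)=\frac{27}{12}-15(x-1/2)^2=-\frac{3}{2}+15x-15x^2\). The helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Coefficient list (increasing degree) of the product p*q."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate01(p):
    """Exact value of the integral of p over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))

def evaluate(p, x):
    return sum(c * x**k for k, c in enumerate(p))

# q(x) = 27/12 - 15 (x - 1/2)^2, expanded in powers of x
q = [Fraction(-3, 2), Fraction(15), Fraction(-15)]
half = Fraction(1, 2)

basis = [[Fraction(1)], [Fraction(0), Fraction(1)], [Fraction(0), Fraction(0), Fraction(1)]]
pairs = [(integrate01(poly_mul(p, q)), evaluate(p, half)) for p in basis]
print(pairs)  # each pair has equal entries: 1, then 1/2, then 1/4
```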

**Example 4.46** Find a polynomial \(q\in \mathcal{P}_2(\mathbb{R})\) such that \[
\phi(p)=\int_{0}^{1} p(x) \, (\cos \pi x) \, dx= \int_{0}^{1} p(x) \, q(x) \, dx
\] for every \(p\in \mathcal{P}_2(\mathbb{R})\). We take the same approach as in the previous example, writing \(q(x)=\alpha+\beta(x-1/2)+\gamma(x-1/2)^2\), and compute \[
p(x)=1: \qquad \phi(1)=\int_0^1\cos(\pi x) \, dx=0
\] \[
=\int_0^1[\alpha+\beta(x-1/2)+\gamma(x-1/2)^2]\,dx=\alpha+\gamma\frac{1}{12}.
\] Moving to \(p(x)=x-1/2\) we find \[
p(x)=x-1/2: \qquad \phi(x-1/2)=\int_0^1(x-1/2)\cos(\pi x)\, dx=\int_0^1 x \cos (\pi x) \, dx
\] \[
=-\frac{2}{\pi^2}=\int_0^1(x-1/2)[\alpha+\beta(x-1/2)+\gamma(x-1/2)^2]\, dx=\beta\frac{1}{12},
\] so \(\beta=\frac{-24}{\pi^2}\). Finally, since \(\cos(\pi x)\) is odd about \(x=1/2\), \[
p(x)=(x-1/2)^2: \qquad \phi((x-1/2)^2)=\int_0^1(x-1/2)^2\cos(\pi x)\, dx =0
\] \[
=\int_0^1(x-1/2)^2[\alpha+\beta(x-1/2)+\gamma(x-1/2)^2]\, dx=\alpha \frac{1}{12} +\gamma\frac{1}{80}.
\] Solving gives \[
\alpha=\gamma=0, \qquad \beta=\frac{-24}{\pi^2}.
\] Thus \[
q(x)=\frac{-24}{\pi^2}(x-1/2).
\]
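Because \(\pi\) appears, an exact check is less convenient here. The floating-point sketch below compares \(\int_0^1 p(x)\cos(\pi x)\,dx\) with \(\int_0^1 p(x)q(x)\,dx\) for \(p=1,x,x^2\). It uses composite Simpson's rule, a standard quadrature swapped in for illustration, with an assumed \(n=1000\) subintervals. Linearity in \(p\) makes these three checks sufficient.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def q(x):
    return -24 / math.pi**2 * (x - 0.5)

basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]  # p = 1, x, x^2
results = []
for p in basis:
    lhs = simpson(lambda x: p(x) * math.cos(math.pi * x), 0.0, 1.0)
    rhs = simpson(lambda x: p(x) * q(x), 0.0, 1.0)
    results.append((lhs, rhs))
    print(f"{lhs:+.10f}  {rhs:+.10f}")
# expected: both columns about 0 for p = 1, then about -2/pi^2 for p = x and p = x^2
```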

**Example 4.47** Give an example of a real vector space \(V\) and \(T\in \mathcal{L}(V)\) such that \(\text{trace}(T^2) < 0\).

**Example 4.48** Suppose \(V\) is a real vector space, \(T\in \mathcal{L}(V)\), and \(V\) has a basis consisting of eigenvectors of \(T\). Prove that \(\text{trace}(T^2)\geq 0\).

**Example 4.49** Suppose \(V\) is an inner-product space and \(v, w\in V\). Define \(T\in \mathcal{L}(V)\) by \(T u=\langle u,v \rangle w\). Find a formula for \(\text{trace}\, T\).

**Example 4.50** Prove that if \(P\in \mathcal{L}(V)\) satisfies \(P^2=P\), then \(\text{trace}\, P\) is a nonnegative integer.

**Example 4.51** Prove that if \(V\) is an inner-product space and \(T\in \mathcal{L}(V)\), then \(\text{trace}\, T^*=\overline{\text{trace}\, T}\).

**Example 4.52** Suppose \(V\) is an inner-product space and \(T\in \mathcal{L}(V)\) is a positive operator. Prove that if \(\text{trace}\, T=0\), then \(T=0\).

**Example 4.53** Suppose \(T\in \mathcal{L}(\mathbb{C}^3)\) is the operator whose matrix is \[
\begin{bmatrix}
51 & -12 & -21 \\
60 & -40 & -28 \\
57 & -68 & 1
\end{bmatrix} .
\] If \(-48\) and \(24\) are eigenvalues of \(T\), find the third eigenvalue of \(T\).
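A hint in code form, as a sketch. For an operator on \(\mathbb{C}^3\), the trace equals the sum of the eigenvalues and the determinant equals their product, each eigenvalue repeated according to multiplicity. The integer computation below uses the trace to get the third eigenvalue, then cross-checks it against the determinant and against \(\det(A-\lambda I)=0\). The helper names `det3` and `shift` are hypothetical.

```python
A = [[51, -12, -21],
     [60, -40, -28],
     [57, -68, 1]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def shift(M, lam):
    """Return M - lam*I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

trace = sum(A[i][i] for i in range(3))
third = trace - (-48) - 24  # trace = sum of the eigenvalues
print(trace, third)               # 12 36
print(det3(A), -48 * 24 * third)  # -41472 -41472
print([det3(shift(A, lam)) for lam in (-48, 24, third)])  # [0, 0, 0]
```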

**Example 4.54** Prove or give a counterexample: if \(T\in \mathcal{L}(V)\) and \(c\in F\), then \(\text{trace}(cT) = c\,\text{trace}\, T\).

**Example 4.55** Prove or give a counterexample: if \(S, T\in \mathcal{L}(V)\), then \(\text{trace}(ST)=(\text{trace}\, S)(\text{trace}\, T)\).

**Example 4.56** Suppose \(T\in \mathcal{L}(V)\). Prove that if \(\text{trace} (ST)=0\) for all \(S\in \mathcal{L}(V)\), then \(T=0\).

**Example 4.57** Suppose \(V\) is an inner-product space and \(T\in \mathcal{L}(V)\). Prove that if \((e_1,\ldots,e_n)\) is an orthonormal basis of \(V\), then \[
\text{trace}(T^* T)=|| T e_1||^2+\cdots + ||T e_n||^2.
\] Conclude that the right side of the equation above is independent of which orthonormal basis \((e_1,\ldots,e_n)\) is chosen for \(V\).
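A numerical illustration of the conclusion, under simplifying assumptions chosen for this sketch. We take \(V=\mathbb{R}^2\) with the dot product, so \(T^*\) is the transpose, an arbitrary matrix \(T\), and two orthonormal bases: the standard one and the standard one rotated by an arbitrary angle. For a real matrix, \(\text{trace}(T^*T)\) is the sum of the squares of the entries of \(T\).

```python
import math

T = [[1.0, 2.0], [3.0, 4.0]]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def sum_sq_images(M, basis):
    """Sum of ||M e_k||^2 over the given orthonormal basis."""
    return sum(sum(c * c for c in apply(M, e)) for e in basis)

# (T^T T)[k][k] = sum over i of T[i][k]^2, so the trace is the sum of all T[i][k]^2
trace_TstarT = sum(T[i][k] ** 2 for i in range(2) for k in range(2))

theta = 0.7  # arbitrary angle defining a second orthonormal basis
standard = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
print(trace_TstarT, sum_sq_images(T, standard), sum_sq_images(T, rotated))
# all three values are (approximately) 30
```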

**Example 4.58** Suppose \(V\) is a complex inner-product space and \(T\in \mathcal{L}(V)\). Let \(\lambda_1,\ldots,\lambda_n\) be the eigenvalues of \(T\), repeated according to multiplicity.

**Example 4.59** Suppose \[
\begin{bmatrix}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & & \vdots \\
a_{n,1} & \cdots & a_{n,n} \\
\end{bmatrix}
\] is the matrix of \(T\) with respect to some orthonormal basis of \(V\). Prove that \[
|\lambda_1|^2+\cdots + |\lambda_n|^2\leq \sum^n_{k=1} \sum_{j=1}^n |a_{j,k}|^2.
\]

**Example 4.60** Suppose \(V\) is an inner-product space. Prove that \(\langle S, T\rangle=\text{trace}(S T^*)\) defines an inner-product on \(\mathcal{L}(V)\).

**Example 4.61** Suppose \(V\) is an inner-product space and \(T\in \mathcal{L}(V)\). Prove that if \(||T^* v ||\leq ||T v||\) for every \(v \in V\), then \(T\) is normal.

**Example 4.62** Prove or give a counterexample: if \(T\in \mathcal{L}(V)\) and \(c\in F\), then \(\det(cT)=c^{\text{dim} V} \det T\).

**Example 4.63** Prove or give a counterexample: if \(S, T\in \mathcal{L}(V)\), then \(\det(S+T)=\det S+ \det T\).

**Example 4.64** Suppose \(A\) is a block upper-triangular matrix \[
A=
\begin{bmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_m
\end{bmatrix} ,
\] where each \(A_j\) along the diagonal is a square matrix. Prove that \[
\det A =(\det A_1) \cdots (\det A_m).
\]
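A quick integer check of the statement for a \(4\times 4\) matrix with two \(2\times 2\) diagonal blocks. The blocks and the upper-right block \(*\) are arbitrary choices for this sketch. `det` is a hypothetical helper using cofactor expansion, which is fine for small matrices but exponential in general.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A1 = [[2, 1], [1, 3]]
A2 = [[4, 2], [1, 3]]
star = [[5, 7], [-1, 4]]  # arbitrary upper-right block
A = [A1[0] + star[0], A1[1] + star[1], [0, 0] + A2[0], [0, 0] + A2[1]]
print(det(A), det(A1) * det(A2))  # 50 50
```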