Linear Algebra

Basics

Systems of Equations

Dot Products

Any line in \mathbb{R}^2 can be described by a dot product equation: the set of all x such that a \cdot x = c, where a, x are vectors and c \in \mathbb{R}. This is because, writing x = (x_1, x_2), the equation a_1 x_1 + a_2 x_2 = c describes a line.

The dot product is zero if and only if the two vectors are orthogonal. On \mathbb{R}^2, this corresponds to two lines going through the origin that are perpendicular to each other.

We can think of a \cdot b as measuring the extent to which a and b point in the same direction. The dot product is positive if a and b point in the same general direction, 0 if they are perpendicular, and negative if they point in generally opposite directions.
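A minimal sketch of this sign intuition in NumPy (the specific vectors are just illustrative choices):

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])      # roughly the same direction as a
c = np.array([-1.0, 2.0])     # perpendicular to a
d = np.array([-2.0, -1.0])    # opposite direction to a

print(np.dot(a, b))   # positive: a and b point in the same general direction
print(np.dot(a, c))   # zero: a and c are orthogonal
print(np.dot(a, d))   # negative: a and d point in generally opposite directions
```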

Cross Products

Given a, b \in \mathbb{R}^3, their cross product, a \times b \in \mathbb{R}^3, is a vector that is perpendicular to both a and b, with magnitude equal to the area of the parallelogram spanned by a and b.

So, if the vectors a and b are parallel, the cross product is the zero vector.

The cross product is defined by the formula a \times b = \|a\| \|b\| \sin(\theta) n

where:

  - \theta is the angle between a and b,
  - n is the unit vector perpendicular to both a and b, with direction given by the right-hand rule.
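A quick NumPy illustration (np.cross assumes 3-vectors; the inputs are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

n = np.cross(a, b)                   # [0, 0, 1]: perpendicular to both a and b
print(np.dot(n, a), np.dot(n, b))    # 0.0 0.0

# Parallel vectors give the zero vector.
print(np.cross(a, 3 * a))            # [0, 0, 0]
```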

Determinants

The determinant can be computed from a square matrix. Intuitively, it represents the (signed) scaling factor that the transformation described by the matrix applies to areas (in 2-D) or volumes (in higher dimensions).

A square matrix A is invertible if and only if it has a non-zero determinant.
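A short NumPy sketch of the determinant as a scaling factor and as an invertibility test (the matrices are illustrative):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])          # stretches x by 2 and y by 3

print(np.linalg.det(A))             # 6.0: areas are scaled by a factor of 6
print(np.linalg.inv(A))             # exists because det(A) != 0

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row is a multiple of the first
print(np.linalg.det(B))             # 0.0: B is singular (not invertible)
```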

Invertible matrix

Let A be a square n \times n matrix. Then the following are equivalent:

  - A is invertible,
  - \det(A) \neq 0,
  - Ax = 0 has only the trivial solution x = 0,
  - the columns (and rows) of A are linearly independent,
  - \operatorname{rank}(A) = n,
  - Ax = b has a unique solution for every b \in \mathbb{R}^n.

If any of the above conditions fails, then A maps \mathbb{R}^n onto a subspace of dimension at most n-1. For example, if A were a singular 3 \times 3 matrix, the image Ax = y would lie in a one- or two-dimensional subspace (for example, a projection from 3-D space onto a plane). It is thus not possible to find an inverse transform to go from y back to x.

Change of Basis

A basis for a vector space is a linearly independent set spanning the vector space.

If A is a matrix whose columns comprise a basis of \mathbb{R}^n, then a vector v (expressed in the standard basis) has coordinates A^{-1}v with respect to the basis formed by the columns of A.
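A small sketch of this coordinate change in NumPy (the basis matrix A and vector v are arbitrary examples; solve is used instead of forming A^{-1} explicitly):

```python
import numpy as np

# Columns of A form a (non-standard) basis of R^2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])            # coordinates in the standard basis

# Coordinates of v with respect to the columns of A: solve A c = v.
c = np.linalg.solve(A, v)           # equivalent to A^{-1} v, but more stable
print(c)                            # [1. 2.]
print(A @ c)                        # reconstructs v in the standard basis
```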

Nullspace (kernel)

The kernel of a linear map L: V \rightarrow W between two vector spaces V and W is the set of all elements v \in V such that L(v) = 0. That is, \ker(L) = \{v \in V | L(v) = 0\}.

The kernel of L is a linear subspace of the domain V. The dimension of the kernel of L is called the nullity.

The kernel of a matrix A can be computed by solving A \cdot x = 0 for x (e.g. by Gaussian elimination).
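A sketch of computing a kernel numerically, assuming SciPy's scipy.linalg.null_space is available (the matrix is an illustrative rank-1 example):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1, so the kernel is 2-dimensional

K = null_space(A)                   # columns form an orthonormal basis of ker(A)
print(K.shape)                      # (3, 2): the nullity is 2
print(np.allclose(A @ K, 0))        # True: A x = 0 for every x in the kernel
```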

Rank

The rank of a matrix A is the dimension of the vector space generated (or spanned) by its columns. This is the same as the dimension of the space spanned by its rows.

The rank can be computed by counting the number of non-zero rows in the reduced row-echelon form of A.

The rank is also defined as the dimension of the image of L. The rank-nullity theorem states \dim(\ker L) + \dim(\operatorname{im} L) = \dim(V).
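A small numerical check of the rank-nullity theorem, assuming NumPy and SciPy are available (the matrix is an illustrative example with one dependent row):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [3.0, 1.0, 2.0]])     # third row = first row + second row

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
print(rank, nullity)                # 2 1
print(rank + nullity == A.shape[1]) # True: rank + nullity = dimension of the domain
```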

Matrix Definiteness

A (symmetric) matrix A is positive definite, i.e. A > \mathbf{0}, if x^TAx > 0 for all nonzero x. It is positive semi-definite if x^TAx \geq 0 for all x. Negative definite and negative semi-definite follow with the <, \leq comparisons.

A matrix is indefinite if it is not positive semi-definite AND not negative semi-definite.

A symmetric matrix is positive definite if and only if all of its eigenvalues are positive.
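A sketch of classifying definiteness by eigenvalue signs with NumPy (the helper classify is a hypothetical name; the test matrices are illustrative):

```python
import numpy as np

def classify(A):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)     # eigenvalues of a symmetric/Hermitian matrix
    if np.all(eig > 0):
        return "positive definite"
    if np.all(eig >= 0):
        return "positive semi-definite"
    if np.all(eig < 0):
        return "negative definite"
    if np.all(eig <= 0):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```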

Complex vector spaces

Scalars are complex numbers, and vectors and matrices consist of complex numbers. All the usual complex number rules and vector space rules apply.

Theorem:

Matrix Types

Conjugate Transpose
If A is a complex matrix, then the conjugate transpose of A is A^* = \bar{A}^T.

The conjugate transpose has the following properties:

  - (A^*)^* = A
  - (A + B)^* = A^* + B^*
  - (kA)^* = \bar{k} A^*
  - (AB)^* = B^* A^*
  - (A^*)^{-1} = (A^{-1})^* when A is invertible

Summary

A square complex matrix A is said to be

  - Hermitian if A^* = A,
  - skew-Hermitian if A^* = -A,
  - unitary if A^*A = AA^* = I,
  - normal if A^*A = AA^*.

A square real matrix A is

  - symmetric if A^T = A,
  - skew-symmetric if A^T = -A,
  - orthogonal if A^TA = AA^T = I,
  - normal if A^TA = AA^T.

Orthogonal Matrix

An orthogonal matrix is a square matrix whose columns and rows are orthonormal vectors, i.e. A^TA = AA^T = I.

That is, A^T = A^{-1}.

The complex analogue of an orthogonal matrix is a unitary matrix, satisfying A^*A = AA^* = I, where ^* denotes the conjugate transpose.

These matrices preserve norms (\|Ax\| = \|x\| for all x) and have determinant \pm 1. Orthogonal matrices don't degrade numerical precision: multiplying by them does not amplify round-off error.
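A quick NumPy check of these properties on a 2-D rotation matrix (an illustrative example of an orthogonal matrix):

```python
import numpy as np

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],     # a rotation matrix
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
print(np.allclose(Q.T @ Q, np.eye(2)))             # True: Q^T Q = I
print(np.linalg.norm(x), np.linalg.norm(Q @ x))    # both 5.0: norms preserved
print(np.linalg.det(Q))                            # 1.0 (a reflection would give -1)
```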

Theorems:

Unitary Matrix

A \in \mathbb{C}^{n\times n} is unitary iff A^{-1} = A^*. Unitary matrices generalize orthogonal matrices.

If A is a unitary matrix, then

  - its columns (and rows) form an orthonormal basis of \mathbb{C}^n,
  - A preserves norms and inner products: \|Ax\| = \|x\| and \langle Ax, Ay \rangle = \langle x, y \rangle,
  - |\det(A)| = 1,
  - every eigenvalue of A has absolute value 1.

Hermitian Matrix

A \in \mathbb{C}^{n\times n} is Hermitian iff A^* = A. Hermitian matrices generalize real symmetric matrices.

The eigenvalues of a Hermitian matrix are real. For Hermitian A, eigenvectors corresponding to distinct eigenvalues are orthogonal. Example: if Av_1 = \lambda_1 v_1 and Av_2 = \lambda_2 v_2 with \lambda_1 \neq \lambda_2, then v_1 \cdot v_2 = 0.

Normal Matrix

A \in \mathbb{C}^{n\times n} is normal iff A^*A=AA^*.

All unitary, Hermitian, and skew-Hermitian matrices are normal.

Matrix Decompositions / Factorizations

Summary

Eigenvalue decomposition

Hessenberg decomposition

Schur decomposition

Theorem
If A is an m\times n matrix, then:

LU Decomposition

Any matrix A \in \mathbb{C}^{n\times n} has an LU factorization PA = LU, where

  - P is a permutation matrix,
  - L is unit lower triangular (lower triangular with 1s on the diagonal),
  - U is upper triangular.

We can take P = I if the leading principal submatrices of A are nonsingular, but to guarantee that the factorization is numerically stable we need A to have particular properties, such as diagonal dominance. In practical computation we normally choose P using the partial pivoting strategy, which almost always ensures numerical stability.

Can be used to efficiently compute solutions x in Ax=b: with PA = LU, first solve Ly = Pb by forward substitution, then solve Ux = y by back substitution.
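A sketch of this solve using SciPy's LU routines (lu_factor/lu_solve; the system is an arbitrary example):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([10.0, 12.0])

lu, piv = lu_factor(A)        # computes PA = LU with partial pivoting
x = lu_solve((lu, piv), b)    # forward substitution with L, then back substitution with U
print(np.allclose(A @ x, b))  # True
```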

LU decompositions are not unique. A square matrix A can be factored as A = LU (i.e. with P = I) if it can be reduced to row echelon form without row interchanges.

The LDU factorization decomposes a nonsingular matrix A (uniquely, when it exists) as A = LDU, where L and U are lower and upper triangular matrices with 1s on the main diagonal, and D is a diagonal matrix.

PLU factorization is a related technique in which the row interchanges are collected into a permutation matrix P, so that A = PLU (equivalently, P^{-1}A = LU).

Cholesky Decomposition

Every Hermitian positive definite matrix A \in \mathbb{C}^{n\times n} has a unique Cholesky factorization A = U^*U where U \in \mathbb{C}^{n\times n} is upper triangular with positive diagonal.
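A small sketch with NumPy; note that np.linalg.cholesky returns the lower-triangular factor L with A = LL^*, i.e. L = U^* in the notation above (the matrix is an illustrative symmetric positive definite example):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])            # symmetric positive definite

L = np.linalg.cholesky(A)             # NumPy returns the lower-triangular factor
print(np.allclose(L @ L.T, A))        # True: A = L L^T, so U = L^T here
```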

Schur Decomposition

Any matrix A\in\mathbb{C}^{n\times n} has a Schur decomposition A = QTQ^*, where Q is unitary and T is upper triangular. The eigenvalues of A appear on the diagonal of T. For each k, the leading k columns of Q span an invariant subspace of A.

For real matrices, a special form of this decomposition exists in which all the factors are real. An upper quasi-triangular matrix R is a block upper triangular matrix whose diagonal blocks R_{ii} are either 1\times1 or 2\times2. Any A\in\mathbb{R}^{n\times n} has a real Schur decomposition A = Q R Q^T, where Q is real orthogonal and R is real upper quasi-triangular, with any 2\times 2 diagonal blocks having complex conjugate eigenvalues.

Cost: 25n^3 flops for Q and T (or R) by the QR algorithm; 10n^3 flops for T (or R) only.
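A sketch of computing a real Schur decomposition with SciPy (scipy.linalg.schur; the matrix is a random example):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.default_rng(0).standard_normal((4, 4))

T, Q = schur(A)                        # real Schur form: A = Q T Q^T, T quasi-triangular
print(np.allclose(Q @ T @ Q.T, A))     # True
print(np.allclose(Q.T @ Q, np.eye(4))) # True: Q is orthogonal
```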

Use:

QR Decomposition

Any matrix A \in \mathbb{C}^{m\times n} with m\ge n has a QR factorization A = QR, where Q \in \mathbb{C}^{m\times m} is unitary and R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \in \mathbb{C}^{m\times n}, with R_1 \in \mathbb{C}^{n\times n} upper triangular.

Partitioning Q = [Q_1~Q_2], where Q_1\in\mathbb{C}^{m\times n} has orthonormal columns, gives A = Q_1R_1, which is the reduced, economy size, or thin QR factorization.

Cost: 2(n^2m-n^3/3) flops for Householder QR factorization. The explicit formation of Q (which is not usually necessary) requires a further 4(m^2n-mn^2+n^3/3) flops.

Uses:

Theorem
If A is an m\times n matrix, m \geq n, with linearly independent column vectors, then A can be factored as A=QR, where Q is an m\times n matrix with orthonormal column vectors (i.e. Q^TQ=I), and R is an n\times n invertible upper triangular matrix.

QR can also be applied to under-determined systems (m < n), by first applying it on A^T, and performing a few extra steps.
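A sketch of the thin QR factorization with NumPy, including its use for least squares (the matrix and right-hand side are random examples):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 3))   # m > n, full column rank (almost surely)

Q, R = np.linalg.qr(A)                 # thin/economy QR: Q is 6x3, R is 3x3 upper triangular
print(np.allclose(Q.T @ Q, np.eye(3))) # True: orthonormal columns
print(np.allclose(Q @ R, A))           # True

# Least squares via QR: minimize ||Ax - b|| by solving R x = Q^T b.
b = np.random.default_rng(2).standard_normal(6)
x = np.linalg.solve(R, Q.T @ b)
```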

Eigenvalue Decomposition

Let A be an n \times n matrix. A number \lambda is an eigenvalue if there exists a nonzero solution vector x such that Ax = \lambda x. Rearranging, we get (A-\lambda I)x = 0.

To find a nonzero solution x we must solve \det(A-\lambda I) = 0. Expanding this determinant gives an nth-degree polynomial in \lambda, the characteristic polynomial of A; setting it to zero gives the characteristic equation. The eigenvalues are the roots of this equation. The eigenvector corresponding to an eigenvalue \lambda is obtained by applying Gaussian elimination to (A-\lambda I \mid 0).
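A small numerical illustration: the roots of the characteristic polynomial coincide with the eigenvalues returned directly (NumPy assumed; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)                 # coefficients of the characteristic polynomial of A
print(np.roots(coeffs))             # [3. 1.]: roots of the characteristic equation
print(np.linalg.eig(A)[0])          # same eigenvalues (possibly in a different order)
```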

Theorems:

Diagonalization

The n \times n matrix A is diagonalizable if an n\times n nonsingular (i.e. invertible) matrix P can be found so that P^{-1}AP = D is a diagonal matrix. We say that A is diagonalized, or that P diagonalizes A.

For a symmetric matrix A with real entries, we can always find an orthogonal matrix P that diagonalizes A: P^T A P = D.

Theorems:

If there are repeated eigenvalues then we need to find the eigenspace of the eigenvalues, E_\lambda. This is the subspace of all x such that (A-\lambda I )x = 0.

An n \times n matrix is diagonalizable if and only if, for each eigenvalue \lambda, \dim(E_\lambda) equals the multiplicity of \lambda. The (algebraic) multiplicity of an eigenvalue is its number of occurrences as a root of the characteristic polynomial. The dimension of the eigenspace (the geometric multiplicity) is the number of linearly independent solutions of (A-\lambda I)x = 0.

The trace of A is equal to the sum of its eigenvalues: \mathrm{tr}(A) = \mathrm{tr}(D) = \sum_i \lambda_i. The determinant of A is the product of its eigenvalues: \mathrm{det}(A) = \prod_i \lambda_i.

Methodology

To diagonalize a matrix A:

  1. Find the eigenvalues of A by solving \det(A - \lambda I) = 0.
  2. For each eigenvalue \lambda, find a basis of the eigenspace E_\lambda by solving (A - \lambda I)x = 0.
  3. If the resulting eigenvectors form a basis of \mathbb{R}^n (or \mathbb{C}^n), let P have these eigenvectors as columns and let D be the diagonal matrix of the corresponding eigenvalues (in the same order); then P^{-1}AP = D. Otherwise, A is not diagonalizable.
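A sketch verifying a diagonalization numerically with NumPy, along with the trace and determinant identities above (the matrix is an illustrative example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)                          # columns of P are eigenvectors
D = np.diag(eigvals)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))        # True: P^{-1} A P = D
print(np.isclose(np.trace(A), eigvals.sum()))          # True: tr(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), eigvals.prod()))    # True: det(A) = product of eigenvalues
```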

Eigenbasis
An eigenbasis is a basis where every vector is an eigenvector.

Singular Value Decomposition

Any matrix A \in \mathbb{C}^{m \times n} has a singular value decomposition (SVD) A = U\Sigma V^*, where U \in \mathbb{C}^{m\times m} and V \in \mathbb{C}^{n\times n} are unitary and \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_p) \in \mathbb{R}^{m\times n} with \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_p \geq 0 and p = \min(m, n).

Extends diagonalization to general m\times n matrices. The \sigma_i are the singular values of A, and they are the nonnegative square roots of the p largest eigenvalues of A^* A. The rank of A is equal to the number of nonzero singular values. If A is real, U and V can be taken to be real.

The essential SVD information is contained in the compact or economy size SVD A = U\Sigma V^*, where U\in\mathbb{C}^{m\times r}, \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_r), V\in\mathbb{C}^{n\times r}, and r = \mathrm{rank}(A).
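A sketch of the economy SVD with NumPy, also checking that the \sigma_i^2 are eigenvalues of A^*A (the matrix is a random example):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=False)   # economy SVD: U is 5x3, Vh is 3x3
print(s)                                           # singular values, in decreasing order
print(np.allclose(U @ np.diag(s) @ Vh, A))         # True: A = U Sigma V^*
print(np.linalg.matrix_rank(A), np.sum(s > 1e-12)) # rank = number of nonzero singular values
print(np.allclose(np.sort(s**2),
                  np.linalg.eigvalsh(A.T @ A)))    # True: sigma_i^2 are eigenvalues of A^T A
```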

Cost: 14mn^2+8n^3 flops for U(:,1\colon n), \Sigma, and V by the Golub–Reinsch algorithm, or 6mn^2+20n^3 flops with a preliminary QR factorization.

Uses:

Relation to the eigenvalue decomposition:

A^*A = (U\Sigma V^*)^* (U\Sigma V^*) = V \Sigma^* U^* U \Sigma V^* = V (\Sigma^* \Sigma) V^*

AA^* = (U\Sigma V^*)(U\Sigma V^*)^* = U \Sigma V^* V \Sigma^* U^* = U (\Sigma \Sigma^*) U^*

The right-hand sides are eigenvalue decompositions of the left-hand sides.

The SVD A=U\Sigma V^* allows us to describe the effect of a matrix A on a vector (via the matrix-vector product) as a three-step process:

  1. A first rotation in the input space (V)
  2. A simple positive scaling that takes a vector in the input space to the output space (\Sigma)
  3. And another rotation in the output space (U)

U, V are unitary (U^*U = V^*V = I), so they act as rotations (possibly combined with reflections) and preserve lengths.

EVD vs SVD Intuition

Consider the EVD A=PDP^{-1} and the SVD A = U\Sigma V^*. Some key differences:

  - The SVD exists for any matrix, even a rectangular one; the EVD exists only for certain (diagonalizable) square matrices.
  - The SVD uses two different orthonormal bases (the columns of U and V); the EVD uses a single basis of eigenvectors, which in general is not orthogonal.
  - The singular values are always real and nonnegative; the eigenvalues may be negative or complex.

Spectral Theorem

A generalization of diagonalization to different mathematical objects. If A is a normal matrix (and therefore square), the spectral theorem says that it can be written A = UDU^*, where U is unitary (its columns are orthonormal eigenvectors of A) and D is a diagonal matrix containing the eigenvalues of A.

The spectral decomposition is a special case of the Schur decomposition. When A is also positive semi-definite, this decomposition is also an SVD.

Moore–Penrose Inverse

A generalization of the matrix inverse. Also referred to as the pseudoinverse.

The pseudoinverse is defined and unique for all real or complex matrices.

For A \in \mathbb{K}^{m\times n}, a pseudoinverse of A is defined as a matrix A^+ \in \mathbb{K}^{n\times m} satisfying all of the following (the Moore–Penrose conditions):

  1. AA^+A = A
  2. A^+AA^+ = A^+
  3. (AA^+)^* = AA^+ (AA^+ is Hermitian)
  4. (A^+A)^* = A^+A (A^+A is Hermitian)

Properties

Calculated using the SVD: if A = U\Sigma V^*, then A^+ = V\Sigma^+ U^*, where \Sigma^+ is obtained by taking the reciprocal of each nonzero singular value and transposing.

It can be used to find the least squares solution, or the minimum norm solution, to a system of linear equations with no unique solution.
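A sketch of using the pseudoinverse for least squares with NumPy, compared against np.linalg.lstsq (the overdetermined system is an arbitrary example):

```python
import numpy as np

# Overdetermined system with no exact solution: find the least squares solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_pinv = np.linalg.pinv(A) @ b                      # A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # reference least squares solver
print(np.allclose(x_pinv, x_lstsq))                 # True
```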

Inner Product Spaces

Definition
An inner product on a vector space V \subseteq \mathbb{R}^n is a function that associates a real number \langle u,v \rangle with every pair of vectors in V such that \forall u,v,w \in V and \forall k \in \mathbb{R}:

  1. \langle u,v \rangle = \langle v,u \rangle (symmetry)
  2. \langle u+v,w \rangle = \langle u,w \rangle + \langle v,w \rangle (additivity)
  3. \langle ku,v \rangle = k\langle u,v \rangle (homogeneity)
  4. \langle v,v \rangle \geq 0, with \langle v,v \rangle = 0 iff v = 0 (positivity)

The dot product naturally satisfies these axioms. Euclidean space is \mathbb{R}^n with the dot product as the inner product.

The norm of v \in V is denoted by \|v\| = \sqrt{\langle v,v \rangle}. The distance between u and v would be \|u-v\|.

The geometry of the vector space is thus dependent on the inner product used.

Examples of inner products on:

Gram-Schmidt Process

Suppose S is a set of two or more vectors in a real inner product space. S is orthogonal if all pairs of distinct vectors are orthogonal. If each vector also has a norm of 1, then S is orthonormal.

If an orthogonal set S consists of nonzero vectors only, then S is linearly independent. A basis consisting of orthonormal vectors is an orthonormal basis.

Orthogonal Basis Coordinate Theorem
If S = \{v_1, v_2,\ldots,v_n\} is an orthogonal basis for an inner product space V, and if u \in V, then u = \frac{\langle u,v_1 \rangle}{\|v_1\|^2}v_1 + \frac{\langle u,v_2 \rangle}{\|v_2\|^2}v_2 + \ldots + \frac{\langle u,v_n \rangle}{\|v_n\|^2}v_n
Projection Theorem
If W is a finite-dimensional subspace of an inner product space V, then \forall u \in V, u = w_1 + w_2, where w_1 \in W (orthogonal projection of u onto W) and w_2 \in W^\perp.

A vector in V can be orthogonally projected onto W using the Orthogonal Basis Coordinate Theorem.

Every nonzero finite-dimensional inner product space has an orthonormal basis.

Gram-Schmidt Process:
The process used to construct an orthogonal basis from a given basis \{u_1, \ldots, u_n\}: set v_1 = u_1 and, for k = 2, \ldots, n, v_k = u_k - \sum_{i=1}^{k-1} \frac{\langle u_k, v_i \rangle}{\|v_i\|^2} v_i. The resulting \{v_1, \ldots, v_n\} is an orthogonal basis; normalizing each v_i gives an orthonormal basis.
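A minimal sketch of (classical) Gram-Schmidt with the standard dot product, assuming NumPy; gram_schmidt is a hypothetical helper name and the input basis is an arbitrary example:

```python
import numpy as np

def gram_schmidt(basis):
    """Orthogonalize a list of linearly independent vectors (standard dot product)."""
    orthogonal = []
    for u in basis:
        v = np.array(u, dtype=float)
        for w in orthogonal:
            v -= (np.dot(u, w) / np.dot(w, w)) * w   # subtract the projection of u onto w
        orthogonal.append(v)
    return orthogonal

vs = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0])])
print(np.dot(vs[0], vs[1]), np.dot(vs[0], vs[2]), np.dot(vs[1], vs[2]))  # all ~0
```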

Example: The Legendre Polynomials are the polynomials in the orthogonal basis constructed from the standard basis \{ 1,x,x^2,\ldots,x^n\} for the vector space P_n with inner product \langle p, q \rangle = \int_{-1}^{1}p(x)q(x) dx

Norms

Given a vector space V over a subfield F of the complex numbers \mathbb{C}, a norm on V is a real-valued function p with the following properties (for all u, v \in V and scalars k):

  1. p(v) \geq 0, and p(v) = 0 iff v = 0 (positive definiteness)
  2. p(kv) = |k|\, p(v) (absolute homogeneity)
  3. p(u+v) \leq p(u) + p(v) (triangle inequality)

For a matrix, common norms include the Frobenius norm \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} and the induced (operator) norms \|A\|_p = \max_{x \neq 0} \|Ax\|_p / \|x\|_p.
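A quick NumPy illustration of common matrix norms (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.norm(A, 2))       # spectral norm: the largest singular value of A
print(np.linalg.norm(A, 1))       # maximum absolute column sum
print(np.linalg.norm(A, np.inf))  # maximum absolute row sum
```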

References