Linear Algebra

Basics

Systems of Equations

Dot Products

Any line in \mathbb{R}^2 can be described by a dot product equation: the set of all x such that a \cdot x = c, where a, x are vectors and c \in \mathbb{R}. This is because, writing x = (x_1, x_2), the equation a_1 x_1 + a_2 x_2 = c describes a line.

The dot product is zero if and only if the two vectors are orthogonal. On \mathbb{R}^2, this corresponds to two lines going through the origin that are perpendicular to each other.

We can think of a \cdot b as measuring the extent to which a and b point in the same direction. The dot product is positive if a and b point in the same general direction, 0 if they are perpendicular, and negative if they point in generally opposite directions.
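A minimal sketch of this sign intuition in NumPy (the specific vectors are just illustrative choices):

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])      # roughly the same direction as a
c = np.array([-1.0, 2.0])     # perpendicular to a
d = np.array([-2.0, -1.0])    # opposite direction to a

print(np.dot(a, b))   # positive: a and b point in the same general direction
print(np.dot(a, c))   # zero: a and c are orthogonal
print(np.dot(a, d))   # negative: a and d point in generally opposite directions
```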

Cross Products

Given a, b \in \mathbb{R}^3, their cross product, a \times b \in \mathbb{R}^3, is a vector that is perpendicular to both a and b, with magnitude equal to the area of the parallelogram spanned by a and b.

So, if the vectors a and b are parallel, the cross product is the zero vector.

The cross product is defined by the formula a \times b = \|a\| \|b\| \sin(\theta) n

where:

  - \theta is the angle between a and b,
  - n is the unit vector perpendicular to both a and b, with direction given by the right-hand rule.
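A quick NumPy illustration (np.cross assumes 3-vectors; the inputs are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

n = np.cross(a, b)                   # [0, 0, 1]: perpendicular to both a and b
print(np.dot(n, a), np.dot(n, b))    # 0.0 0.0

# Parallel vectors give the zero vector.
print(np.cross(a, 3 * a))            # [0, 0, 0]
```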

Determinants

The determinant can be computed from a square matrix. Intuitively, it represents the (signed) scaling factor that the transformation described by the matrix applies to areas (in 2-D) or volumes (in higher dimensions).

A square matrix A is invertible if and only if it has a non-zero determinant.
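A short NumPy sketch of the determinant as a scaling factor and as an invertibility test (the matrices are illustrative):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])          # stretches x by 2 and y by 3

print(np.linalg.det(A))             # 6.0: areas are scaled by a factor of 6
print(np.linalg.inv(A))             # exists because det(A) != 0

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row is a multiple of the first
print(np.linalg.det(B))             # 0.0: B is singular (not invertible)
```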

Invertible matrix

Let A be a square n \times n matrix. Then the following are equivalent:

  - A is invertible,
  - \det(A) \neq 0,
  - Ax = 0 has only the trivial solution x = 0,
  - the columns (and rows) of A are linearly independent,
  - \operatorname{rank}(A) = n,
  - Ax = b has a unique solution for every b \in \mathbb{R}^n.

If any of the above conditions fails, then A maps \mathbb{R}^n onto a subspace of dimension at most n-1. For example, if A were a singular 3 \times 3 matrix, the image Ax = y would lie in a one- or two-dimensional subspace (for example, a projection from 3-D space onto a plane). It is thus not possible to find an inverse transform to go from y back to x.

Change of Basis

A basis for a vector space is a linearly independent set spanning the vector space.

If A is a matrix whose columns comprise a basis of \mathbb{R}^n, then a vector v (expressed in the standard basis) has coordinates A^{-1}v with respect to the basis formed by the columns of A.
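A small sketch of this coordinate change in NumPy (the basis matrix A and vector v are arbitrary examples; solve is used instead of forming A^{-1} explicitly):

```python
import numpy as np

# Columns of A form a (non-standard) basis of R^2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])            # coordinates in the standard basis

# Coordinates of v with respect to the columns of A: solve A c = v.
c = np.linalg.solve(A, v)           # equivalent to A^{-1} v, but more stable
print(c)                            # [1. 2.]
print(A @ c)                        # reconstructs v in the standard basis
```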

Nullspace (kernel)

The kernel of a linear map L: V \rightarrow W between two vector spaces V and W is the set of all elements v \in V such that L(v) = 0. That is, \ker(L) = \{v \in V | L(v) = 0\}.

The kernel of L is a linear subspace of the domain V. The dimension of the kernel of L is called the nullity.

The kernel of a matrix A can be computed by solving A \cdot x = 0 for x (e.g. by Gaussian elimination).
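A sketch of computing a kernel numerically, assuming SciPy's scipy.linalg.null_space is available (the matrix is an illustrative rank-1 example):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1, so the kernel is 2-dimensional

K = null_space(A)                   # columns form an orthonormal basis of ker(A)
print(K.shape)                      # (3, 2): the nullity is 2
print(np.allclose(A @ K, 0))        # True: A x = 0 for every x in the kernel
```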

Rank

The rank of a matrix A is the dimension of the vector space generated (or spanned) by its columns. This is the same as the dimension of the space spanned by its rows.

The rank can be computed by counting the number of non-zero rows in the reduced row-echelon form of A.

The rank is also defined as the dimension of the image of L. The rank-nullity theorem states \dim(\ker L) + \dim(\operatorname{im} L) = \dim(V).
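A small numerical check of the rank-nullity theorem, assuming NumPy and SciPy are available (the matrix is an illustrative example with one dependent row):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [3.0, 1.0, 2.0]])     # third row = first row + second row

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
print(rank, nullity)                # 2 1
print(rank + nullity == A.shape[1]) # True: rank + nullity = dimension of the domain
```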

Matrix Definiteness

A (symmetric) matrix A is positive definite, i.e. A > \mathbf{0}, if x^TAx > 0 for all nonzero x. It is positive semi-definite if x^TAx \geq 0 for all x. Negative definite and negative semi-definite follow with the <, \leq comparisons.

A matrix is indefinite if it is not positive semi-definite AND not negative semi-definite.

A symmetric matrix is positive definite if and only if all of its eigenvalues are positive.
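A sketch of classifying definiteness by eigenvalue signs with NumPy (the helper classify is a hypothetical name; the test matrices are illustrative):

```python
import numpy as np

def classify(A):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)     # eigenvalues of a symmetric/Hermitian matrix
    if np.all(eig > 0):
        return "positive definite"
    if np.all(eig >= 0):
        return "positive semi-definite"
    if np.all(eig < 0):
        return "negative definite"
    if np.all(eig <= 0):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```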

Complex vector spaces

Scalars are complex numbers, and vectors and matrices consist of complex numbers. All the usual complex number rules and vector space rules apply.

Theorem:

Matrix Types

Conjugate Transpose
If A is a complex matrix, then the conjugate transpose of A is A^* = \bar{A}^T.

The conjugate transpose has the following properties:

  - (A^*)^* = A
  - (A + B)^* = A^* + B^*
  - (kA)^* = \bar{k} A^*
  - (AB)^* = B^* A^*
  - (A^*)^{-1} = (A^{-1})^* when A is invertible

Summary

A square complex matrix A is said to be

  - Hermitian if A^* = A,
  - skew-Hermitian if A^* = -A,
  - unitary if A^*A = AA^* = I,
  - normal if A^*A = AA^*.

A square real matrix A is

  - symmetric if A^T = A,
  - skew-symmetric if A^T = -A,
  - orthogonal if A^TA = AA^T = I,
  - normal if A^TA = AA^T.

Orthogonal Matrix

An orthogonal matrix is a square matrix whose columns and rows are orthonormal vectors, i.e. A^TA = AA^T = I.

That is, A^T = A^{-1}.

The complex analogue of an orthogonal matrix is a unitary matrix, satisfying A^*A = AA^* = I, where ^* denotes the conjugate transpose.

These matrices preserve norms (\|Ax\| = \|x\| for all x) and have determinant \pm 1. Orthogonal matrices don't degrade numerical precision: multiplying by them does not amplify round-off error.
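A quick NumPy check of these properties on a 2-D rotation matrix (an illustrative example of an orthogonal matrix):

```python
import numpy as np

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],     # a rotation matrix
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
print(np.allclose(Q.T @ Q, np.eye(2)))             # True: Q^T Q = I
print(np.linalg.norm(x), np.linalg.norm(Q @ x))    # both 5.0: norms preserved
print(np.linalg.det(Q))                            # 1.0 (a reflection would give -1)
```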

Theorems:

Unitary Matrix

A \in \mathbb{C}^{n\times n} is unitary iff A^{-1} = A^*. Unitary matrices generalize orthogonal matrices.

If A is a unitary matrix, then

  - its columns (and rows) form an orthonormal basis of \mathbb{C}^n,
  - A preserves norms and inner products: \|Ax\| = \|x\| and \langle Ax, Ay \rangle = \langle x, y \rangle,
  - |\det(A)| = 1,
  - every eigenvalue of A has absolute value 1.

Hermitian Matrix

A \in \mathbb{C}^{n\times n} is Hermitian iff A^* = A. Hermitian matrices generalize real symmetric matrices.

The eigenvalues of a Hermitian matrix are real. For Hermitian A, eigenvectors corresponding to distinct eigenvalues are orthogonal. Example: if Av_1 = \lambda_1 v_1 and Av_2 = \lambda_2 v_2 with \lambda_1 \neq \lambda_2, then v_1 \cdot v_2 = 0.

Normal Matrix

A \in \mathbb{C}^{n\times n} is normal iff A^*A=AA^*.

All unitary, Hermitian, and skew-Hermitian matrices are normal.

Matrix Decompositions / Factorizations

Summary

Eigenvalue decomposition

Hessenberg decomposition

Schur decomposition

Theorem
If A is an m\times n matrix, then:

LU Decomposition

Any matrix A \in \mathbb{C}^{n\times n} has an LU factorization PA = LU, where

  - P is a permutation matrix,
  - L is unit lower triangular (lower triangular with 1s on the diagonal),
  - U is upper triangular.

We can take P = I if the leading principal submatrices of A are nonsingular, but to guarantee that the factorization is numerically stable we need A to have particular properties, such as diagonal dominance. In practical computation we normally choose P using the partial pivoting strategy, which almost always ensures numerical stability.

Can be used to efficiently compute solutions x in Ax=b: with PA = LU, first solve Ly = Pb by forward substitution, then solve Ux = y by back substitution.
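A sketch of this solve using SciPy's LU routines (lu_factor/lu_solve; the system is an arbitrary example):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([10.0, 12.0])

lu, piv = lu_factor(A)        # computes PA = LU with partial pivoting
x = lu_solve((lu, piv), b)    # forward substitution with L, then back substitution with U
print(np.allclose(A @ x, b))  # True
```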

LU decompositions are not unique. A square matrix A can be factored as A = LU (i.e. with P = I) if it can be reduced to row echelon form without row interchanges.

The LDU factorization decomposes a nonsingular matrix A (uniquely, when it exists) as A = LDU, where L and U are lower and upper triangular matrices with 1s on the main diagonal, and D is a diagonal matrix.

PLU factorization is a related technique in which the row interchanges are collected into a permutation matrix P, so that A = PLU (equivalently, P^{-1}A = LU).

Cholesky Decomposition

Every Hermitian positive definite matrix A \in \mathbb{C}^{n\times n} has a unique Cholesky factorization A = U^*U where U \in \mathbb{C}^{n\times n} is upper triangular with positive diagonal.
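A small sketch with NumPy; note that np.linalg.cholesky returns the lower-triangular factor L with A = LL^*, i.e. L = U^* in the notation above (the matrix is an illustrative symmetric positive definite example):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])            # symmetric positive definite

L = np.linalg.cholesky(A)             # NumPy returns the lower-triangular factor
print(np.allclose(L @ L.T, A))        # True: A = L L^T, so U = L^T here
```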

Schur Decomposition

Any matrix A\in\mathbb{C}^{n\times n} has a Schur decomposition A = QTQ^*, where Q is unitary and T is upper triangular. The eigenvalues of A appear on the diagonal of T. For each k, the leading k columns of Q span an invariant subspace of A.

For real matrices, a special form of this decomposition exists in which all the factors are real. An upper quasi-triangular matrix R is a block upper triangular matrix whose diagonal blocks R_{ii} are either 1\times1 or 2\times2. Any A\in\mathbb{R}^{n\times n} has a real Schur decomposition A = Q R Q^T, where Q is real orthogonal and R is real upper quasi-triangular, with any 2\times 2 diagonal blocks having complex conjugate eigenvalues.

Cost: 25n^3 flops for Q and T (or R) by the QR algorithm; 10n^3 flops for T (or R) only.
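A sketch of computing a real Schur decomposition with SciPy (scipy.linalg.schur; the matrix is a random example):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.default_rng(0).standard_normal((4, 4))

T, Q = schur(A)                        # real Schur form: A = Q T Q^T, T quasi-triangular
print(np.allclose(Q @ T @ Q.T, A))     # True
print(np.allclose(Q.T @ Q, np.eye(4))) # True: Q is orthogonal
```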

Use:

QR Decomposition

Any matrix A \in \mathbb{C}^{m\times n} with m\ge n has a QR factorization A = QR, where Q \in \mathbb{C}^{m\times m} is unitary and R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \in \mathbb{C}^{m\times n}, with R_1 \in \mathbb{C}^{n\times n} upper triangular.

Partitioning Q = [Q_1~Q_2], where Q_1\in\mathbb{C}^{m\times n} has orthonormal columns, gives A = Q_1R_1, which is the reduced, economy size, or thin QR factorization.

Cost: 2(n^2m-n^3/3) flops for Householder QR factorization. The explicit formation of Q (which is not usually necessary) requires a further 4(m^2n-mn^2+n^3/3) flops.

Uses:

Theorem
If A is an m\times n matrix, m \geq n, with linearly independent column vectors, then A can be factored as A=QR, where Q is an m\times n matrix with orthonormal column vectors (i.e. Q^TQ=I), and R is an n\times n invertible upper triangular matrix.

QR can also be applied to under-determined systems (m < n), by first applying it on A^T, and performing a few extra steps.
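A sketch of the thin QR factorization with NumPy, including its use for least squares (the matrix and right-hand side are random examples):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 3))   # m > n, full column rank (almost surely)

Q, R = np.linalg.qr(A)                 # thin/economy QR: Q is 6x3, R is 3x3 upper triangular
print(np.allclose(Q.T @ Q, np.eye(3))) # True: orthonormal columns
print(np.allclose(Q @ R, A))           # True

# Least squares via QR: minimize ||Ax - b|| by solving R x = Q^T b.
b = np.random.default_rng(2).standard_normal(6)
x = np.linalg.solve(R, Q.T @ b)
```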

Eigenvalue Decomposition

Let A be an n \times n matrix. A number \lambda is an eigenvalue if there exists a nonzero solution vector x such that Ax = \lambda x. Rearranging, we get (A-\lambda I)x = 0.

To find a nonzero solution x we must solve \det(A-\lambda I) = 0. Expanding this determinant gives an nth-degree polynomial in \lambda, the characteristic polynomial of A; setting it to zero gives the characteristic equation. The eigenvalues are the roots of this equation. The eigenvector corresponding to an eigenvalue \lambda is obtained by applying Gaussian elimination to (A-\lambda I \mid 0).
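A small numerical illustration: the roots of the characteristic polynomial coincide with the eigenvalues returned directly (NumPy assumed; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)                 # coefficients of the characteristic polynomial of A
print(np.roots(coeffs))             # [3. 1.]: roots of the characteristic equation
print(np.linalg.eig(A)[0])          # same eigenvalues (possibly in a different order)
```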

Theorems:

Diagonalization

The n \times n matrix A is diagonalizable if an n\times n nonsingular (i.e. invertible) matrix P can be found so that P^{-1}AP = D is a diagonal matrix. We say that A is diagonalized, or that P diagonalizes A.

For a symmetric matrix A with real entries, we can always find an orthogonal matrix P that diagonalizes A: P^T A P = D.

Theorems:

If there are repeated eigenvalues then we need to find the eigenspace of the eigenvalues, E_\lambda. This is the subspace of all x such that (A-\lambda I )x = 0.

An n \times n matrix is diagonalizable if and only if, for each eigenvalue \lambda, \dim(E_\lambda) equals the multiplicity of \lambda. The (algebraic) multiplicity of an eigenvalue is its number of occurrences as a root of the characteristic polynomial. The dimension of the eigenspace (the geometric multiplicity) is the number of linearly independent solutions of (A-\lambda I)x = 0.

The trace of A is equal to the sum of its eigenvalues: \mathrm{tr}(A) = \mathrm{tr}(D) = \sum_i \lambda_i. The determinant of A is the product of its eigenvalues: \mathrm{det}(A) = \prod_i \lambda_i.

Methodology

To diagonalize a matrix A:

  1. Find the eigenvalues of A by solving \det(A - \lambda I) = 0.
  2. For each eigenvalue \lambda, find a basis of the eigenspace E_\lambda by solving (A - \lambda I)x = 0.
  3. If the resulting eigenvectors form a basis of \mathbb{R}^n (or \mathbb{C}^n), let P have these eigenvectors as columns and let D be the diagonal matrix of the corresponding eigenvalues (in the same order); then P^{-1}AP = D. Otherwise, A is not diagonalizable.
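A sketch verifying a diagonalization numerically with NumPy, along with the trace and determinant identities above (the matrix is an illustrative example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)                          # columns of P are eigenvectors
D = np.diag(eigvals)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))        # True: P^{-1} A P = D
print(np.isclose(np.trace(A), eigvals.sum()))          # True: tr(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), eigvals.prod()))    # True: det(A) = product of eigenvalues
```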

Eigenbasis
An eigenbasis is a basis where every vector is an eigenvector.

Singular Value Decomposition

Any matrix A \in \mathbb{C}^{m \times n} has a singular value decomposition (SVD) A = U\Sigma V^*, where U \in \mathbb{C}^{m\times m} and V \in \mathbb{C}^{n\times n} are unitary and \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_p) \in \mathbb{R}^{m\times n} with \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_p \geq 0 and p = \min(m, n).

Extends diagonalization to general m\times n matrices. The \sigma_i are the singular values of A, and they are the nonnegative square roots of the p largest eigenvalues of A^* A. The rank of A is equal to the number of nonzero singular values. If A is real, U and V can be taken to be real.

The essential SVD information is contained in the compact or economy size SVD A = U\Sigma V^*, where U\in\mathbb{C}^{m\times r}, \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_r), V\in\mathbb{C}^{n\times r}, and r = \mathrm{rank}(A).
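A sketch of the economy SVD with NumPy, also checking that the \sigma_i^2 are eigenvalues of A^*A (the matrix is a random example):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=False)   # economy SVD: U is 5x3, Vh is 3x3
print(s)                                           # singular values, in decreasing order
print(np.allclose(U @ np.diag(s) @ Vh, A))         # True: A = U Sigma V^*
print(np.linalg.matrix_rank(A), np.sum(s > 1e-12)) # rank = number of nonzero singular values
print(np.allclose(np.sort(s**2),
                  np.linalg.eigvalsh(A.T @ A)))    # True: sigma_i^2 are eigenvalues of A^T A
```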

Cost: 14mn^2+8n^3 flops for U(:,1\colon n), \Sigma, and V by the Golub–Reinsch algorithm, or 6mn^2+20n^3 flops with a preliminary QR factorization.

Uses:

Relation to the eigenvalue decomposition:

A^*A = (U\Sigma V^*)^* (U\Sigma V^*) = V \Sigma^* U^* U \Sigma V^* = V (\Sigma^* \Sigma) V^*

AA^* = (U\Sigma V^*)(U\Sigma V^*)^* = U \Sigma V^* V \Sigma^* U^* = U (\Sigma \Sigma^*) U^*

The right-hand sides are eigenvalue decompositions of the left-hand sides.

The SVD A=U\Sigma V^* allows us to describe the effect of a matrix A on a vector (via the matrix-vector product) as a three-step process:

  1. A first rotation in the input space (V)
  2. A simple positive scaling that takes a vector in the input space to the output space (\Sigma)
  3. And another rotation in the output space (U)

U, V are unitary (U^*U = V^*V = I), so they act as rotations (possibly combined with reflections) and preserve lengths.

EVD vs SVD Intuition

Consider the EVD A=PDP^{-1} and the SVD A = U\Sigma V^*. Some key differences:

  - The SVD exists for any matrix, even a rectangular one; the EVD exists only for certain (diagonalizable) square matrices.
  - The SVD uses two different orthonormal bases (the columns of U and V); the EVD uses a single basis of eigenvectors, which in general is not orthogonal.
  - The singular values are always real and nonnegative; the eigenvalues may be negative or complex.

Spectral Theorem

A generalization of diagonalization to different mathematical objects. If A is a normal matrix (and therefore square), the spectral theorem says that it can be written A = UDU^*, where U is unitary (its columns are orthonormal eigenvectors of A) and D is a diagonal matrix containing the eigenvalues of A.

The spectral decomposition is a special case of the Schur decomposition. When A is also positive semi-definite, this decomposition is also an SVD.

Moore–Penrose Inverse

A generalization of the matrix inverse. Also referred to as the pseudoinverse.

The pseudoinverse is defined and unique for all real or complex matrices.

For A \in \mathbb{K}^{m\times n}, a pseudoinverse of A is defined as a matrix A^+ \in \mathbb{K}^{n\times m} satisfying all of the following (the Moore–Penrose conditions):

  1. AA^+A = A
  2. A^+AA^+ = A^+
  3. (AA^+)^* = AA^+ (AA^+ is Hermitian)
  4. (A^+A)^* = A^+A (A^+A is Hermitian)

Properties

Calculated using the SVD: if A = U\Sigma V^*, then A^+ = V\Sigma^+ U^*, where \Sigma^+ is obtained by taking the reciprocal of each nonzero singular value and transposing.

It can be used to find the least squares solution, or the minimum norm solution, to a system of linear equations with no unique solution.
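A sketch of using the pseudoinverse for least squares with NumPy, compared against np.linalg.lstsq (the overdetermined system is an arbitrary example):

```python
import numpy as np

# Overdetermined system with no exact solution: find the least squares solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_pinv = np.linalg.pinv(A) @ b                      # A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # reference least squares solver
print(np.allclose(x_pinv, x_lstsq))                 # True
```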

Inner Product Spaces

Definition
An inner product on a vector space V \subseteq \mathbb{R}^n is a function that associates a real number \langle u,v \rangle with every pair of vectors in V such that \forall u,v,w \in V and \forall k \in \mathbb{R}:

  1. \langle u,v \rangle = \langle v,u \rangle (symmetry)
  2. \langle u+v,w \rangle = \langle u,w \rangle + \langle v,w \rangle (additivity)
  3. \langle ku,v \rangle = k\langle u,v \rangle (homogeneity)
  4. \langle v,v \rangle \geq 0, with \langle v,v \rangle = 0 iff v = 0 (positivity)

The dot product naturally satisfies these axioms. Euclidean space is \mathbb{R}^n with the dot product as the inner product.

The norm of v \in V is denoted by \|v\| = \sqrt{\langle v,v \rangle}. The distance between u and v would be \|u-v\|.

The geometry of the vector space is thus dependent on the inner product used.

Examples of inner products on:

Gram-Schmidt Process

Suppose S is a set of two or more vectors in a real inner product space. S is orthogonal if all pairs of distinct vectors are orthogonal. If each vector also has a norm of 1, then S is orthonormal.

If an orthogonal set S consists of nonzero vectors only, then S is linearly independent. A basis consisting of orthonormal vectors is an orthonormal basis.

Orthogonal Basis Coordinate Theorem
If S = \{v_1, v_2,\ldots,v_n\} is an orthogonal basis for an inner product space V, and if u \in V, then u = \frac{\langle u,v_1 \rangle}{\|v_1\|^2}v_1 + \frac{\langle u,v_2 \rangle}{\|v_2\|^2}v_2 + \ldots + \frac{\langle u,v_n \rangle}{\|v_n\|^2}v_n
Projection Theorem
If W is a finite-dimensional subspace of an inner product space V, then \forall u \in V, u = w_1 + w_2, where w_1 \in W (orthogonal projection of u onto W) and w_2 \in W^\perp.

A vector in V can be orthogonally projected onto W using the Orthogonal Basis Coordinate Theorem.

Every nonzero finite-dimensional inner product space has an orthonormal basis.

Gram-Schmidt Process:
The process used to construct an orthogonal basis from a given basis \{u_1, \ldots, u_n\}: set v_1 = u_1 and, for k = 2, \ldots, n, v_k = u_k - \sum_{i=1}^{k-1} \frac{\langle u_k, v_i \rangle}{\|v_i\|^2} v_i. The resulting \{v_1, \ldots, v_n\} is an orthogonal basis; normalizing each v_i gives an orthonormal basis.
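A minimal sketch of (classical) Gram-Schmidt with the standard dot product, assuming NumPy; gram_schmidt is a hypothetical helper name and the input basis is an arbitrary example:

```python
import numpy as np

def gram_schmidt(basis):
    """Orthogonalize a list of linearly independent vectors (standard dot product)."""
    orthogonal = []
    for u in basis:
        v = np.array(u, dtype=float)
        for w in orthogonal:
            v -= (np.dot(u, w) / np.dot(w, w)) * w   # subtract the projection of u onto w
        orthogonal.append(v)
    return orthogonal

vs = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0])])
print(np.dot(vs[0], vs[1]), np.dot(vs[0], vs[2]), np.dot(vs[1], vs[2]))  # all ~0
```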

Example: The Legendre Polynomials are the polynomials in the orthogonal basis constructed from the standard basis \{ 1,x,x^2,\ldots,x^n\} for the vector space P_n with inner product \langle p, q \rangle = \int_{-1}^{1}p(x)q(x) dx

Norms

Given a vector space V over a subfield F of the complex numbers \mathbb{C}, a norm on V is a real-valued function p with the following properties (for all u, v \in V and scalars k):

  1. p(v) \geq 0, and p(v) = 0 iff v = 0 (positive definiteness)
  2. p(kv) = |k|\, p(v) (absolute homogeneity)
  3. p(u+v) \leq p(u) + p(v) (triangle inequality)

For a matrix, common norms include the Frobenius norm \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} and the induced (operator) norms \|A\|_p = \max_{x \neq 0} \|Ax\|_p / \|x\|_p.
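A quick NumPy illustration of common matrix norms (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.norm(A, 2))       # spectral norm: the largest singular value of A
print(np.linalg.norm(A, 1))       # maximum absolute column sum
print(np.linalg.norm(A, np.inf))  # maximum absolute row sum
```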

References