Numerical Methods

Solving Nonlinear Equations

Here we look at the problem of root-finding.

Suppose we want to solve f(x) = 0; that is, we want to find a root x (formally, x = f^{-1}(0) when f is invertible).

If our function is not of this form, we can often transform it so that it is.

Typical numerical methods for solving such equations are iterative and approach the true solution in the limit. We can stop once the result is close enough.

Stopping criteria

Root-finding problems are well-conditioned if the graph of f crosses the x-axis with a steep slope, i.e. |f'| is large near the root.

If the problem is ill-conditioned (the graph crosses the axis with a shallow slope), then small errors in evaluating f can lead to large errors in the computed root.

Rate of convergence

A method converges with order p if e_{i+1} \approx c\,e_i^p for some constant c > 0, where e_i = |x_i - x| is the error in the ith iteration, x is the true root and x_i is the approximation.

Bisection

A bracketing method. Involves dividing the interval into smaller and smaller intervals that contain the root.

  1. Find interval [a,b] where f(x) changes sign
  2. Compute f(c) where c = (a+b)/2 is midpoint of interval
  3. Compare the signs of f(a), f(b), f(c) and repeat with the half-interval on which f changes sign

Converges linearly (order 1): the length of the bracketing interval is halved at each iteration.
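As a rough sketch, the loop might look as follows in Python (the function name bisect, the tolerance and the test problem are illustrative choices, not from these notes):

def bisect(f, a, b, tol=1e-10, max_iter=100):
    # Assumes f(a) and f(b) have opposite signs, so [a, b] brackets a root
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2            # midpoint of the current bracket
        fc = f(c)
        if fc == 0 or (b - a) / 2 < tol:
            return c
        if fa * fc < 0:            # sign change in [a, c]
            b = c
        else:                      # sign change in [c, b]
            a, fa = c, fc
    return (a + b) / 2

root = bisect(lambda x: x**2 - 2, 1.0, 2.0)   # approximates sqrt(2)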

Positives:

Negatives:

False Position (regula falsi)

Like bisection, a bracketing method, but instead of the midpoint it uses the point where the chord (secant line) through (a,f(a)) and (b,f(b)) crosses the x-axis: c = \frac{af(b) - bf(a)}{f(b)-f(a)}

Newton-Raphson Method

A method that uses the derivative f'(x) of the function.

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

Converges quadratically (order 2) near a simple root.
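A minimal Python sketch of the iteration (the helper name newton, the stopping rule and the example function are illustrative assumptions):

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n)
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.5)   # ~sqrt(2)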

Positives:

Negatives:

The multidimensional Newton step for x, f(x) \in \mathbb{R}^m is x_{n+1} = x_n - J(x_n)^{-1} f(x_n), where J is the m\times m Jacobian matrix of f.

Secant Method

Replaces the derivative in the Newton iteration with the approximation f'(x_n) \approx \frac{f(x_n)-f(x_{n-1})}{x_n - x_{n-1}}

Has a convergence rate of order approximately 1.618 (the golden ratio).
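A short Python sketch, assuming two starting guesses x0 and x1 (names and stopping rule are illustrative):

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    # Newton-like step with f' replaced by a finite-difference slope
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1

root = secant(lambda x: x**2 - 2, 1.0, 2.0)   # ~sqrt(2)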

Positives:

Negatives:

Brent’s Method

Also known as the Brent-Dekker method. A hybrid method that combines the bisection method, the secant method and inverse quadratic interpolation. It tries to use the potentially fast-converging secant method or inverse quadratic interpolation if possible, but it falls back to the more robust bisection method if necessary.

Positives:

Gauss–Newton

Can only be used to minimize a sum of squared residuals (nonlinear least squares). Second derivatives are not required.

Fixed-Point Iteration

Used when we wish to solve an equation of the form x = f(x). We then iterate x_{n+1} = f(x_n).

[Could also use x_{n+1} = 0.5(x_n + f(x_n)). ]
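A minimal sketch in Python (the name fixed_point and the example x = cos(x) are illustrative):

import math

def fixed_point(g, x0, tol=1e-12, max_iter=200):
    # Iterate x_{n+1} = g(x_n) until successive iterates agree to within tol
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = fixed_point(math.cos, 1.0)   # x = cos(x) has a fixed point near 0.739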

Positives:

Negatives:

Systems of Non-linear Equations

Let \underline{x} now represent a vector of length m: (x_1,\ldots,x_m).

Suppose we have m equations f_1(\underline{x}) = 0, f_2(\underline{x}) = 0,\ldots,f_m(\underline{x}) = 0. We choose to represent this system of functions by F.

So, our notation now reads F(\underline{x}) = \underline{0}.

Newton’s Method

\underline{x}_{n+1} = \underline{x}_n - [F'(\underline{x}_n)]^{-1} F(\underline{x}_n), where F'(\underline{x}_n) = J(\underline{x}_n) is the Jacobian of F. The term J^{-1} F(\underline{x}_n) should be computed by left-division (i.e. by solving the linear system J\,\Delta\underline{x} = F(\underline{x}_n)) rather than by forming the inverse explicitly.
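A sketch of the multivariate iteration in Python/NumPy, solving J(x_n)\,\Delta x = -F(x_n) at each step instead of forming the inverse (the example system, names and tolerances are illustrative):

import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    # Multivariate Newton: solve J(x_n) dx = -F(x_n) (left-division) each step
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))
        x += dx
        if np.linalg.norm(dx) < tol:
            return x
    return x

# Example: intersect the circle x^2 + y^2 = 1 with the line y = x
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[1] - v[0]])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [-1.0, 1.0]])
sol = newton_system(F, J, [1.0, 0.5])   # ~ (0.7071, 0.7071)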

Broyden’s Method

A quasi-Newton method for finding roots of a system of equations.

The multivariate Newton’s method uses the Jacobian matrix at every iteration (which is expensive to calculate). Broyden’s method computes the whole Jacobian only at the first iteration and does rank-one updates at other iterations.

Summary

Solving Systems of Linear Equations

Gaussian Elimination

Suppose we wish to solve Ax = b.

Gaussian elimination (GE) can be used to change an arbitrary matrix into an upper triangular form (from where you can use back substitution).

In GE, we should avoid zero or small pivots, since dividing by them causes round-off error to accumulate; without precautions GE can thus be an unstable algorithm. Zero or small pivots are avoided by row interchanges (“pivoting”). In partial pivoting, rows are swapped so that the pivot is the entry of largest absolute value in the current column. If columns are interchanged as well, then this is called complete pivoting.

For an n \times n matrix, GE takes O(n^3) operations.

Tridiagonal Matrix (Thomas) Algorithm

Thomas’ algorithm is not stable in general, but is so in several special cases, such as when the matrix is diagonally dominant (either by rows or columns) or symmetric positive definite.

If stability is required in the general case, Gaussian elimination with partial pivoting (GEPP) is recommended instead.

LU Decomposition

GE (without pivoting) is equivalent to factoring A into a lower-triangular (L) and an upper-triangular (U) matrix, A=LU.

We can solve Ax = LUx = b in two triangular stages: Ly = b (forward substitution) followed by Ux = y (back substitution).

With pivoting, the factorization becomes PA = LU, where P is the permutation matrix that interchanges the rows. So, we have Ax = b \implies PAx = Pb.

Thus, we have y = L^{-1}Pb, and x = U^{-1}y

We first solve for y in the above, then solve for x. Because L and U are lower and upper triangular, we can use the (computationally fast) forward and back substitution, respectively. This means if we need to repeatedly solve a system involving A, then we can compute the LU factorization once and reuse each time. This makes the code faster.

The asymptotic cost of computing A = LU is about \frac{2}{3}n^3 FLOPs. A subsequent solve with L and U (one forward and one back substitution) requires only about 2n^2 FLOPs, and is thus much faster.
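A usage sketch with SciPy's lu_factor/lu_solve, factoring once and reusing the factors for several right-hand sides (the matrix and right-hand sides are arbitrary examples):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0], [6.0, 3.0]])
lu, piv = lu_factor(A)              # computes PA = LU once (O(n^3))

for b in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    x = lu_solve((lu, piv), b)      # each subsequent solve is only O(n^2)
    print(x)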

The explicit computation of the inverse A^{-1} is not recommended:

An n\times n tridiagonal system can be solved quickly using GE or the Thomas (TDMA) algorithm. If such a dedicated solver is not available, a sparse left-division (sparse direct) solver can be used instead.

Jacobi Iterative Method

Iterative techniques are useful for solving large (often sparse) systems, for which direct methods can be computationally expensive.

To solve Ax = b, we can split A into A = D + L + U, i.e. a diagonal matrix plus strictly lower and strictly upper triangular matrices. The system can be written as Dx = b - (L+U)x, and if D^{-1} exists (i.e. there are no 0s on the diagonal), then x = D^{-1}[b - (L+U)x].

Now, given an initial guess x_0, we can find approximations x_k = D^{-1}[b - (L+U)x_{k-1}], for k = 1,2,\ldots. This iteration is guaranteed to converge if A is strictly diagonally dominant.
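A minimal Python/NumPy sketch of the Jacobi iteration (names, tolerance and the small diagonally dominant example are illustrative):

import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    # Jacobi iteration x_k = D^{-1} [b - (L + U) x_{k-1}]
    A, b = np.asarray(A, float), np.asarray(b, float)
    D = np.diag(A)                     # diagonal entries of A
    R = A - np.diagflat(D)             # L + U: everything off the diagonal
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, ord=np.inf) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant example, so convergence is guaranteed
x = jacobi([[4.0, 1.0], [2.0, 5.0]], [1.0, 2.0])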

Gauss–Seidel Iterative Method

We can also write the system as (D+L)x = b - Ux, and if H = (D+L)^{-1} exists, then x = H(b-Ux). Now, given an initial guess x_0, we can find approximations x_k = H(b-Ux_{k-1}) for k = 1,2,\ldots.

Summary

Condition Number

Defined to be the maximum ratio of the relative error in x to the relative error in b, for the linear equation Ax=b. Calculated as \|A\|\|A^{-1}\|, where \|A\| is the norm of A.

If the condition number is much greater than 1, then the system is ill-conditioned. This means that small relative changes in data may produce large relative changes in the solution.

As a rule of thumb, if the condition number is of order 10^c, then you should expect to lose about c digits of accuracy. For example, in double-precision floating-point arithmetic (machine epsilon of approximately 1e-16, i.e. about 16 significant digits), if the condition number of the system is 1e4, then you can expect only about 12 accurate digits (a relative error of roughly 1e-12) in the solution.

The definition of the condition number depends on the choice of norm (i.e. how you choose to measure the error). If \|\cdot \| is the L_2-norm \|\cdot \|_{2}, then \kappa(A) = \sigma_{\mathrm{max}}(A) / \sigma_{\mathrm{min}}(A), where \sigma_i are the singular values of A and we take the maximal and minimal ones.
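In NumPy this can be checked directly (the nearly singular 2x2 matrix is an arbitrary example):

import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0001]])   # nearly singular, so badly conditioned
kappa = np.linalg.cond(A)                   # 2-norm condition number by default
s = np.linalg.svd(A, compute_uv=False)      # singular values, largest first
assert np.isclose(kappa, s[0] / s[-1])      # same as the ratio of extreme singular values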

Alternating Direction Implicit Method

https://en.wikipedia.org/wiki/Alternating-direction_implicit_method

Curve Fitting

Polynomials

A degree-n polynomial, written as a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0, has n+1 coefficients.

Given m = n+1 points (with distinct x-values), we can fit a unique polynomial of degree at most n that goes through all the points.

If we have more than n+1 points, the system is overdetermined, and we have to find the line/curve of best fit.

If the polynomial degree n=1, the problem of finding this best fit line is called linear regression.

We first parametrize the line as y(x)=a_1x + a_0. Given the co-ordinates in the plane (x_i, y_i) of each of the m points, we can write the error (or residual) of each point as the difference between the line and each point: r_i = y(x_i) - y_i = \text{expected} - \text{actual}. This can be visualized as the vertical distance between the line and the point.

To find the best fit line, we simply find the sum of squared errors: \sum_{i}r_i^2 = \sum_{i}\left(y(x_i) - y_i\right)^2 = \sum_{i}(a_1x_i + a_0 -y_i)^2, and minimize it w.r.t. a_0 and a_1. That is, we try to find the line that minimizes the errors. To minimize, we differentiate wrt a_0 and a_1, getting a system of equations called the normal equations.

The above can be extended to fit polynomials of any degree.

coefs = polyfit(x,y,n)    # n-degree polynomial
y_fit = polyval(coefs, x) # to evaluate the polynomial

Linearization

Functions other than polynomials, such as exponentials and hyperbolas, can be linearized. This means that these functions can be transformed into a linear equation (in transformed variables), after which linear regression applies.

y = be^{mx} \implies \ln y = \ln b + mx
y = \frac{1}{mx + b} \implies \frac{1}{y} = mx + b
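For example, the exponential model can be fitted by running a straight-line fit on (x, ln y); the synthetic data and parameter values below are illustrative:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.5 * np.exp(0.8 * x)               # synthetic data with b = 2.5, m = 0.8

m, ln_b = np.polyfit(x, np.log(y), 1)   # straight-line fit to (x, ln y)
b = np.exp(ln_b)                        # recovers b ~ 2.5 and m ~ 0.8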

Regression

// TODO: consider making this into a numerical LA section.

Recall that we want to find \hat{x} that minimizes: \| Ax - b \|_2

Another way to think about this is that we are looking for the point in the column space of A that is closest to the vector b: the projection of b onto that column space. Since b - A\hat{x} must be perpendicular to the column space of A, we see that A^T(b - A\hat{x}) = 0.

This leads us to the normal equations: A^T A\hat{x} = A^T b, so \hat{x} = (A^T A)^{-1} A^T b.

If A has full column rank, then A^TA is a square, symmetric (Hermitian) positive definite matrix, and the pseudo-inverse of A is (A^TA)^{-1}A^T. The standard way of solving such a system is Cholesky factorization, which finds upper-triangular R s.t. A^TA = R^TR.

A^T A x = A^T b \implies R^T R x = A^T b: solve R^T w = A^T b for w, then R x = w for x.

QR Factorization

Ax = b with A = QR gives QRx = b, and hence Rx = Q^T b, which is solved by back substitution.

SVD: gives the pseudo-inverse

Ax = b with the SVD A = U\Sigma V^T gives \Sigma V^T x = U^T b: solve \Sigma w = U^T b for w, then x = Vw.

Normal equations/Cholesky is the fastest approach when it works. Cholesky can only be used on symmetric, positive definite matrices. Moreover, the normal-equations/Cholesky route is unstable for ill-conditioned or rank-deficient matrices, since forming A^TA squares the condition number.

Linear regression via QR has been recommended by numerical analysts as the standard method for years.
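A small NumPy comparison of the three routes (the random tall design matrix is an arbitrary example; np.linalg.lstsq uses an SVD-based LAPACK driver):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))     # tall, full-rank design matrix
b = rng.standard_normal(100)

# Normal equations route (fast, but squares the condition number)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: A = QR, then solve R x = Q^T b (more stable)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Library least-squares routine (SVD-based)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)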

Interpolation

Polynomial Interpolation

The conditions that the polynomial has to satisfy can be expressed in different bases:

Lagrange Polynomials

These are used for polynomial interpolation. Given m=n+1 distinct points, the n-degree Lagrange polynomial is defined as P_n(x) = \sum_{i}L_i(x)y_i, where L_i(x) = \prod_{j\neq i} \frac{x-x_j}{x_i-x_j}.
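A direct (if inefficient) Python sketch that evaluates P_n(x) from this definition (the function name and sample points are illustrative):

def lagrange_eval(x, xs, ys):
    # Evaluate the Lagrange interpolating polynomial P_n at x
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        L = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                L *= (x - xj) / (xi - xj)   # basis polynomial L_i(x)
        total += L * yi
    return total

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
print(lagrange_eval(1.5, xs, ys))   # value of the quadratic through the 3 points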

Alternatives

Newton Polynomials

P_n(x) = \sum_{i}a_i n_i(x), with the Newton basis polynomials defined as n_i(x) = \prod_{j=0}^{i-1} (x-x_j).

The coefficients are defined as a_{j} =[y_{0},\ldots ,y_{j}], where [\cdot] is the notation for divided differences.

Hermite Interpolation

Suppose we know the values of a function f and of its derivatives up to order m at n+1 distinct points x_0,\ldots,x_n.

That is, we have available a set of (n+1)(m+1) values y_j^{(i)}=f^{(i)}(x_j), \quad j=0,\ldots,n, \; i=0,\ldots,m,

then the function can be interpolated by a polynomial of degree (n+1)(m+1)-1: H_{(n+1)(m+1)-1}(x)=\sum_{k=0}^{(n+1)(m+1)-1} c_kx^k

The coefficients will be the solution of a system of equations that can be set up from the above. This is however computationally expensive. In practice, more efficient algorithms are used.

Piecewise Interpolation

Splines are piecewise polynomials.

Linear Splines

Given points/coordinates (x_i, y_i), \quad i=1,2,\ldots,n, we can draw straight lines f_i(x)=m_i x + c_i, \quad x_i\leq x\leq x_{i+1}, \quad i=1,2,\ldots,n-1, through each adjacent pair of points. This is called a linear spline, f(x). We can find the equation of each line using f_{i}(x_i)=y_i \quad \text{and} \quad f_{i}(x_{i+1}) = y_{i+1}, \quad i=1,\ldots,n-1.


This has O(h^2) error, where h is the distance between interpolation points.

Cubic Splines

Given points/coordinates (x_i, y_i), \quad i=1,2,\ldots,n we can draw cubic functions f_i(x)=a_ix^3 + b_ix^2 + c_ix + d_i, \quad x_i\leq x\leq x_{i+1}, \quad i=1,2,\ldots,n-1 through each pair of adjacent points. These define the cubic spline f(x).


We need to solve for (a_i, b_i, c_i, d_i) to specify the whole of f(x). Because there are n-1 cubics, each with 4 parameters, there are 4(n-1) unknowns in total. We thus need the same number of equations:

Requiring each cubic to pass through the two points at the ends of its subinterval gives 2(n-1) equations; continuity of f' at the n-2 interior points gives n-2 more, and continuity of f'' gives another n-2. Together these give 2(n-1) + (n-2) + (n-2) = 4n - 6 equations.

Since we need another 2 equations, we can use endpoint/boundary conditions; for example, the natural spline sets f''(x_1)=f''(x_n)=0.

The system of equations can now be solved. Can use backslash in MATLAB.

The cubic spline has the property that f(x), f'(x) and f''(x) are all continuous on the interval (x_1,x_n).
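A usage sketch with SciPy's CubicSpline; bc_type='natural' supplies the two extra boundary conditions by setting f'' to zero at both ends (the data values here are arbitrary):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.8, 0.9, 0.1])

cs = CubicSpline(x, y, bc_type='natural')   # natural spline: f''(x_1) = f''(x_n) = 0
xx = np.linspace(0, 3, 50)
yy = cs(xx)        # spline values
dy = cs(xx, 1)     # first derivative, continuous across the interior knots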

Be aware:

Akima Splines

The spline is the C^1 piecewise cubic function whose value on [x_i, x_{i+1}] is given by the unique polynomial that satisfies for k = i, i+1:

f(x_k) = y_k, \quad f'(x_k) = s_k.

For each point i we define a slope s_i as some weighted average involving gradients m_{i-2},...,m_{i+2}.

Cubic Hermite Interpolation

Piecewise cubic polynomials that satisfy Hermite interpolation conditions (PCHIP). The function values and derivatives are specified at each node point. These interpolants are in general not twice continuously differentiable.

A monotone variant is made possible by modifying the tangents m_{i} to ensure the monotonicity of the resulting spline.

Bessel Splines

A Hermite spline with the derivative s_i at each node point x_i chosen as the derivative of the parabola that passes through (x_{i-1}, y_{i-1}), (x_i, y_i), (x_{i+1}, y_{i+1}): s_i = \frac{(x_i - x_{i-1})m_i + (x_{i+1} - x_i)m_{i-1}}{x_{i+1}-x_{i-1}}, with m_i = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}. This is the same as linearly interpolating the slopes m between the midpoints of [x_{i-1}, x_i] and [x_i, x_{i+1}] and evaluating at x_i.

This interpolation trades off some smoothness in favour of a “dampening” effect on the spline fit.

Generalized Splines

Interpolation Tools

Hyman Filter

Hyman’s monotonicity constraint filter ensures that in regions of local monotonicity of the input (three successive increasing or decreasing values) the interpolating cubic remains monotonic. If the interpolating cubic is already monotonic, the Hyman filter leaves it unchanged.

In the case of C^2 interpolants the Hyman filter ensures local monotonicity at the expense of the second derivative of the interpolant which will no longer be continuous in the points where the filter has been applied. While some non-linear schemes (Modified Parabolic, Fritsch-Butland, Kruger) are guaranteed to be locally monotone in their original approximation, all other schemes must be filtered according to the Hyman criteria at the expense of their linearity.

Numerical Differentiation

For some complicated functions, it’s not so easy to write down the derivative. We can numerically evaluate the derivative however.

Finite Differences

The derivative of a function f(x) is defined as f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h)-f(x)}{h}

Three simple finite difference formulas are:

  1. Forward difference: f'(x) \approx \frac{f(x+h)-f(x)}{h}, with error O(h)
  2. Backward difference: f'(x) \approx \frac{f(x)-f(x-h)}{h}, with error O(h)
  3. Central difference: f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}, with error O(h^2)

For the second derivative, one could use: f''(x) \approx \frac{f'(x+h/2)-f'(x-h/2)}{h} = \frac{f(x+h)+f(x-h)-2f(x)}{h^2}

There are more systematic methods for deriving finite difference formulas with Taylor series.
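A small Python sketch of these formulas (the step sizes are illustrative; making h too small eventually increases round-off error):

import numpy as np

def forward_diff(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h                   # O(h) error

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)         # O(h^2) error

def second_diff(f, x, h=1e-4):
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

x = 1.0
print(central_diff(np.sin, x), np.cos(x))          # agree to roughly 1e-10
print(second_diff(np.sin, x), -np.sin(x))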

Lagrange Interpolation

Partial Differentiation

Numerical Integration

Numerical integration is useful for integrals that don’t have analytic formulas or are too cumbersome to do by hand.

Rectangular

This is simply a Riemann sum. One can use the left, mid or right endpoints of the intervals to evaluate the function.

Trapezium Rule

The area of a trapezium specified by the points (a,f(a)) and (b,f(b)) is T(a,b) = \frac{1}{2}(b-a)(f(a) + f(b)).

\int_a^b f(x)\,dx = \sum_{i=1}^{n} \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=1}^{n} T(x_i,x_{i+1}). If the x-values are evenly spaced, with h=(b-a)/n and x_j = a+(j-1)h (so x_1 = a and x_{n+1} = b), we get \int_a^b f(x)\,dx \approx h\left( \tfrac{f(a)}{2} + \sum_{j=2}^{n} f(x_j) + \tfrac{f(b)}{2} \right).

The trapezium rule converges with order 2, O(h^2).
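A short Python sketch of the composite rule; halving h should reduce the error by roughly a factor of 4, confirming the order-2 convergence (the test integral of sin on [0, pi] is illustrative):

import numpy as np

def trapezium(f, a, b, n):
    # Composite trapezium rule with n equal subintervals
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

exact = 2.0                                   # integral of sin on [0, pi]
for n in (10, 20, 40):
    print(n, abs(trapezium(np.sin, 0, np.pi, n) - exact))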

Simpson’s Rule

Takes the three points (a,f(a)), (c,f(c)), (b,f(b)), where c = (a+b)/2 is the midpoint, and fits a parabola P(x) through them. The integral of the parabola is then used as the approximation: \int_a^b f(x)\,dx \approx \int_a^b P(x)\,dx

With a step size h=(b-a)/2, we get \int_{a}^{b} P(x) \, dx =\tfrac{h}{3}\left[f(a) + 4f\left(\tfrac{a+b}{2}\right)+f(b)\right]. This is Simpson’s \tfrac{1}{3} rule.

A composite Simpson rule is obtained by applying the basic rule across successive pairs of subintervals, which requires an even number N of subintervals. With nodes x_1 = a, x_2, \ldots, x_{N+1} = b, \int_{a}^{b} f(x) \, dx \approx \tfrac{h}{3}\left[f(a) + 4 \sum_{j=2,4,\ldots}^{N}f(x_j) + 2 \sum_{j=3,5,\ldots}^{N-1}f(x_j) +f(b)\right].

Simpson’s rule converges with order 4, O(h^4).

Adaptive Simpson’s Method

Uses an estimate of the error from calculating the definite integral using Simpson’s rule. If the error exceeds a specified tolerance, the interval of integration is subdivided in two and adaptive Simpson’s method is recursively called on each subinterval.

Since the composite Simpson rule subdivides uniformly, even in places where the function is already well approximated by a parabola, the adaptive method can be much more efficient: it concentrates function evaluations where they are needed and uses fewer of them overall.

Gauss Integration

The n-point Gauss rule is defined as \int_{-1}^{1} f(x) dx \approx \sum_{i=1}^{n} c_i f(x_i), where c_i are the weights and the x_i are the nodes. The 2n weights and nodes need to be estimated. To estimate these, we require that the rule integrates f(x)=x^{j}, exactly on [-1,1] for j = 0,...,2n-1. The x_i are zeros of the Legendre polynomials, and thus this integration technique is also called Gauss-Legendre quadrature.

For example, the 2-point Gauss rule is defined as \int_{-1}^{1} f(x) dx \approx f\left(\tfrac{-1}{\sqrt{3}}\right) + f\left(\tfrac{1}{\sqrt{3}}\right). The 3-point Gauss rule is defined as \int_{-1}^{1} f(x) dx \approx \tfrac{5}{9} f\left(-\sqrt{\tfrac{3}{5}}\right) + \tfrac{8}{9}f\left(0\right) + \tfrac{5}{9}f\left(\sqrt{\tfrac{3}{5}}\right).

An integral over [a,b] can be transformed onto [-1,1] by substituting x=\frac{a+b+t(b-a)}{2}, so that dx = \frac{b-a}{2}\, dt and \int_{a}^{b} f(x)\, dx = \frac{b-a}{2}\int_{-1}^{1} f\!\left(\frac{a+b+t(b-a)}{2}\right) dt.
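A sketch using NumPy's leggauss to get the nodes and weights, combined with the interval transformation above (the test integral is illustrative):

import numpy as np

# Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]
n = 3
t, c = np.polynomial.legendre.leggauss(n)

# Map to a general interval [a, b] via x = (a + b + t(b - a)) / 2
a, b = 0.0, np.pi
x = (a + b + t * (b - a)) / 2
integral = (b - a) / 2 * np.sum(c * np.sin(x))   # ~2.001, close to the exact value 2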

Gauss–Kronrod Quadrature

An adaptive variant of Gaussian quadrature, in which the evaluation points are chosen so that a more accurate approximation can be computed by re-using the evaluation points from the previous iteration. It is an example of a nested quadrature rule: for the same set of function evaluation points, it has two quadrature rules, one higher order and one lower order (the latter called an embedded rule). The difference between these two approximations is used to estimate the error of the integration.

If the interval [a, b] is subdivided, the usual Gauss evaluation points of the new subintervals never coincide with the previous evaluation points (except at the midpoint for odd numbers of evaluation points), and thus the integrand must be evaluated at every point. Gauss–Kronrod formulas are extensions of the Gauss quadrature formulas generated by adding n+1 points to an n-point rule in such a way that the resulting rule is of order 3n+1; the corresponding Gauss rule is of order 2n-1.

Clenshaw–Curtis or Fejer Quadrature

Employs a change of variable x=\cos \theta and uses a discrete cosine transform (DCT) approximation for the cosine series:
\int_{-1}^1 f(x)\,dx=\int_{0}^{\pi }f(\cos\theta )\sin(\theta )\,d\theta, \quad f(\cos \theta )=\frac{a_0}{2} + \sum_{k=1}^\infty a_k \cos(k\theta)

Equivalently, it is based on an expansion of the integrand in terms of Chebyshev polynomials. The function f(x) to be integrated is evaluated at the N extrema or roots of a Chebyshev polynomial. These values are used to construct a polynomial approximation for the function. This polynomial is then integrated exactly. In practice, the integration weights for the value of the function at each node are precomputed, and this computation can be performed in O(N\log N) time by means of FFT-related algorithms for the DCT.

Solving Ordinary Differential Equations

Initial Value Problems

Problem:

Find the function y(x) such that y'(x)=f(x,y), subject to the initial condition y_0 = y(x_0), for a given function f.

Euler’s Method

y(x_{i+1}) \approx y(x_i) + hf(x_i,y(x_i)), for i=0,1,2,\ldots. This is a first order method, or Error = O(h).
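A minimal Python sketch of Euler's method (the test problem y' = y, y(0) = 1 is illustrative):

import numpy as np

def euler(f, x0, y0, h, n_steps):
    # Explicit Euler: y_{i+1} = y_i + h f(x_i, y_i)
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(xs[-1] + h)
    return np.array(xs), np.array(ys)

xs, ys = euler(lambda x, y: y, 0.0, 1.0, h=0.01, n_steps=100)
print(ys[-1], np.exp(1.0))   # Euler estimate vs exact solution e^x at x = 1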

Implicit Trapezium Rule

y(x_{i+1}) \approx y(x_i) + \tfrac{h}{2}\left( f(x_i,y(x_i)) + f(x_{i+1},y(x_{i+1})) \right). This method requires y(x_{i+1}) on the right-hand side before we have computed it, which makes the rule implicit. We can solve for y(x_{i+1}) analytically or (more usually) numerically, or we can use an Euler-method estimate of y(x_{i+1}) on the right.

Modified Euler’s Method

We estimate y(x_{i+1}) from the implicit trapezium rule using one step of Euler’s method. y(x_{i+1})^E = y(x_i) + hf(x_i,y(x_i)), y(x_{i+1}) \approx y(x_i) + \tfrac{h}{2}\left( f(x_i,y(x_i)) + f(x_{i+1},y(x_{i+1})^E) \right). This is a second order method, or Error = O(h^2).

Midpoint Method

y(x_{i+0.5})^E = y(x_i) + \tfrac{h}{2}f(x_i,y(x_i)), \quad y(x_{i+1}) \approx y(x_i) + h\, f\!\left(x_i+\tfrac{h}{2},\, y(x_{i+0.5})^E\right).

The above are low-order members of the family of Runge-Kutta methods.

ode45 in MATLAB implements an adaptive-step explicit Runge-Kutta (4,5) pair (Dormand–Prince).

Boundary Value Problems

Problem:

Find the function y(x) such that y''(x) + f(x)y'(x) + g(x)y = h(x), for a < x < b, subject to boundary conditions y(a)=y_a, y(b)=y_b.

Subdivide the interval [a,b] into smaller intervals of width h and use central difference approximations for the derivatives y', y''. Once the resulting set of equations is simplified, it yields a tridiagonal system.

Solving Partial Differential Equations

Problem:

Find the function u(x,y) such that, for example: u_{xx}+u_{yy} = 0, subject to some boundary conditions.

A two-variable PDE can be solved by using a derivative stencil (like the 5-point stencil) on a finite-difference grid. This produces a large sparse matrix which can be solved using different techniques:

TODO

https://hplgit.github.io/INF5620/doc/pub/H14/decay/pdf/main_decay-4print-A4.pdf

References