The Kalman Filter is an optimal estimation algorithm that infers
parameters of interest from indirect, inaccurate and uncertain
observations. It is recursive, so that new measurements can be processed
as they arrive.
It projects each new measurement onto the state estimate: finding the best estimate from noisy data amounts to filtering out the noise. The state is hidden and follows a first-order Markov chain, making the model the continuous-state analogue of a hidden Markov model (HMM).
Optimal:
- If the noise is Gaussian, the Kalman filter minimizes the mean square error of the estimated parameters.
- If the noise is non-Gaussian, it is the best linear estimator given the mean and covariance of the noise.
Why popular?
- Good results in practice due to optimality and structure
- Convenient form for online, real-time processing
- Easy to formulate and implement with a basic understanding
- Measurement equations need not be inverted
Common applications:
- Navigation systems (GPS + inertial measurement units)
- Target tracking (radar, computer vision)
- Signal processing and sensor fusion
- Robotics and autonomous vehicles
- Financial time series estimation
When to use it:
- Variables of interest can only be measured indirectly
- Multiple sensors provide measurements subject to noise
- Real-time online processing is needed
Pros and Cons:
- allows noise in estimates, transitions, and observations
- doesn't require historical data
- efficient, suitable for real-time applications
- optimal only for linear-Gaussian state space models; in practice the state transitions may be non-linear or the noise non-Gaussian, and variants (EKF, UKF, particle filter) exist to deal with this
Notation

| Symbol | Meaning | Dimension |
|---|---|---|
| n_x | State dimension | scalar |
| n_y | Observation dimension | scalar |
| x_k | State vector at time step k | n_x \times 1 |
| y_k | Observation/measurement at time step k | n_y \times 1 |
| u_k | Control input at time step k | n_u \times 1 |
| F | State transition matrix | n_x \times n_x |
| G | Control input matrix | n_x \times n_u |
| H | Observation matrix (maps state to measurement space) | n_y \times n_x |
| w_k | Process noise: w_k \sim (0, Q) | n_x \times 1 |
| v_k | Observation noise: v_k \sim (0, R) | n_y \times 1 |
| Q | Process noise covariance matrix | n_x \times n_x |
| R | Observation noise covariance matrix | n_y \times n_y |
| P_k | Estimate error covariance matrix | n_x \times n_x |
| K_k | Kalman gain matrix | n_x \times n_y |
| \hat{x}_k | Estimated state at time k | n_x \times 1 |
| \hat{x}_{k\|k-1} | Predicted state (before observing y_k) | n_x \times 1 |
| \hat{x}_{k\|k} | Filtered state (after observing y_k) | n_x \times 1 |
State Space Model
A Kalman filter operates on a linear state space model consisting of a state equation and an observation equation. As a running example, we want to estimate the current position x_k of a car using:
State equation (speedometer u_k):
x_k = x_{k-1} + u_k \Delta t + w_k where w_k \sim (0, q)
Here F=1, G=\Delta t, Q=q
Observation equation (GPS y_k):
y_k = x_k + v_k where v_k \sim (0, r)
Here H=1, R=r
To estimate the true position, we fuse both distributions, weighting each by its inverse variance (lower uncertainty → higher weight).
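A minimal sketch of this one-dimensional example; the values chosen for \Delta t, q, and r are illustrative assumptions, not from the text:

```python
import numpy as np

# Scalar car-position model from the text:
#   state:       x_k = x_{k-1} + u_k*dt + w_k,  w_k ~ N(0, q)   (F=1, G=dt, Q=q)
#   observation: y_k = x_k + v_k,               v_k ~ N(0, r)   (H=1, R=r)
# dt, q, r below are assumed values for illustration only.

def kf_1d_step(x_est, p_est, u, y, dt=1.0, q=0.5, r=2.0):
    """One predict + correct cycle of the scalar Kalman filter."""
    # Predict with the speedometer (state equation)
    x_pred = x_est + u * dt
    p_pred = p_est + q
    # Correct with the GPS reading (observation equation),
    # weighting by inverse variance: low uncertainty -> high weight
    k = p_pred / (p_pred + r)          # Kalman gain, between 0 and 1
    x_new = x_pred + k * (y - x_pred)  # blend prediction and measurement
    p_new = (1 - k) * p_pred           # uncertainty shrinks after correction
    return x_new, p_new

rng = np.random.default_rng(0)
x_true, x_est, p_est = 0.0, 0.0, 1.0
for _ in range(50):
    u = 1.0                                        # constant speed
    x_true += u * 1.0 + rng.normal(0, np.sqrt(0.5))
    y = x_true + rng.normal(0, np.sqrt(2.0))       # noisy GPS reading
    x_est, p_est = kf_1d_step(x_est, p_est, u, y)
print(f"true={x_true:.2f} est={x_est:.2f} var={p_est:.2f}")
```

Note that the error variance converges to a fixed point regardless of the data: uncertainty depends only on q, r, and the model, not on the measurements themselves.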
Prediction, Filtering, and Smoothing
Assuming we have observations up to time k:
- Prediction: forecast future states \hat{x}_{k+1|k}, \hat{x}_{k+2|k}, \ldots
- Filtering: correct the current state \hat{x}_{k|k}
- Smoothing: correct previous states \hat{x}_{k-1|k}, \hat{x}_{k-2|k}, \ldots
The Kalman filter (forward algorithm) performs prediction + filtering
at each time step. The Kalman smoother (forward-backward algorithm) runs
the Kalman filter forward, then corrects past states backward.
Kalman Filter Intuition
Prediction (prior), before observing y_k:
\hat{x}_{k|k-1} = F \hat{x}_{k-1|k-1} + G u_k
(predicted system state)
P_{k|k-1} = F P_{k-1|k-1} F^T + Q
(predicted uncertainty)
Correction (posterior), after observing y_k, corrects the state prediction and updates the uncertainty:
K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R)^{-1}
(Kalman gain)
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - H \hat{x}_{k|k-1})
(corrected state)
P_{k|k} = (I - K_k H)P_{k|k-1}
(corrected prediction uncertainty)
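In matrix form the two steps translate directly into code. The constant-velocity model at the bottom is a made-up illustration (assumed values), not from the text:

```python
import numpy as np

# Predict/correct cycle in the notation of the table (F, G, H, Q, R, P, K).

def kf_predict(x, P, F, G, u, Q):
    x_pred = F @ x + G @ u                 # prior state  \hat{x}_{k|k-1}
    P_pred = F @ P @ F.T + Q               # prior covariance P_{k|k-1}
    return x_pred, P_pred

def kf_correct(x_pred, P_pred, y, H, R):
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain K_k
    x = x_pred + K @ (y - H @ x_pred)      # posterior state \hat{x}_{k|k}
    P = (np.eye(len(x)) - K @ H) @ P_pred  # posterior covariance P_{k|k}
    return x, P

# Illustrative 2-D constant-velocity model:
# state = [position, velocity], only position is measured.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.zeros((2, 1)); u = np.zeros(1)      # no control input
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2); R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
x, P = kf_predict(x, P, F, G, u, Q)
x, P = kf_correct(x, P, np.array([1.0]), H, R)
```

After one measurement of 1.0, the position estimate moves most of the way toward the measurement because the prior covariance is large relative to R.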
Kalman Smoother
Motivation
The Kalman filter gives the best estimate given observations up to
the current time. However, for offline applications
(e.g., analyzing recorded data), we have all observations available. The
smoother uses future observations to refine past state estimates,
reducing estimation error.
- Filter: \hat{x}_{k|k} (uses y_1, \ldots, y_k)
- Smoother: \hat{x}_{k|T} (uses y_1, \ldots, y_T where T > k)
The smoother always has lower or equal error variance: P_{k|T} \leq P_{k|k}.
Kalman Smoother Algorithm
Run the Kalman filter forward for k = 1, \ldots, T to compute \hat{x}_{k|k}, P_{k|k}, \hat{x}_{k|k-1}, P_{k|k-1}.
Then run backward for k = T-1, \ldots, 1:
A_k = P_{k|k} F^T P_{k+1|k}^{-1}
\hat{x}_{k|T} = \hat{x}_{k|k} + A_k (\hat{x}_{k+1|T} - \hat{x}_{k+1|k})
P_{k|T} = P_{k|k} + A_k (P_{k+1|T} - P_{k+1|k}) A_k^T
Here A_k is the smoother gain, analogous to the Kalman gain but for backward information propagation. It measures how much correction from future states affects the current state estimate.
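For the scalar car model (F = 1, no control input), the forward-backward recursion can be sketched as follows; the noise variances q and r and the prior are illustrative assumptions:

```python
import numpy as np

# Forward Kalman filter pass, then backward (RTS) smoothing pass,
# for the scalar model x_k = x_{k-1} + w_k, y_k = x_k + v_k.

def rts_smooth(ys, q=0.5, r=2.0):
    T = len(ys)
    xf, pf = np.zeros(T), np.zeros(T)       # filtered \hat{x}_{k|k}, P_{k|k}
    xp, pp = np.zeros(T), np.zeros(T)       # predicted \hat{x}_{k|k-1}, P_{k|k-1}
    x, p = 0.0, 1.0                          # assumed prior
    for k in range(T):                       # forward filter pass
        xp[k], pp[k] = x, p + q              # predict (F = 1)
        g = pp[k] / (pp[k] + r)              # Kalman gain
        x = xp[k] + g * (ys[k] - xp[k])      # correct
        p = (1 - g) * pp[k]
        xf[k], pf[k] = x, p
    xs, ps = xf.copy(), pf.copy()            # smoothed \hat{x}_{k|T}, P_{k|T}
    for k in range(T - 2, -1, -1):           # backward correction pass
        A = pf[k] / pp[k + 1]                # smoother gain A_k (F = 1)
        xs[k] = xf[k] + A * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + A**2 * (ps[k + 1] - pp[k + 1])
    return xs, ps, xf, pf
```

The backward pass can only shrink the variance, which is the P_{k|T} \leq P_{k|k} property stated above.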
Sequential Kalman Filter
Applicable when the measurements are uncorrelated (R diagonal): the components of y_k are then processed one at a time as scalar updates, which avoids inverting an n_y \times n_y matrix in the measurement step. This reduces computation and can improve numerical stability. Systems with non-diagonal R can be handled by first transforming the measurements so that R becomes diagonal.
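A hedged sketch of the sequential measurement update, assuming R is passed as a vector of diagonal entries (the test matrices are illustrative):

```python
import numpy as np

# Fold in each scalar measurement y[i] in turn; every update needs only
# a scalar division, never an n_y x n_y matrix inverse. Algebraically
# equivalent to the batch update when R is diagonal.

def sequential_update(x, P, y, H, R_diag):
    for i in range(len(y)):
        h = H[i]                          # i-th row of H (1 x n_x)
        s = h @ P @ h + R_diag[i]         # scalar innovation variance
        k = (P @ h) / s                   # n_x gain vector
        x = x + k * (y[i] - h @ x)        # scalar correction
        P = P - np.outer(k, h @ P)        # (I - k h) P
    return x, P
```

Equivalence with the batch update is easy to check numerically, which also makes a good unit test for an implementation.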
For systems with non-linear dynamics or non-Gaussian noise, the
linear Kalman filter may not be optimal.
Extended Kalman Filter (EKF)
Linearizes non-linear transition and observation functions around the
current estimate using first-order Taylor expansion (Jacobians). Nearly
as efficient as standard KF but can diverge if non-linearity is
strong.
Unscented Kalman Filter (UKF)
Uses a set of carefully chosen sample points (“sigma points”) to
capture mean and covariance of non-linear transformations without
linearization. More accurate than EKF for moderate non-linearities,
slightly higher computational cost.
Particle Filter
Represents the state distribution as a collection of samples and
updates via importance weighting. Handles arbitrary non-linear and
non-Gaussian systems but requires many more samples (computationally
expensive).
Learning Parameters
The parameters \Theta = \{F, G, H, Q, R, E[x_0], \text{Var}[x_0]\} are often unknown and must be estimated from data.
EM Algorithm
The Expectation-Maximization (EM) algorithm is suitable for this:
- E-step: run the Kalman filter and smoother to estimate the hidden states x_1, \ldots, x_T given the current parameter estimates
- M-step: update the parameters \Theta to maximize the likelihood of the observations
- Repeat until convergence
Use EM when you have recorded observation sequences and want principled parameter estimation. For real-time systems, manual tuning of Q and R based on sensor specifications and process knowledge is usually faster.
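For a concrete feel, here is a hedged sketch of EM that estimates only the noise variances in the scalar random-walk model (F = H = 1 held fixed); the general M-step also updates F, G, H and the initial moments. The initial values and diffuse prior are assumptions for illustration:

```python
import numpy as np

# EM for x_k = x_{k-1} + w_k (w ~ N(0,q)), y_k = x_k + v_k (v ~ N(0,r)),
# estimating Theta = {q, r} from one observation sequence.

def em_scalar(ys, q=1.0, r=1.0, n_iter=25):
    T = len(ys)
    for _ in range(n_iter):
        # E-step, forward: Kalman filter
        xf, pf = np.zeros(T), np.zeros(T)   # filtered moments
        xp, pp = np.zeros(T), np.zeros(T)   # predicted moments
        x, p = 0.0, 10.0                    # diffuse prior (assumption)
        for k in range(T):
            xp[k], pp[k] = x, p + q
            g = pp[k] / (pp[k] + r)
            x = xp[k] + g * (ys[k] - xp[k])
            p = (1 - g) * pp[k]
            xf[k], pf[k] = x, p
        # E-step, backward: RTS smoother
        xs, ps = xf.copy(), pf.copy()
        for k in range(T - 2, -1, -1):
            A = pf[k] / pp[k + 1]
            xs[k] = xf[k] + A * (xs[k + 1] - xp[k + 1])
            ps[k] = pf[k] + A**2 * (ps[k + 1] - pp[k + 1])
        A_all = pf[:-1] / pp[1:]            # smoother gains A_k
        cross = A_all * ps[1:]              # Cov(x_k, x_{k+1} | y_{1:T})
        # M-step: expected squared increments and residuals
        q = np.mean((xs[1:] - xs[:-1])**2 + ps[1:] + ps[:-1] - 2 * cross)
        r = np.mean((ys - xs)**2 + ps)
    return q, r
```

Each iteration alternates a full filter+smoother pass (E-step) with closed-form variance updates (M-step), matching the loop described above.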
Numerical Stability
The standard covariance update P_{k|k} =
P_{k|k-1} - K_k H P_{k|k-1} involves subtraction and can lose
symmetry and positive-definiteness due to rounding errors, especially in
high-dimensional systems or long-running filters.
Use the Joseph form covariance update instead: P_{k|k} = (I - K_k H)P_{k|k-1}(I - K_k H)^T + K_k R K_k^T
This form:
- maintains symmetry and positive-definiteness numerically
- costs more computation (~3x) but ensures stability
- is the standard remedy: if the standard-form covariance diverges or loses symmetry, switch to the Joseph form
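The two update forms can be compared directly; the matrices below are illustrative assumptions:

```python
import numpy as np

# Standard vs. Joseph-form covariance update. With the optimal gain the
# two agree algebraically, but the Joseph form is symmetric by
# construction and stays positive semi-definite under rounding error
# (and remains a valid covariance even for a suboptimal gain K).

def update_standard(P, K, H):
    return (np.eye(P.shape[0]) - K @ H) @ P

def update_joseph(P, K, H, R):
    I_KH = np.eye(P.shape[0]) - K @ H
    return I_KH @ P @ I_KH.T + K @ R @ K.T

P = np.array([[2.0, 0.4], [0.4, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)      # optimal Kalman gain
P_std = update_standard(P, K, H)
P_jos = update_joseph(P, K, H, R)
```

In long-running or high-dimensional filters, it is also common to re-symmetrize with P = (P + P.T) / 2 after each update as a cheap safeguard.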
State Observers
A state observer (state estimator) reconstructs unobserved internal
states from input-output measurements. The Kalman filter is the optimal
observer for linear systems with Gaussian noise, providing:
- Estimates \hat{x}_k of hidden states x_k
- Automatic weighting of model predictions vs. sensor measurements
- Uncertainty quantification via covariance P_k
State observers are fundamental in control theory and are used to
implement state-feedback controllers when full state measurement is
unavailable.