
Kalman Filter initialization #110

@pavelkomarov


Some choices were made for this in #98, but I'm moving the discussion for it out of #97.

The iteration used to start by treating the initial inputs as coming from measurement space rather than state space, which makes some sense, since the first input to the filter comes from the data array (confusingly named x, though it becomes y to the filter). This means the initial input isn't actually $\hat{x}_0, P_0$.

Tracing code: in the former __constant_velocity__ (and in other places too), the first data point with zero derivatives was given as $x_0$, with covariance $P_0 = 10I$, but this was used as the a priori $\circ_{n|n-1}$ rather than as the a posteriori $\circ_{n-1|n-1}$. This means no forward prediction happens, so at the first time step the predicted measurement and actual measurement have the same mean, and the covariance becomes a warped combination of $R$ and $10I$. This strikes me as slightly odd, because we probably want the covariance around the first measurement to be just $R$. I'm sure it all settles out within a few steps, but I'm considering whether there is a slightly better way to kick off the iteration.
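To make the distinction concrete, here's a minimal sketch (my own notation, not the library's code) of the measurement update when $(x_0, P_0)$ is fed in as the a priori: the innovation is zero, and the updated covariance mixes $R$ with the $10I$ prior.

```python
import numpy as np

dt, r = 0.1, 1e-4
A = np.array([[1, dt], [0, 1]])   # constant-velocity transition
C = np.array([[1, 0]])            # we observe only position
R = np.array([[r]])

y0 = np.array([[1.0]])            # first measurement
x0 = np.array([[1.0], [0.0]])     # first data point, zero derivative
P0 = 10 * np.eye(2)               # the old initialization

# Measurement update, with (x0, P0) treated as the a priori:
K = P0 @ C.T @ np.linalg.inv(C @ P0 @ C.T + R)
x_post = x0 + K @ (y0 - C @ x0)       # innovation y0 - C x0 is zero
P_post = (np.eye(2) - K @ C) @ P0

print(x_post.ravel())   # unchanged: [1. 0.]
print(P_post)           # position variance ~r, velocity variance still 10
```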

Because we only observe the signal, not its derivatives, $C$ is $[1, 0, 0]$. If we take $x_0$ to be the a posteriori estimate, then $y_0 = CAx_0$, which is an underdetermined system. But due to the special structure of this case, it can be made to agree with the a priori situation above with a bit of common sense: if $x_0$ is [the first data point, 0, ... 0], then projecting it through the constant-derivative $A$ just returns the same $x_0$, so $CAx_0 = Cx_0 = y_0$.
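A quick sanity check of that claim, using the constant-acceleration $A$ and $C$ and an arbitrary first data point:

```python
import numpy as np

dt = 0.1
A = np.array([[1, dt, (dt**2)/2],   # constant-acceleration transition
              [0,  1,       dt],
              [0,  0,        1]])
C = np.array([[1, 0, 0]])           # selector: we observe only the signal

y0 = 3.7                            # arbitrary first data point
x0 = np.array([[y0], [0.0], [0.0]]) # derivatives set to zero

print(A @ x0)        # zero derivatives make A x0 = x0
print(C @ A @ x0)    # so C A x0 = C x0 = [[3.7]] = y0
```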

We can attempt a similar thing with the covariance, but then we have to solve the matrix problem $R = C(AP_0A^T + Q)C^T$ for $P_0$, which we can do with convex optimization. Here's a script to solve it:

```python
import cvxpy as cp
import numpy as np

dt = 0.1
r = 1e-4
q = 1e-2

A = np.array([[1, dt], [0, 1]]) # states are x, x'. This basically does an Euler update.
C = np.array([[1, 0]]) # we measure only y = noisy x
R = np.array([[r]])
Q = np.array([[1e-16, 0], [0, q]]) # uncertainty is around the velocity

# A = np.array([[1, dt, (dt**2)/2], # states are x, x', x"
#               [0, 1, dt],
#               [0, 0,  1]])
# C = np.array([[1, 0, 0]]) # we measure only y = noisy x
# R = np.array([[r]])
# Q = np.array([[1e-16, 0, 0],
#               [0, 1e-16, 0],
#               [0,     0, q]]) # uncertainty is around the acceleration

# Define symmetric PSD matrix variable
P0 = cp.Variable(A.shape, PSD=True)

# Expression for the right-hand side
expr = C @ (A @ P0 @ A.T + Q) @ C.T

# Define objective: minimize Frobenius norm difference from R
objective = cp.Minimize(cp.norm(expr - R, 'fro'))

# Solve the problem
prob = cp.Problem(objective)
prob.solve()

# Access the solution
P0_opt = P0.value
print("P0_opt", P0_opt) # looks something like np.array([[r, r/10], [r/10, r/100]])

print("error with P0_opt", C @ (A @ P0_opt @ A.T + Q) @ C.T - R)
print("error with P0=I*10 solution", C @ (A @ np.eye(A.shape[0])*10 @ A.T + Q) @ C.T - R)
P0_best = np.zeros(A.shape); P0_best[0,0] = r
print("error with true optimal solution", C @ (A @ P0_best @ A.T + Q) @ C.T - R)
```

What we see printed from this is:

```
P0_opt [[9.80287020e-05 9.80294627e-06]
 [9.80294627e-06 9.80308008e-07]]
error with P0_opt [[-9.05659832e-10]]
error with P0=I*10 solution [[10.0999]]
error with true optimal solution [[1.00004098e-16]]
```

If I vary q, nothing happens to the answer, because C is a selector that filters out the corresponding entry of Q, but modifying r does have an effect: $P_{0,\text{opt}}$ appears to be approximately [[r, r/10], [r/10, r/100]]. (The pattern holds for acceleration, where the answer is approximately [[r, r/10, r/200], [r/10, r/100, r/2000], [r/200, r/2000, r/40000]].) As in the $x_0$ case, some common sense and algebra tells me these non-[0,0] entries shouldn't really matter, because they get filtered out by the multiplications with C. So I've also tried a zero matrix with a single r at its first location, which turns out to give an even lower error. Both are significantly better than choosing the identity matrix as $P_0$.
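A quick check of that filtering-out claim, plugging both candidates into $C(AP_0A^T + Q)C^T$ with the same dt, r, q as above:

```python
import numpy as np

dt, r, q = 0.1, 1e-4, 1e-2
A = np.array([[1, dt], [0, 1]])
C = np.array([[1, 0]])
Q = np.array([[1e-16, 0], [0, q]])

P_pattern = np.array([[r, r/10], [r/10, r/100]])  # CVXPY-like answer
P_sparse = np.zeros((2, 2)); P_sparse[0, 0] = r   # single r at [0, 0]

for P0 in (P_pattern, P_sparse):
    val = (C @ (A @ P0 @ A.T + Q) @ C.T)[0, 0]
    print(val - r)   # both land within a few percent of R
```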

To follow up, I've written a little script to iterate $P$ forward using different values of $P_0$ and plot the covariance ellipses:

```python
from matplotlib import pyplot
fig, axes = pyplot.subplots(1, 3, figsize=(12, 4))

def next_P(P):
    P_ = (A @ P @ A.T + Q)
    K = P_ @ C.T @ np.linalg.inv(C @ P_ @ C.T + R)
    return (np.eye(A.shape[0]) - K @ C) @ P_

def plot_ellipse(P, axes, c):
    L, V = np.linalg.eig(P)
    t = np.linspace(0, 2*np.pi)
    xy = np.vstack((np.cos(t), np.sin(t)))
    ellipse = V * np.sqrt(L) @ xy
    axes.plot(ellipse[0], ellipse[1], c)

for i,P in enumerate([P0_opt, P0_best, np.eye(2)*10]):
    plot_ellipse(P, axes[i], 'k--')
    for c in ['C0--','C1--','C2--']:
        P = next_P(P)
        plot_ellipse(P, axes[i], c)
    axes[i].set_xlabel("pos")
    axes[i].set_ylabel("vel")

axes[0].set_title(r"$P_{0,opt}$")
axes[1].set_title(r"$P_{0,best}$")
axes[2].set_title(r"$P_0 = I*10$")
pyplot.tight_layout()
pyplot.show()
```

I see hardly any difference between the optimal value found by CVXPY and my own common-sense answer, and the fact that the covariance is 0 in all but the first entry isn't actually a problem: the ellipses plump up to a healthy rounded shape at the next iteration. But these look quite different from the first few values of P when it's initialized with the identity matrix.

[Image: covariance ellipse evolution for the three $P_0$ choices]

Zooming in on the identity matrix case, we see

[Image: zoomed view of the $P_0 = I*10$ case]

which shows P approaching its asymptotic shape all the same, but by a very different route. (Since all the matrices in play here are invariant across steps, $K$ and $P$ converge to fixed values. In some settings, like embedded systems where computation is relatively more precious, people find the asymptotic matrices offline and then hard-code them.)
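For reference, those asymptotic matrices can be found directly by solving the discrete algebraic Riccati equation. A sketch using SciPy's `solve_discrete_are` (the filtering DARE is the dual of the control one, hence the transposes), cross-checked against brute-force iteration of the covariance recursion:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt, r, q = 0.1, 1e-4, 1e-2
A = np.array([[1, dt], [0, 1]])
C = np.array([[1, 0]])
R = np.array([[r]])
Q = np.array([[1e-16, 0], [0, q]])

# Steady-state a priori covariance from the DARE (duality: pass A.T, C.T).
P_prior = solve_discrete_are(A.T, C.T, Q, R)
K_ss = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R)  # asymptotic gain

# Cross-check: iterate the covariance recursion until it converges.
P = 10 * np.eye(2)
for _ in range(1000):
    P_ = A @ P @ A.T + Q
    K = P_ @ C.T @ np.linalg.inv(C @ P_ @ C.T + R)
    P = (np.eye(2) - K @ C) @ P_

print(np.max(np.abs(A @ P @ A.T + Q - P_prior)))  # tiny
```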

I think it's worth an experiment to see whether initializing $P_0$ this way is better on the test functions.
