
Kalman Filter initialization #110

@pavelkomarov


Some choices were made for this in #98, but I'm moving the discussion for it out of #97.

The iteration used to start by treating the initial inputs as coming from measurement space rather than state space, which makes some sense, since the first input to the filter comes from the data array (confusingly named x, though it becomes y to the filter). This means the initial input isn't actually $\hat{x}_0, P_0$.

Tracing code: in the former __constant_velocity__ (and in other places too), the first data point with zero derivatives was given as $x_0$, with covariance $P_0 = 10I$, but this was used as the a priori $\circ_{n|n-1}$ rather than as the a posteriori $\circ_{n-1|n-1}$. This means no forward prediction happens, so at the first time step the predicted measurement and actual measurement have the same mean, and the covariance becomes a warped combination of $R$ and $10I$. This strikes me as slightly odd, because we probably want the covariance around the first measurement to be just $R$. I'm sure it all settles out within a few steps, but I'm considering whether there is a slightly better way to kick off the iteration.
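To make the distinction concrete, here's a minimal sketch (my own notation, not the library's code) of the measurement update when $(x_0, P_0)$ is fed in as the a priori: the innovation is zero, and the updated covariance mixes $R$ with the $10I$ prior.

```python
import numpy as np

dt, r = 0.1, 1e-4
A = np.array([[1, dt], [0, 1]])   # constant-velocity transition
C = np.array([[1, 0]])            # we observe only position
R = np.array([[r]])

y0 = np.array([[1.0]])            # first measurement
x0 = np.array([[1.0], [0.0]])     # first data point, zero derivative
P0 = 10 * np.eye(2)               # the old initialization

# Measurement update, with (x0, P0) treated as the a priori:
K = P0 @ C.T @ np.linalg.inv(C @ P0 @ C.T + R)
x_post = x0 + K @ (y0 - C @ x0)       # innovation y0 - C x0 is zero
P_post = (np.eye(2) - K @ C) @ P0

print(x_post.ravel())   # unchanged: [1. 0.]
print(P_post)           # position variance ~r, velocity variance still 10
```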

Because we only observe the signal, not its derivatives, $C$ is $[1, 0, 0]$. If we take $x_0$ to be the a posteriori estimate, then $y_0 = CAx_0$, which is an underdetermined system. But due to the special structure of this case, it can be made to agree with the a priori situation above with a bit of common sense: if $x_0$ is [the first data point, 0, ... 0], then projecting it through the constant-derivative $A$ just returns the same $x_0$, so $CAx_0 = Cx_0 = y_0$.
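A quick sanity check of that claim, using the constant-acceleration $A$ and $C$ and an arbitrary first data point:

```python
import numpy as np

dt = 0.1
A = np.array([[1, dt, (dt**2)/2],   # constant-acceleration transition
              [0,  1,       dt],
              [0,  0,        1]])
C = np.array([[1, 0, 0]])           # selector: we observe only the signal

y0 = 3.7                            # arbitrary first data point
x0 = np.array([[y0], [0.0], [0.0]]) # derivatives set to zero

print(A @ x0)        # zero derivatives make A x0 = x0
print(C @ A @ x0)    # so C A x0 = C x0 = [[3.7]] = y0
```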

We can attempt a similar thing with the covariance, but then we have to solve the matrix problem $R = C(AP_0A^T + Q)C^T$ for $P_0$, which we can do with convex optimization. Here's a script to solve it:

```python
import cvxpy as cp
import numpy as np

dt = 0.1
r = 1e-4
q = 1e-2

A = np.array([[1, dt], [0, 1]]) # states are x, x'. This basically does an Euler update.
C = np.array([[1, 0]]) # we measure only y = noisy x
R = np.array([[r]])
Q = np.array([[1e-16, 0], [0, q]]) # uncertainty is around the velocity

# A = np.array([[1, dt, (dt**2)/2], # states are x, x', x"
#               [0, 1, dt],
#               [0, 0,  1]])
# C = np.array([[1, 0, 0]]) # we measure only y = noisy x
# R = np.array([[r]])
# Q = np.array([[1e-16, 0, 0],
#               [0, 1e-16, 0],
#               [0,     0, q]]) # uncertainty is around the acceleration

# Define symmetric PSD matrix variable
P0 = cp.Variable(A.shape, PSD=True)

# Expression for the right-hand side
expr = C @ (A @ P0 @ A.T + Q) @ C.T

# Define objective: minimize Frobenius norm difference from R
objective = cp.Minimize(cp.norm(expr - R, 'fro'))

# Solve the problem
prob = cp.Problem(objective)
prob.solve()

# Access the solution
P0_opt = P0.value
print("P0_opt", P0_opt) # looks something like np.array([[r, r/10], [r/10, r/100]])

print("error with P0_opt", C @ (A @ P0_opt @ A.T + Q) @ C.T - R)
print("error with P0=I*10 solution", C @ (A @ np.eye(A.shape[0])*10 @ A.T + Q) @ C.T - R)
P0_best = np.zeros(A.shape); P0_best[0,0] = r
print("error with true optimal solution", C @ (A @ P0_best @ A.T + Q) @ C.T - R)
```

What we see printed from this is:

```
P0_opt [[9.80287020e-05 9.80294627e-06]
 [9.80294627e-06 9.80308008e-07]]
error with P0_opt [[-9.05659832e-10]]
error with P0=I*10 solution [[10.0999]]
error with true optimal solution [[1.00004098e-16]]
```

If I vary q, nothing happens to the answer, because C is a selector that filters out the corresponding entry of Q, but modifying r does have an effect: $P_{0,\text{opt}}$ appears to be approximately [[r, r/10], [r/10, r/100]]. (The pattern holds for acceleration, where the answer is approximately [[r, r/10, r/200], [r/10, r/100, r/2000], [r/200, r/2000, r/40000]].) As in the $x_0$ case, some common sense and algebra tells me these non-[0,0] entries shouldn't really matter, because they get filtered out by the multiplications with C. So I've also tried a zero matrix with a single r at its first location, which turns out to give an even lower error. Both are significantly better than choosing the identity matrix as $P_0$.
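A quick check of that filtering-out claim, plugging both candidates into $C(AP_0A^T + Q)C^T$ with the same dt, r, q as above:

```python
import numpy as np

dt, r, q = 0.1, 1e-4, 1e-2
A = np.array([[1, dt], [0, 1]])
C = np.array([[1, 0]])
Q = np.array([[1e-16, 0], [0, q]])

P_pattern = np.array([[r, r/10], [r/10, r/100]])  # CVXPY-like answer
P_sparse = np.zeros((2, 2)); P_sparse[0, 0] = r   # single r at [0, 0]

for P0 in (P_pattern, P_sparse):
    val = (C @ (A @ P0 @ A.T + Q) @ C.T)[0, 0]
    print(val - r)   # both land within a few percent of R
```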

To follow up, I've written a little script to iterate $P$ forward using different values of $P_0$ and plot the covariance ellipses:

```python
from matplotlib import pyplot
fig, axes = pyplot.subplots(1, 3, figsize=(12, 4))

def next_P(P):
    P_ = (A @ P @ A.T + Q)
    K = P_ @ C.T @ np.linalg.inv(C @ P_ @ C.T + R)
    return (np.eye(A.shape[0]) - K @ C) @ P_

def plot_ellipse(P, axes, c):
    L, V = np.linalg.eig(P)
    t = np.linspace(0, 2*np.pi)
    xy = np.vstack((np.cos(t), np.sin(t)))
    ellipse = V * np.sqrt(L) @ xy
    axes.plot(ellipse[0], ellipse[1], c)

for i,P in enumerate([P0_opt, P0_best, np.eye(2)*10]):
    plot_ellipse(P, axes[i], 'k--')
    for c in ['C0--','C1--','C2--']:
        P = next_P(P)
        plot_ellipse(P, axes[i], c)
    axes[i].set_xlabel("pos")
    axes[i].set_ylabel("vel")

axes[0].set_title(r"$P_{0,opt}$")
axes[1].set_title(r"$P_{0,best}$")
axes[2].set_title(r"$P_0 = I*10$")
pyplot.tight_layout()
pyplot.show()
```

I see hardly any difference between the optimal value found by CVXPY and my own common-sense answer, and the fact that the covariance is 0 in all but the first entry isn't actually a problem: the ellipses plump up to a healthy rounded shape at the next iteration. But these look quite different from the first few values of P when it's initialized with the identity matrix.

[Image: covariance ellipse evolution for the three $P_0$ choices]

Zooming in on the identity matrix case, we see

[Image: zoomed view of the $P_0 = I*10$ case]

which shows P approaching its asymptotic shape all the same, but by a very different route. (Since all the matrices in play here are invariant across steps, $K$ and $P$ converge to fixed values. In some settings, like embedded systems where computation is relatively more precious, people find the asymptotic matrices offline and then hard-code them.)
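For reference, those asymptotic matrices can be found directly by solving the discrete algebraic Riccati equation. A sketch using SciPy's `solve_discrete_are` (the filtering DARE is the dual of the control one, hence the transposes), cross-checked against brute-force iteration of the covariance recursion:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt, r, q = 0.1, 1e-4, 1e-2
A = np.array([[1, dt], [0, 1]])
C = np.array([[1, 0]])
R = np.array([[r]])
Q = np.array([[1e-16, 0], [0, q]])

# Steady-state a priori covariance from the DARE (duality: pass A.T, C.T).
P_prior = solve_discrete_are(A.T, C.T, Q, R)
K_ss = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R)  # asymptotic gain

# Cross-check: iterate the covariance recursion until it converges.
P = 10 * np.eye(2)
for _ in range(1000):
    P_ = A @ P @ A.T + Q
    K = P_ @ C.T @ np.linalg.inv(C @ P_ @ C.T + R)
    P = (np.eye(2) - K @ C) @ P_

print(np.max(np.abs(A @ P @ A.T + Q - P_prior)))  # tiny
```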

I think it's worth an experiment to see whether initializing $P_0$ this way is better on the test functions.
