
Conversation

@pavelkomarov (Collaborator):

In the process I moved huber_const to the utilities and realized my Huber scale identity in the robustdiff docstring was wrong, and that there is a simpler way to write scaled Hubers throughout the repo (and in the paper, which I've just updated), which incidentally also helps guard against divide-by-zeros.

import numpy as np
from warnings import warn
from scipy.linalg import expm, sqrtm
from scipy.stats import norm
@pavelkomarov (Collaborator, Author):

no longer called here, because huber_const moved

proper :math:`\\ell_1` normalization.
Note that :code:`log_q` and :code:`proc_huberM` are coupled, as are :code:`log_r` and :code:`meas_huberM`, via the relation
:math:`\\text{Huber}(q^{-1/2}v, M) = q^{-1}\\text{Huber}(v, Mq^{1/2})`, but these are still independent enough that for
@pavelkomarov (Collaborator, Author):

This was just straight-up wrong and caused me some confusion. But it's fixed now, and the math in the paper and in various code around here is modified to reflect a slightly simpler form. It's nice to just modify the $M$ instead of dividing inputs and scaling the whole loss function.
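For the record, here is a quick standalone check of the corrected identity; the `huber` function below just restates the standard Huber loss from the utilities, and the particular values of `q` and `M` are arbitrary:

```python
import numpy as np

def huber(x, M):
    # standard Huber loss: quadratic inside |x| <= M, linear outside
    absx = np.abs(x)
    return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

rng = np.random.default_rng(0)
v, q, M = rng.standard_normal(1000), 0.3, 1.345
lhs = huber(v/np.sqrt(q), M)        # divide the inputs, keep M
rhs = huber(v, M*np.sqrt(q))/q      # keep the inputs, scale M and the whole loss
assert np.allclose(lhs, rhs)        # Huber(q^{-1/2} v, M) = q^{-1} Huber(v, M q^{1/2})
```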

(iterative_velocity, {'num_iterations':5, 'gamma':0.05}), (iterative_velocity, [5, 0.05]),
(smooth_acceleration, {'gamma':2, 'window_size':5}), (smooth_acceleration, [2, 5]),
(lineardiff, {'order':3, 'gamma':5, 'window_size':11, 'solver':'CLARABEL'}), (lineardiff, [3, 5, 11], {'solver':'CLARABEL'})
(lineardiff, {'order':3, 'gamma':0.01, 'window_size':11, 'solver':'CLARABEL'}), (lineardiff, [3, 0.01, 11], {'solver':'CLARABEL'})
@pavelkomarov (Collaborator, Author):

It turns out we can get actually good-looking solutions with a different choice of gamma here.

[(-25, -25), (3, 3), (0, 0), (3, 3)]],
iterated_second_order: [[(-7, -8), (-25, -25), (0, -1), (0, 0)],
[(-7, -8), (-14, -14), (0, -1), (0, 0)],
iterated_second_order: [[(-25, -25), (-25, -25), (0, -1), (0, 0)],
@pavelkomarov (Collaborator, Author):

Some of what I did to address the integration constant made some of these tests a lot more stable. The problem is that the first two here test with a constant function and a linear function, both without noise, and that was causing sigma to come out 0. When sigma is tiny, the right thing to do when finding the integration constant is just to fall back to the $\ell_2$ loss, because it's optimal when there are no outliers.

This shrinks these errors down a bunch, and it makes the CI's error bounds agree with my local run's error bounds. 🎉



def tvrdiff(x, dt, order, gamma, solver=None):
def tvrdiff(x, dt, order, gamma, huberM=float('inf'), solver=None):
@pavelkomarov (Collaborator, Author):

Added a hyperparameter. Its default is infinity, so that TVR will still optimize the sum_squares loss unless told to do otherwise (the Huber loss reduces to the quadratic as $M \to \infty$).
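To make the switch concrete, here's a rough sketch of how such a parameter could toggle the fidelity term in a cvxpy-style TVR objective. This is my illustration, not the repo's actual implementation: the names `x_hat` and `D` and this exact formulation are assumptions.

```python
import numpy as np
import cvxpy as cp

def tvr_loss(x, x_hat, D, gamma, huberM=float('inf')):
    """Sketch of a TVR objective where huberM selects the data-fidelity term.
    x is the measured signal, x_hat the cvxpy variable being fit, and D a
    finite-difference operator whose output is penalized in the TV term."""
    residual = x - x_hat
    if np.isinf(huberM):
        fidelity = cp.sum_squares(residual)          # classic TVR: quadratic fidelity
    else:
        # cvxpy's huber atom equals x^2 for |x| <= M (twice the 0.5*x^2 convention),
        # so it smoothly reduces to sum_squares as huberM grows large
        fidelity = cp.sum(cp.huber(residual, huberM))
    return fidelity + gamma * cp.norm1(D @ x_hat)    # total-variation penalty
```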

sigma = stats.median_abs_deviation(u[s] - v[s], scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
return np.sqrt(2*np.mean(utility.huber((u[s] - v[s])/sigma, M))) * sigma
return np.sqrt(2*np.mean(utility.huber(u[s] - v[s], M*sigma)))
@pavelkomarov (Collaborator, Author):

I realized dividing the first arg by $\sigma$ and scaling the whole Huber function by $\sigma^2$ (such that a single $\sigma$ ends up outside the $\sqrt{\cdot}$) is equivalent to scaling up $M$ by $\sigma$, and the latter is mildly easier to write down and interpret, so I've updated the paper with it as well.
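A quick numerical sanity check that the two return expressions above agree (standalone; `huber` restates the utility's definition and `r` stands in for `u[s] - v[s]`):

```python
import numpy as np

def huber(x, M):
    absx = np.abs(x)
    return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

rng = np.random.default_rng(1)
r, sigma, M = rng.standard_normal(500), 0.7, 1.5
old = np.sqrt(2*np.mean(huber(r/sigma, M))) * sigma   # divide inputs, scale the result
new = np.sqrt(2*np.mean(huber(r, M*sigma)))           # just scale M
assert np.isclose(old, new)
```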

absx = np.abs(x)
return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

def huber_const(M):
@pavelkomarov (Collaborator, Author):

This lives in utilities now that it's called from multiple functions.

return np.median(x - x_hat) # Solves the L1 distance minimization
else:
sigma = median_abs_deviation(x - x_hat, scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
@pavelkomarov (Collaborator, Author):

This was subtle: sigma was in fact coming out 0 a lot in the testing code, because there are noiseless situations where the function is linear that cause the MAD to be 0. To see this, consider inputs [0, 1, 2, 3, 4] and [1, 2, 3, 4, 5]. The difference is [1, 1, 1, 1, 1], which has a median of 1, absolute deviations of [0,0,0,0,0], of which the median is 0. Gross. I didn't realize.

What do? Well, if sigma is 0, there are no outliers, so why not just go ahead and use the L2 case?

Thankfully, I don't think this comes up at all in our experiments, because there is noise. I've run some local optimizations across the two branches and always get the same results. Yet it resolves the mysterious local/CI disagreement/fragility of the unit tests!
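To make the failure mode concrete, a standalone snippet with the arrays from the comment above:

```python
import numpy as np
from scipy.stats import median_abs_deviation

x     = np.array([1, 2, 3, 4, 5])   # noiseless, linear "measurements"
x_hat = np.array([0, 1, 2, 3, 4])   # estimate offset by a constant
residual = x - x_hat                # [1, 1, 1, 1, 1]
# every element equals the median, so all absolute deviations are 0 and the MAD is 0
print(median_abs_deviation(residual, scale='normal'))  # 0.0
```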

sigma = median_abs_deviation(x - x_hat, scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
return minimize(lambda x0: np.mean(huber((x - (x_hat+x0))/sigma, M)), # fn to minimize in 1st argument
return minimize(lambda x0: np.sum(huber(x - (x_hat+x0), M*sigma)), # fn to minimize in 1st argument
@pavelkomarov (Collaborator, Author):

Equivalent minimization problem: mean→sum scales the cost up by $N$, and turning the division of the residuals by $\sigma$ into a multiplication of $M$ by $\sigma$ scales it by the constant $\sigma^2$. Same argmin.
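A standalone check on synthetic data (not the repo's tests; the offset of 3 and the noise level are arbitrary) that the two formulations find the same constant:

```python
import numpy as np
from scipy.optimize import minimize

def huber(x, M):
    absx = np.abs(x)
    return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

rng = np.random.default_rng(2)
x_hat = np.cumsum(rng.standard_normal(200))
x = x_hat + 3.0 + 0.1*rng.standard_normal(200)   # true integration constant of 3
sigma, M = 0.1, 1.345

old = minimize(lambda x0: np.mean(huber((x - (x_hat + x0))/sigma, M)), 0.).x[0]
new = minimize(lambda x0: np.sum(huber(x - (x_hat + x0), M*sigma)), 0.).x[0]
assert abs(old - new) < 1e-4   # same argmin, near 3
```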

'order': {1, 2, 3}}, # warning: order 1 hacks the loss function when tvgamma is used, tends to win but is usually suboptimal choice in terms of true RMSE
{'gamma': (1e-4, 1e7)}),
'order': {1, 2, 3}, # warning: order 1 hacks the loss function when tvgamma is used, tends to win but is usually suboptimal choice in terms of true RMSE
'huberM': [0., 1, 2, 3, 6]},
@pavelkomarov (Collaborator, Author):

Because the Huber parameter input is on the scale of the scatter of the input data rather than of residuals, $\sigma$ is likely to be larger than it would be if we were able to subtract $y-x$, so it's necessary to comb lower values more finely here than for the other huberMs.

@pavelkomarov merged commit ff79a12 into master on November 22, 2025 (1 of 2 checks passed).
@pavelkomarov deleted the robustify-tvr branch on November 22, 2025 at 00:02.
@pavelkomarov (Collaborator, Author):

Addressed #171
