Robustifying TVR #177
Conversation
…ty in the robustdiff docstring was wrong and that there is a simpler way to write scaled Hubers throughout the repo, which incidentally also helps guard against divide-by-zeros
…ding integration constants
import numpy as np
from warnings import warn
from scipy.linalg import expm, sqrtm
from scipy.stats import norm
norm is no longer called here, because huber_const moved to the utilities.
proper :math:`\\ell_1` normalization.
Note that :code:`log_q` and :code:`proc_huberM` are coupled, as are :code:`log_r` and :code:`meas_huberM`, via the relation
:math:`\\text{Huber}(q^{-1/2}v, M) = q^{-1}\\text{Huber}(v, Mq^{-1/2})`, but these are still independent enough that for
This was just straight up wrong and caused me some confusion. But it's fixed now, and the math in the paper and in various code around here is modified to reflect a slightly simpler form. It's nice to just scale the Huber threshold M instead of rescaling the residuals.
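For the record, here is the corrected identity in the form the new code uses (my reconstruction from the surrounding diff; writing sigma for the robust scale, so q = sigma^2):

$$\text{Huber}\!\left(\tfrac{v}{\sigma}, M\right) = \frac{1}{\sigma^{2}}\,\text{Huber}(v, M\sigma), \qquad \text{equivalently} \qquad \text{Huber}(q^{-1/2}v, M) = q^{-1}\,\text{Huber}(v, M\,q^{1/2}).$$

In other words, the old docstring's q^{-1/2} inside the second Huber's threshold should be q^{1/2}, which is exactly what lets the code pass M*sigma instead of dividing the residuals by sigma.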
(iterative_velocity, {'num_iterations':5, 'gamma':0.05}), (iterative_velocity, [5, 0.05]),
(smooth_acceleration, {'gamma':2, 'window_size':5}), (smooth_acceleration, [2, 5]),
(lineardiff, {'order':3, 'gamma':5, 'window_size':11, 'solver':'CLARABEL'}), (lineardiff, [3, 5, 11], {'solver':'CLARABEL'})
(lineardiff, {'order':3, 'gamma':0.01, 'window_size':11, 'solver':'CLARABEL'}), (lineardiff, [3, 0.01, 11], {'solver':'CLARABEL'})
Turns out we can actually get good-looking solutions with a different choice of gamma here.
[(-25, -25), (3, 3), (0, 0), (3, 3)]],
iterated_second_order: [[(-7, -8), (-25, -25), (0, -1), (0, 0)],
[(-7, -8), (-14, -14), (0, -1), (0, 0)],
iterated_second_order: [[(-25, -25), (-25, -25), (0, -1), (0, 0)],
Some of what I did to address the integration constant made some of these tests a lot more stable. The problem is that I test with a constant function and a linear function without noise for the first two here, and that was causing sigma to become 0. In the case of tiny sigma, the thing to do to find the integration constant is just to fall back to the L2 case (the plain mean of the residuals).
This shrinks these errors down a bunch, and it makes the CI's error bounds agree with my local run's error bounds. 🎉
def tvrdiff(x, dt, order, gamma, solver=None):
def tvrdiff(x, dt, order, gamma, huberM=float('inf'), solver=None):
Added a huberM hyperparameter. Its default of infinity means TVR still optimizes the sum_squares loss unless told to do otherwise.
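A quick self-contained check of why the infinite default recovers the plain least-squares objective, reusing the piecewise huber form shown later in this diff (a standalone sketch, not the actual tvrdiff code path):

```python
import numpy as np

def huber(x, M):
    # same piecewise form as the utility.huber in this PR:
    # quadratic inside [-M, M], linear outside
    absx = np.abs(x)
    return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

r = np.array([-2.0, 0.1, 0.5, 3.0])                    # example residuals
print(np.allclose(huber(r, float('inf')), 0.5*r**2))   # True: infinite M is just (half) the squared loss
```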
sigma = stats.median_abs_deviation(u[s] - v[s], scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
return np.sqrt(2*np.mean(utility.huber((u[s] - v[s])/sigma, M))) * sigma
return np.sqrt(2*np.mean(utility.huber(u[s] - v[s], M*sigma)))
I realized dividing the first arg by sigma is equivalent to scaling the threshold M by sigma instead (up to an overall factor of sigma**2), which is simpler and drops the division entirely.
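A quick numerical sanity check of that rewrite of the error metric (standalone sketch with a local copy of huber; the names just mimic the diff):

```python
import numpy as np

def huber(x, M):
    absx = np.abs(x)
    return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

rng = np.random.default_rng(0)
resid = rng.standard_normal(200)   # stand-in for u[s] - v[s]
sigma, M = 0.7, 2.0

old_form = np.sqrt(2*np.mean(huber(resid/sigma, M))) * sigma   # divide residuals, then rescale
new_form = np.sqrt(2*np.mean(huber(resid, M*sigma)))           # scale the threshold instead
print(np.allclose(old_form, new_form))                         # True
```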
absx = np.abs(x)
return np.where(absx <= M, 0.5*x**2, M*(absx - 0.5*M))

def huber_const(M):
This lives in the utilities now that it's called from multiple functions.
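For context, here is a hypothetical sketch of the kind of constant a function like huber_const would need scipy.stats.norm for, assuming it normalizes the density exp(-Huber(x, M)) as the "proper ℓ1 normalization" docstring language suggests; the actual pynumdiff implementation may differ:

```python
import numpy as np
from scipy.stats import norm

def huber_normalizer(M):
    # Hypothetical helper (not the repo's huber_const): the normalizing constant
    # Z(M) = integral of exp(-Huber(x, M)) dx, with Huber(x, M) = 0.5*x**2 for |x| <= M.
    # Gaussian core on [-M, M] plus two exponential tails; requires M > 0.
    return np.sqrt(2*np.pi)*(2*norm.cdf(M) - 1) + 2*np.exp(-0.5*M**2)/M
```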
return np.median(x - x_hat) # Solves the L1 distance minimization
else:
sigma = median_abs_deviation(x - x_hat, scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
This was subtle: sigma was in fact coming out 0 a lot in the testing code, because there are noiseless situations where the function is linear that cause the MAD to be 0. To see this, consider inputs [0, 1, 2, 3, 4] and [1, 2, 3, 4, 5]. The difference is [1, 1, 1, 1, 1], which has a median of 1, absolute deviations of [0,0,0,0,0], of which the median is 0. Gross. I didn't realize.
What do? Well, if sigma is 0, there are no outliers, so why not just go ahead and use the L2 case?
Thankfully, I don't think this comes up at all in our experiments, because there is noise; I've run some local optimizations across the two branches and always get the same results. But it does resolve the mysterious local/CI disagreement and the fragility of the unit tests!
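The example above, as a one-liner check against scipy directly:

```python
import numpy as np
from scipy.stats import median_abs_deviation

x_hat = np.array([0, 1, 2, 3, 4])
x = np.array([1, 2, 3, 4, 5])
# residuals are all 1, so every absolute deviation from their median is 0
print(median_abs_deviation(x - x_hat, scale='normal'))  # 0.0
```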
sigma = median_abs_deviation(x - x_hat, scale='normal') # M is in units of this robust scatter metric
if sigma < 1e-6: sigma = 1 # guard against divide by zero
return minimize(lambda x0: np.mean(huber((x - (x_hat+x0))/sigma, M)), # fn to minimize in 1st argument
return minimize(lambda x0: np.sum(huber(x - (x_hat+x0), M*sigma)), # fn to minimize in 1st argument
Equivalent minimization problem. mean->sum and the rescaled Huber arguments just scale the cost up by a constant factor (n*sigma**2), which doesn't move the argmin.
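And a standalone check that the rescaled objective finds the same integration constant (using minimize_scalar for brevity rather than the repo's minimize call; the data here is made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(v, M):
    absv = np.abs(v)
    return np.where(absv <= M, 0.5*v**2, M*(absv - 0.5*M))

rng = np.random.default_rng(1)
x_hat = np.cumsum(rng.standard_normal(50))
x = x_hat + 0.3 + 0.1*rng.standard_normal(50)   # true offset 0.3 plus a little noise
sigma, M = 1.0, 1.345

old = minimize_scalar(lambda x0: np.mean(huber((x - (x_hat + x0))/sigma, M)))
new = minimize_scalar(lambda x0: np.sum(huber(x - (x_hat + x0), M*sigma)))
print(old.x, new.x)   # the two offset estimates agree
```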
pynumdiff/optimize/_optimize.py
'order': {1, 2, 3}}, # warning: order 1 hacks the loss function when tvgamma is used, tends to win but is usually suboptimal choice in terms of true RMSE
{'gamma': (1e-4, 1e7)}),
'order': {1, 2, 3}, # warning: order 1 hacks the loss function when tvgamma is used, tends to win but is usually suboptimal choice in terms of true RMSE
'huberM': [0., 1, 2, 3, 6]},
Because the Huber parameter input is on the scale of the robust scatter of the data rather than the raw residuals, huberMs of order one cover the useful range, hence this small fixed grid instead of a wide continuous search.
Addressed #171
In the process I moved huber_const to the utilities and realized that my Huber scale identity in the robustdiff docstring was wrong, and that there is a simpler way to write scaled Hubers throughout the repo (and in the paper, which I've just updated), which incidentally also helps guard against divide-by-zeros.