
@pavelkomarov (Collaborator) commented Nov 20, 2025

This method got some attention in our discussion on #48 as possibly redundant, and, as a dependency of slide_function, it makes #173 slightly harder to address. Thinking more about it, the original stated reason for having jerk_sliding (that the convex solver might struggle and be slow when there are too many points) isn't actually addressed by this approach. For TVR, solve time is linear in the data sequence length, $N$, so giving a longer sequence won't really hurt. The problem is still strongly convex, so CVXPY will still use OSQP and converge quickly. Rather, breaking the problem up into overlapping sections and then smoothly combining the results takes significantly more computation, because we run the convex solver over any given datapoint several times.

What changes between jerk and jerk_sliding is that you don't get that blending between solutions. The kernel ramps up for 1/5 of the window, stays flat for 3/5, and ramps down for 1/5, and the stride is 1/5 of the window, so consecutive windows overlap like:
[Figure: sketch of the trapezoidal kernel and the 1/5-stride overlap between windows]
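To see why this overlap blends into a proper weighted average, here is a quick sketch (my own illustration, not the package's kernel code) of the trapezoidal windows and their summed weights:

```python
import numpy as np

L = 8      # one fifth of the window (a power of two keeps the floats exact)
W = 5 * L  # window length; the stride is W/5 = L

# Trapezoidal kernel: ramp up over W/5, flat for 3W/5, ramp down over W/5.
kernel = np.concatenate([np.arange(L) / L,          # 0, 1/L, ..., (L-1)/L
                         np.ones(3 * L),
                         (L - np.arange(L)) / L])   # 1, (L-1)/L, ..., 1/L

# Slide overlapping copies of the kernel along a series, accumulating weight.
n_windows = 20
N = W + (n_windows - 1) * L
weights = np.zeros(N)
for w in range(n_windows):
    weights[w * L : w * L + W] += kernel

# Away from the edges, every point is covered by exactly 5 windows, and the
# summed weight is a constant 4, so the blend is a true weighted average.
interior = weights[W:-W]
print(interior.min(), interior.max())  # 4.0 4.0
```

The constant interior weight is what lets the per-window solutions be combined without re-normalizing, but it also makes plain that the solver visits each interior point five times.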

Does combining solutions smoothly like this provide any kind of benefit? I'm not sure, and it's possible, but my statistical intuition says "no", because this kind of local ensembling doesn't have access to any more information than the global algorithm. Especially given all the other methods' approximately equal performance, I doubt these kinds of games could make the solution more accurate. But I didn't test it against everyone else in notebook 4, because this method is currently limited to order=3 (jerk), which is often not the optimal choice. The method could be extended to offer different kernel choices and 1st, 2nd, and 3rd order, but I question the value of torturing this thing with those manipulations when the core algorithm, tvrdiff, can natively handle the whole series. This is not like polydiff or lineardiff, where by necessity we have to break the problem up.

@pavelkomarov (Collaborator, Author) commented Nov 20, 2025

Okay, I got curious about experimental results on this myself, so I've added an order parameter to jerk_sliding (making it really more of a tvr_sliding), passed that parameter down where appropriate, and tinkered with the default optimization dictionaries:

```python
jerk_sliding: ({'gamma': [1e-2, 1e-1, 1, 10, 100, 1000],
                'window_size': [3, 10, 30, 50, 90, 130],
                'order': {1, 2, 3}},
               {'gamma': (1e-4, 1e7),
                'window_size': (3, 1000)})
```

I've made a temporary, local amendment to notebook 4 to do the run. Results plots against everyone else to follow. Should take only an hour or two for just the one method.

@pavelkomarov (Collaborator, Author) commented:

A few hours later: okay, scratch that runtime estimate. It's taking significantly longer, because the extra hyperparameter means doing 3x as many Nelder-Mead searches, and there's a 5x slowdown from solving over the same data series locations in five separate little convex optimizations. I managed to take apart, de-scale (even the boiler), and reassemble the whole espresso machine, and this run is still not even halfway done. That's really an argument in itself against this kind of sliding-window approach.
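As a rough back-of-envelope (my own accounting, not a measurement):

```python
# Each datapoint sits inside window_size / stride = 5 overlapping windows,
# so the convex solver processes it 5 times per hyperparameter evaluation.
coverage = 5
# The new order hyperparameter triples the number of Nelder-Mead searches.
order_sweep = 3
print(coverage * order_sweep)  # roughly 15x the per-point solver work of plain TVR
```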

@pavelkomarov (Collaborator, Author) commented Nov 21, 2025

Here are the plots:

[Five result plots comparing the methods; images not reproduced]

Looking at the results tables, the sliding TVR often finds exactly the same solution as the ordinary TVR, because they have the same RMSE and $R^2$. But sometimes it finds something slightly different. To quantify this: 57.2% of the $6 \times 728$ simulation-experiment pairs have TVR and Sliding results with both absolute RMSE diff and absolute $R^2$ diff within 0.01, and 25.0% have both diffs within 0.001. For comparison, the diffs-within-threshold portions for TVR against the next-most-closely related method, SmoothAccelTVR, are 13.7% and 1.8% for 0.01 and 0.001, respectively, and against RTSDiff the numbers fall to 2.6% and 0.2%. At the visual level of the plots, this means the $j$ marker tends to sit at almost exactly the same height as the $\gamma$ marker across all runs, usually with similar-looking sample standard deviation bars, even. In a few cases the sliding window version does a touch worse, and in one case I spotted it doing a touch better than TVR, but it never stands out as markedly better than the population of other methods.
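The within-threshold comparison can be sketched like this (hypothetical stand-in data; the variable names and noise scale are my own, not the actual results tables):

```python
import numpy as np

# Paired (RMSE, R^2) metrics for two methods over n simulation-experiment
# pairs. Random stand-ins here; the real values come from the notebook 4 runs.
rng = np.random.default_rng(1)
n = 6 * 728
rmse_tvr, r2_tvr = rng.random(n), rng.random(n)
rmse_other = rmse_tvr + 0.02 * rng.standard_normal(n)
r2_other = r2_tvr + 0.02 * rng.standard_normal(n)

def frac_within(thresh):
    """Fraction of pairs where BOTH metric diffs fall inside the threshold."""
    close = (np.abs(rmse_tvr - rmse_other) < thresh) \
          & (np.abs(r2_tvr - r2_other) < thresh)
    return close.mean()

print(frac_within(0.01), frac_within(0.001))
```

Requiring both diffs to clear the threshold simultaneously is what makes the 57.2%/25.0% agreement between TVR and Sliding stand out against the looser agreement with the other methods.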

So I'm declaring this thing wholly redundant and merging this PR. I doubt anyone has been using it for anything, but if they are and get a failure in some old code, we can direct them to use tvrdiff with order=3.

@pavelkomarov pavelkomarov merged commit 21b0b5f into master Nov 21, 2025
2 checks passed
@pavelkomarov pavelkomarov deleted the nix-jerk-sliding branch November 21, 2025 06:44