Add waveletdiff: wavelet-based denoising differentiator #202
mariaprot wants to merge 6 commits into florisvb:master from
Conversation
- Implements Donoho & Johnstone (1994) universal threshold estimator
- Vectorised thresholding over multi-column inputs
- Supports variable step size via np.gradient
- Supports arbitrary axis via ascontiguousarray + moveaxis
- Adds pywt dependency
The build is failing in continuous integration (CI). If you click through to the GitHub actions and look at the run, the error is that
pavelkomarov left a comment
There are some possible simplifications, and I think the use of np.gradient isn't what we want.
```python
finitediff: [(0, -1), (1, -1)],
<<<<<<< HEAD
savgoldiff: [(0, -1), (1, 1)],
waveletdiff: [(1, 0), (2, 1)],
```
These bounds are maybe reasonable but could maybe be tightened. I'd temporarily set `multidim_methods_and_params = [just the wavelet row]`, run `python3 -m pytest -s pynumdiff/tests/test_diff_methods.py::test_multidimensionality --bounds --plot`, and have a look. It's possible you can get better bounds and a better-looking fit by playing with the hyperparameters you pass to waveletdiff in the `multidim_methods_and_params`. Use any better settings you can manage to find in the test, and tighten the bounds to match.
```python
N = x.shape[axis]

# Axis normalisation — bring target axis to front.
# Skip moveaxis when axis is already 0 to avoid an unnecessary allocation.
```
I'm reasonably sure moveaxis doesn't allocate anything, just changes some metadata so the same underlying data is viewed in a different order, so a moveaxis call when the axis is already 0 doesn't really cost anything significant.
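For what it's worth, this is easy to verify (a quick standalone check, not code from the PR):

```python
import numpy as np

# moveaxis only rewrites shape/stride metadata: the result is a view that
# shares the same underlying buffer, so no data is copied for any axis.
x = np.zeros((4, 5, 6))
y = np.moveaxis(x, 2, 0)  # shape becomes (6, 4, 5)
print(y.shape, np.shares_memory(x, y))
```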
```python
# When we do move, call ascontiguousarray immediately so the subsequent
# reshape is guaranteed zero-copy.
if axis == 0:
    x_work = x if x.flags['C_CONTIGUOUS'] else np.ascontiguousarray(x)
```
What is the contiguous flag? I've not used it before. I think we might be able to assume input arrays are contiguous.
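For context, a minimal standalone illustration (not code from the PR): the flag reports whether the array's buffer is laid out in row-major order with no stride tricks.

```python
import numpy as np

# Freshly allocated arrays are C-contiguous; a transpose is a strided
# view of the same buffer, so the flag flips to False without copying.
a = np.arange(6).reshape(2, 3)
print(a.flags['C_CONTIGUOUS'])    # True
print(a.T.flags['C_CONTIGUOUS'])  # False
```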
```python
differentiates with finite differences via np.gradient.

:param np.array x: data to differentiate
:param float or array dt_or_t: scalar dt or array of sample times. If an
```
I'm a bit confused by this. I thought the wavelet transform requires uniform spacing. How is variable spacing supported? Passing to np.gradient seems like it's just doing a second-order finite difference that accounts for nonuniform spacing (by solving little 3x3 inverse problems for every point). That shouldn't be a fall-through here, because there's the finitediff method offered separately in this library. Let's enforce the use of wavelets, efficiently via the wavelet transform, which means this method should only take dt, never a vector t.
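To make concrete what that fall-through computes (a standalone sketch, not PR code): when np.gradient is given a coordinate array instead of a scalar spacing, it applies second-order finite-difference stencils adapted to the nonuniform gaps.

```python
import numpy as np

# Nonuniformly spaced samples of sin(t); np.gradient accepts the sample
# times as a coordinate array and uses nonuniform three-point stencils.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 2 * np.pi, 200))
x = np.sin(t)
dxdt = np.gradient(x, t)  # second-order finite difference, not wavelets
```

So passing a vector t here gives exactly the behavior finitediff already provides.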
```python
col_coeffs = [coeffs_denoised[i][:, col] for i in range(n_levels)]
x_hat_col = pywt.waverec(col_coeffs, wavelet, mode=mode)[:N]
x_hat_flat[:, col] = x_hat_col
dxdt_hat_flat[:, col] = np.gradient(x_hat_col, grad_arg)
```
We shouldn't need to use np.gradient for this. Because wavelets are analytic and we can compose a function by linearly summing contributions from wavelets, we should be able to construct the derivative by adding together analytic derivatives and then sampling.
pavelkomarov left a comment
Looking better! I still want to get away from using np.gradient, and there's still an error bound I question the tightness of in the multidim test, but other comments are few and minor.
```python
differentiates analytically by applying derivative reconstruction filters to
the denoised wavelet coefficients.

Because the DWT requires uniform spacing, this method only accepts a scalar
```
Nice. Informative mention.
```python
threshold=1.0 is the classical Donoho & Johnstone universal threshold
and is the recommended starting point. Values < 1.0 give less smoothing;
values > 1.0 give more aggressive smoothing. This parameter maps onto
tvgamma in the pynumdiff.optimize framework.
```
It also corresponds to s in splines. I don't think we need to correlate to other parameters in other contexts here, though. That's just adding info the user doesn't need. A smoothness parameter is what it is. I'd remove the allusion to tvgamma.
```python
col_coeffs = [coeffs_denoised[i][:, col] for i in range(n_levels)]
x_hat_col = pywt.waverec(col_coeffs, wavelet, mode=mode)[:N]
x_hat_flat[:, col] = x_hat_col
dxdt_hat_flat[:, col] = np.gradient(x_hat_col, dt)
```
Is there no way to get the derivative analytically from the coefficients? Each wavelet has an analytic expression, and the reconstruction is a linear sum of their contributions, so you can move the linear derivative operator inside the sum and find the derivative by summing the analytic derivatives of all the wavelets. I'd prefer this to using np.gradient, since it's using a finite difference scheme, which introduces its own little error and doesn't take full advantage of the mathematical structure. Is there any kind of easy library call to do this? Or maybe it has to be done manually?
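I don't know of a one-call pywt API for this, so as a sketch of the general pattern only (the cubic spline here is a stand-in assumption for illustration, not the wavelet-filter construction being asked about): once you have the denoised reconstruction, you can fit a piecewise-analytic model and evaluate its exact derivative instead of applying a finite-difference stencil.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Stand-in for the wavelet-denoised reconstruction x_hat on a uniform grid.
dt = 0.01
t = np.arange(200) * dt
x_hat = np.sin(2 * np.pi * t)

# Fit a piecewise-cubic (analytic) model, then differentiate it exactly
# rather than applying a finite-difference scheme like np.gradient.
dxdt_hat = CubicSpline(t, x_hat).derivative()(t)
```

The analogous move for wavelets would put the derivative inside the linear reconstruction sum, as described above.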
```python
# Bring axis of differentiation to front so each column of x_flat is one
# signal to differentiate. moveaxis returns a view with updated strides,
# so ascontiguousarray ensures the subsequent reshape is zero-copy.
```
A reshape of a contiguous array already returns a view rather than a copy, but the ascontiguousarray call itself might create one. Update this comment.
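To make the distinction concrete (a standalone check, not PR code): ascontiguousarray returns its input untouched when it is already C-contiguous, and only materialises a copy when it has to.

```python
import numpy as np

x = np.zeros((4, 5, 6))
same = np.ascontiguousarray(x)  # already contiguous: returns x itself
moved = np.ascontiguousarray(np.moveaxis(x, 2, 0))  # strided view: copies
```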
Addressing #164