-
Notifications
You must be signed in to change notification settings - Fork 3
Finishing the manipulations.py Rewrite #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
krzywon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. Locally, I am seeing two errors during testing related to zero division, but the ci succeeds, so likely a local environment issue. I think replacing the scipy requirement with something already included in the package would be good, but not a show-stopper.
Keeping the existing manipulations file is a good idea now that I've had a minute to think about it, but a deprecation message would be helpful. That is something that can be added later.
|
My fractional binning stuff definitely needs scipy
…On Wed, 27 Sept 2023, 19:13 Jeff Krzywon, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
On sasdata/data_util/new_manipulations.py
<#47 (comment)>:
Disregard. See my main note. Keeping this is good for backward
compatibility.
—
Reply to this email directly, view it on GitHub
<#47 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACKU4SU644FHYZIDH4AOI43X4RUF7ANCNFSM6AAAAAA43YDHVE>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
|
Required for 6.0.0 |
|
@lucas-wilkins will take a look at it |
krzywon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready now, but I made changes, so I would suggest an alternate review prior to merging.
|
I've not checked the behaviour of the slicers, but I assume that is still all good. |
|
@smk78 will try to take a look |
|
Testing https://github.com/SasView/sasview/actions/runs/7079272736 on W10/x64: After loading 2D datasets (one .dat, one .h5) & doing Create New > Right-click > Slicers, I see:
But files (obviously) have integer numbers of bins.
More to follow... |
This tool is also persistent; ie, once it has appeared, if you change the slicer it is not cleared on the plot. Also, weirdly, the top 14% of the 2D image gets erased with this slicer!
But in the Slicer Parameters box they are called: And the plot labels are: I think some consistency of naming would be helpful. |
smk78
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have performed functionality review on W10/x64. Unfortunately, there are some issues. Please see comments on this PR.
Are you sure its not 18% |
|
Note to @krzywon - I'm working on this now, that's why everything is failing. Will finish fixing soon. |
I was going by the ticks on the axis! A seventh being ~14%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Health Improved
(1 files improve in Code Health)
Gates Failed
New code is healthy
(1 new file with code health below 9.00)
Gates Passed
2 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| New code is healthy | Violations | Code Health Impact | |
|---|---|---|---|
| averaging.py | 2 rules | 8.03 | Suppress |
View Improvements
| File | Code Health Impact | Categories Improved |
|---|---|---|
| manipulations.py | 3.81 → 8.55 | Lines of Code in a Single File, Low Cohesion, Complex Method, Bumpy Road Ahead, Overall Code Complexity, Deep, Nested Complexity, Excess Number of Function Arguments |
Quality Gate Profile: The Bare Minimum
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.


Description
This pull request covers the last part of my summer project: the
manipulations.pyrewrite. It also includes the new analytical unit tests, from branch44-new-unit-tests-for-slicer-averagers, except in this branch they point towards the new manipulations module instead of the old one.The details of the
manipulations.pyrewrite are as follows:DirectionalAverageclass for the bulk of their computation. This too reduces repeated code and increases maintainability.nbinsparameter now, whereas before some neededbin_widthinstead.The main functionality change is how
DirectionalAverageconsiders the weights of the datapoints. In both the new and oldmanipulations.pythese weights are either 0 or 1, though now it should be a lot easier to implement fractional weights. Previously, the weight assigned to a datapoint was calculated on-the-fly as part of a for loop which ran over each datapoint and was also responsible for part of the averaging process. The new methodology separates out the weighting and averaging processes, creating annxmweights matrix wherenis the number of bins the data is sorted into, andmis the number of datapoints. The averaging is then done using matrix multiplication and numpy functions. This new method will certainly be more memory heavy than the old method, and it could be slower for large datasets. My hope is however that the switch from python functions to numpy functions will mitigate this, and the improvement to the quality of results once fractional weights are added should more than justify the resource usage.Dependencies needed: scipy - needed for new unit tests.
Note that merging this pull request requires that a sister pull request in the sasview repo also be merged: SasView/sasview#2615
Fixes #44 (issue about dodgy unit tests).
How Has This Been Tested?
There are two ways this rewrite has been tested. The less rigorous method involved using this branch plus the sasview branch mentioned above to confirm that the plotted results look the same before and after using the same data file, slicer and slicer parameters. The more rigorous method involves the new unit tests. When the file
utest_averaging_analytical.pyfrom the branch44-new-unit-tests-for-slicer-averagersis run, you see that averagers from the previous version ofmanipulations.pyall pass these tests. The version ofutest_averaging_analytical.pyincluded in this branch tests the new averagers, and these also pass the tests. Note there are other minor changes made to the tests for compatibility with the new averagers.Work still to be done
At a superficial level, there are some files which need renaming. The new version of
manipulations.pygoes by the namenew_manipulations.py. Once everyone is satisfied with this new version, it should be renamed to replace the old one. There are references tonew_manipulationsinutest_averaging_analytical.pyon the sasdata side, as well as inPlotter2D.py,AnnulusSlicer.py,BoxSlicer.py,BoxSum.py,WedgeSlicer.py, andSectorSlicer.pyon the sasview side. These should also be changed accordingly.There is also some more serious work to do before this rewrite can truly be considered complete. For the most part we've already reached feature parity with the old
manipulations.py, but there are some blind spots which need acknowledging.manipulations.py, the_Sectorclass had the option to bin the data logarithmically. As far as I can tell this functionality was never called upon by sasview. There may be users who use it in custom scripts however, so it would still be worth implementing here. As a bonus, the feature would now be available to all slicers.manipulations.py,CircularAverageand_Sectorcall a function by the name ofget_dq_data. I don't quite understand the maths, but I get the impression from Lucas that it doesn't function as it should. The last thing needed to complete this rewrite would be proper consideration of thedqx_dataanddqy_dataproperties of the supplied Data2D object to compute thedxproperty of the returned Data1D object. The maths involved here is a little outside my comfort zone, so I'd be happy to delegate this responsibility to someone else.Review Checklist (please remove items if they don't apply):