Improve typing consistency in codebase #1311

Open
ubdbra001 wants to merge 29 commits into AFM-SPM:main from ubdbra001:i1299-improve-typing-consistency

Conversation

@ubdbra001
Collaborator

Fix #1299

This PR addresses the inconsistent typing in the codebase.
Given the size of the task, I'm going to do it incrementally in this PR. The initial stage is picking out the "easy wins", e.g. cases where the type hints are missing or incorrect and are easy/obvious to update.

At this stage all the tests still pass.


Before submitting a Pull Request please check the following.

  • Existing tests pass.
  • Documentation has been updated and builds. Remember to update as required...
    • docs/configuration.md
    • docs/usage.md
    • docs/data_dictionary.md
    • docs/advanced.md and new pages it should link to.
  • Pre-commit checks pass.

@ubdbra001 ubdbra001 changed the title from "I1299 improve typing consistency" to "Improve typing consistency in codebase" Mar 5, 2026
@codecov

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 89.36170% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.97%. Comparing base (b72a0c2) to head (9328755).
⚠️ Report is 515 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| topostats/processing.py | 33.33% | 4 Missing ⚠️ |
| topostats/classes.py | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1311      +/-   ##
==========================================
- Coverage   89.25%   87.97%   -1.29%     
==========================================
  Files          30       31       +1     
  Lines        5810     6047     +237     
==========================================
+ Hits         5186     5320     +134     
- Misses        624      727     +103     

Collaborator

@ns-rse ns-rse left a comment

Good work tackling this behemoth of a task @ubdbra001 and apologies for it being so messy (typing only came onto my radar way after I'd started).

Some comments in-line from a quick scan, don't have time to investigate more deeply I'm afraid.

Returns
-------
- np.array
+ npt.ArrayLike
Collaborator

I've not come across npt.ArrayLike before so had to look it up.

In this instance the function returns np.asarray(filtered_arr1), so whilst the result could be considered something that can be converted to an array, it is already an array. I wonder if npt.NDArray is more appropriate, as we (should) know what the dtype should be, although it's currently missing.

Disclaimer: typing is not one of my strong points!

Collaborator

Here's a post about it that seems to opine that npt.NDArray is marginally better.

npt.ArrayLike seems to be for objects that can be cast to arrays, including arrays.

Collaborator Author

npt.NDArray sounds more sensible, and I suspect I've used it elsewhere, so I'll stick to that one. Thanks!
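
To illustrate the distinction discussed above, here is a minimal sketch, assuming a hypothetical helper named `remove_negatives` (it is not a function from this PR). The input is hinted as `npt.ArrayLike` because `np.asarray` accepts lists, tuples and existing arrays, while the return is hinted as `npt.NDArray[np.float64]` because the result is always an array of a known dtype.

```python
import numpy as np
import numpy.typing as npt


def remove_negatives(values: npt.ArrayLike) -> npt.NDArray[np.float64]:
    """Hypothetical helper: drop negative entries and return a float array."""
    # np.asarray accepts anything array-like (lists, tuples, existing arrays).
    arr = np.asarray(values, dtype=np.float64)
    # Boolean indexing returns an ndarray, so the return hint can be precise.
    return arr[arr >= 0]


# A plain list is a valid ArrayLike input; the result is a float64 ndarray.
filtered = remove_negatives([3.0, -1.0, 2.5])
```

The broader hint on the way in and the narrower hint on the way out means callers can pass whatever is convenient but can rely on array methods being available on the result.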

Comment thread topostats/tracing/ordered_tracing.py Outdated
Comment thread topostats/processing.py
Comment thread topostats/processing.py
A dictionary with keys 'image', 'img_path' and 'pixel_to_nm_scaling' containing a file or frames' image, its
path and its pixel to nanometre scaling value.
topostats_object : Topostats
TopoStats object - Needs further info
Collaborator

How about...

Suggested change
- TopoStats object - Needs further info
+ An object of type ``TopoStats`` class.

Collaborator Author

That works, I was wondering if there should be a little more, e.g. run_curvature_stats has:

        ``TopoStats`` object post splining, all ``Molecules`` within the ``grain_crops`` attribute (a dictionary of
        ``GrainCrop``) should have ``splined_coords`` attributes populated.

Collaborator

Hmmm, typically these will be newly instantiated, but there is nothing stopping an existing TopoStats object (e.g. loaded from disk / a .topostats file) being passed in here.

An object of type ``TopoStats`` class with a minimum of ``image_original``, ``filename`` and ``pixel_to_nm_scaling`` attributes which allow filtering to be run.

I think those are the bare minimum but could be wrong.

Collaborator Author

Okay, I'll add those, and we can update it in future if required.

Comment thread topostats/processing.py
grain_stats_df.index.set_names(["grain_number", "class", "subgrain"], inplace=True)
else:
grain_stats_df = None
return topostats_object.filename, topostats_object, grain_stats_df
Collaborator

Not for this work, but I thought I'd removed the returning of dataframes, as they are instead pulled out of the topostats_objects and collated into a dictionary before converting to pd.DataFrame.

Collaborator Author

This was the only one I've spotted so far; happy to open an issue to be addressed later.

Collaborator

Note it as an issue; it's then at least recorded and can be addressed if anyone has the time/inclination.

Collaborator

@SylviaWhittle SylviaWhittle left a comment

Just a thing I noticed skim-reading this.

Comment thread topostats/processing.py Outdated
Comment thread topostats/processing.py Outdated
Comment thread topostats/processing.py Outdated
Returns
-------
tuple[str, pd.DataFrame]
tuple[str, pd.DataFrame] - Deprecated, needs updating
Collaborator

Alas, Ty sees that the filename property can be None, even though (I think) a TopoStats object without a filename should not be possible?

Collaborator Author

So currently, according to the types specified in the TopoStats class, filename is optional, and so it can be None: filename: str | None = None
Which is why Ty complains.

I've had a go at changing this already, but if I remember correctly it caused a bunch of tests to fail, and I didn't want to deal with that quite yet. That'll be the next round of updates (once I've picked as much of the low-hanging fruit as possible).
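
The complaint described above can be reproduced in miniature. This is only a sketch: `Image` and `describe` are hypothetical names standing in for a class with an optional `filename` field and a function that promises to return a `str`; they are not the project's actual code. Type checkers generally narrow `str | None` to `str` after an explicit `None` check, which is one way to satisfy the return hint without changing the field's declared type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical stand-in for a class with an optional ``filename`` field."""

    filename: str | None = None


def describe(image: Image) -> tuple[str, int]:
    """Return the filename and its length, which requires a real ``str``."""
    # Without this guard a checker sees ``str | None`` and rejects the return.
    if image.filename is None:
        raise ValueError("Image has no filename")
    # After the guard the attribute is treated as ``str``.
    return image.filename, len(image.filename)
```

The alternative, making the field a required `str`, removes the need for the guard but changes how objects are constructed, which is presumably why it affects existing tests.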

ubdbra001 and others added 3 commits March 5, 2026 18:49
Adds correct return typehint

Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
Add correct return type hint

Co-authored-by: Sylvia Whittle <86117496+SylviaWhittle@users.noreply.github.com>
@@ -214,7 +214,7 @@ def compile_images(
@staticmethod
def remove_common_values(
ordered_array: npt.NDArray, common_value_check_array: npt.NDArray, retain: list = ()
Collaborator Author

@ubdbra001 ubdbra001 Mar 5, 2026

On this:

  1. retain is typed as a list but is assigned an empty tuple by default 🤔
  2. Setting the default value to an empty list may cause issues (see here).

I'll probably set it to None and add a check so that it gets reset to an empty list if the arg is not supplied.
Any objections?
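
For reference, the issue with a mutable default and the proposed `None` sentinel can be sketched as follows. The function names `append_bad` and `append_good` are hypothetical and only illustrate the pattern; they are not the signature of `remove_common_values`.

```python
from __future__ import annotations


def append_bad(item: int, bucket: list[int] = []) -> list[int]:
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits ``bucket``.
    bucket.append(item)
    return bucket


def append_good(item: int, bucket: list[int] | None = None) -> list[int]:
    # None is the sentinel; a fresh list is built on each call that omits it.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

With `append_bad`, two calls without `bucket` return the same list object, so items accumulate across calls; with `append_good`, each such call starts from an empty list.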

Comment thread topostats/classes.py
molecule_data : dict[int, Molecule], optional
Dictionary of ``Molecule`` objects indexed by molecule number.
- tracing_stats : dict | None
+ tracing_stats : dict, optional
Collaborator Author

I have been going around replacing | None with , optional in the docstrings. Probably doesn't have a real impact but I thought I'd make it all consistent.

Let me know if you'd prefer to retain | None
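
As a small illustration of the convention being applied, the sketch below keeps `| None` in the type hint, where the checker needs it, and uses the numpydoc-style `, optional` in the docstring. The function `count_stats` is a hypothetical example, not code from this PR.

```python
from __future__ import annotations


def count_stats(tracing_stats: dict | None = None) -> int:
    """
    Count the entries in a statistics dictionary.

    Parameters
    ----------
    tracing_stats : dict, optional
        Dictionary of statistics; treated as empty when omitted.

    Returns
    -------
    int
        Number of entries.
    """
    # The hint still carries ``| None``; only the docstring wording differs.
    return 0 if tracing_stats is None else len(tracing_stats)
```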

Comment thread topostats/classes.py
Dictionary, indexed by molecule where the value is the molecules statistics for the given molecule.
"""
if self.molecule_data is None:
raise ValueError("No molecule data found")
Collaborator Author

Let me know if you'd like me to change this to something else.

Correct to a tuple of str and dict
Corrects the type hint and the docstring
Collaborator

@ns-rse ns-rse left a comment

Some responses in-line...

@ns-rse
Collaborator

ns-rse commented Mar 6, 2026

Just a thought but it might be worth adding commits to .git-blame-ignore-revs so that the "blame" resides with the original author rather than yourself @ubdbra001.

pre-commit-ci bot and others added 9 commits March 17, 2026 12:49
Docstring for second value in return tuple needs to be updated
These specify that the values in the dict are not None after processing
However, I don't see these being used anywhere, so it may be worth just deleting them?
To match return value type hint
Currently very broad value type, I couldn't parse what they could be from the code
@ubdbra001 ubdbra001 marked this pull request as ready for review April 1, 2026 11:10
@ubdbra001
Collaborator Author

I think I've got all the low-hanging typing fruit; anything else will cause tests to fail, and so I suspect it will require a bit more discussion.

Development

Successfully merging this pull request may close these issues.

Improve typing consistency in Topostats

3 participants