-
Notifications
You must be signed in to change notification settings - Fork 9
Auto Filtering Code #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Auto Filtering Code #142
Conversation
SimoneMartino98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, first iteration of changes which contains the main changes.
I'm neglecting all that regards the AutoFilInsight because we will speak about it after this first revision.
Taken into account the complexity of this PR i would say that this is a pretty good starting point, but there is something to do before the merging. In addition at the end of this it is required to:
- make
pytests. - improve the documentation.
- add example files.
| from dynsight.trajectory import Insight | ||
|
|
||
| # Type alias for 64-bit float numpy arrays | ||
| ArrayF64: TypeAlias = NDArray[np.float64] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the native version of the typings without using aliases.
A good flow during the code reading is better than going up and down to see what is TypeAlias.
| import imageio.v2 as imageio | ||
| import matplotlib.pyplot as plt | ||
| import numpy as np | ||
| import seaborn as sns # type: ignore[import-untyped] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this ignore?
What's the error behind?
| MIN_FRAMES_TO_DROP = 2 # Minimum frames needed to drop first frame | ||
|
|
||
| # Initialize logger for this module | ||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the new built-in dynsight logger:
from dynsight.logs import logger| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| # --------------------------- Result container --------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
| # --------------------------- Helper Functions --------------------------- | ||
|
|
||
|
|
||
| def _resolve_dataset_path(user_path: str | os.PathLike[str]) -> Path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You input os.PathLike[str] and output Path.
For paths, always use Path and never os
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you convert it into a Path, then is a useless step.
directly start from Path
| levels: int = 50, | ||
| out_dir: str | Path | None = None, | ||
| reuse_existing: bool = True, | ||
| frames_to_remove: int = DEFAULT_FRAMES_TO_REMOVE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put 0 here and let the user optimize the parameter.
| self.reuse_existing = reuse_existing | ||
|
|
||
| # Validate parameters and compute derived values | ||
| self.dt, self.fs = self._validate_params() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe check_and_unify() is more descriptive ?
| ds_path = _resolve_dataset_path(path) | ||
| signals = _load_array_any(ds_path) | ||
|
|
||
| if signals.ndim != NDIM_EXPECTED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example on how to remove this GLOBAL:
ndim_expected = 2
if signals.ndim != ndim_expected:
msg = f"Expected 2D array (series x frames), got {signals.shape}"
raise ValueError(msg)It is not necessary to be global.
| return np.asarray(ins.dataset) | ||
|
|
||
|
|
||
| def _make_dir_safe(directory: Path) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One-line function is simply useless (and it generates more lines of code rather than simplify it); just use:
directory.mkdir(parents=True, exist_ok=True)
where you need it
| pipeline.save_fft_plots() | ||
|
|
||
| if save_cutoff_folders: | ||
| pipeline.save_cutoff_folders() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
@SimoneMartino98
The auto-filtering pipeline is a Python module designed to automatically apply multi-level Butterworth lowpass filtering to time-series data (such as molecular dynamics trajectories or experimental signals) to systematically remove high-frequency noise while preserving meaningful dynamics. It works by computing the Fast Fourier Transform (FFT) to analyze frequency content, intelligently selecting multiple cutoff frequencies biased toward low frequencies (since slow dynamics are often more physically relevant), and then applying filters at each cutoff level. The pipeline generates comprehensive diagnostic outputs including filtered signal arrays, FFT plots, KDE distributions, comparison visualizations, and evolution videos that show how signals change across filter levels, enabling users to identify the optimal filtering strength for their specific analysis needs without manual trial-and-error.