Conversation

@Dominosauro
@SimoneMartino98

The auto-filtering pipeline is a Python module designed to automatically apply multi-level Butterworth lowpass filtering to time-series data (such as molecular dynamics trajectories or experimental signals) to systematically remove high-frequency noise while preserving meaningful dynamics. It works by computing the Fast Fourier Transform (FFT) to analyze frequency content, intelligently selecting multiple cutoff frequencies biased toward low frequencies (since slow dynamics are often more physically relevant), and then applying filters at each cutoff level. The pipeline generates comprehensive diagnostic outputs including filtered signal arrays, FFT plots, KDE distributions, comparison visualizations, and evolution videos that show how signals change across filter levels, enabling users to identify the optimal filtering strength for their specific analysis needs without manual trial-and-error.
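The multi-level idea described above can be sketched roughly as follows, assuming a scipy-based implementation; the filter order, the number of levels, and the geometric cutoff spacing here are illustrative choices, not the pipeline's actual values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
fs = 100.0                       # sampling frequency (Hz)
t = np.arange(0, 10, 1 / fs)     # 10 s of data, 1000 samples
# slow oscillation (0.5 Hz) buried in high-frequency noise
signal = np.sin(2 * np.pi * 0.5 * t) + 0.5 * rng.standard_normal(t.size)

# FFT of the signal: the frequency content used to pick cutoffs
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2

nyquist = fs / 2
# geometric spacing concentrates filter levels at low frequencies
cutoffs = np.geomspace(5.0, 20.0, num=5)  # Hz, hypothetical range

filtered = {}
for fc in cutoffs:
    # 4th-order Butterworth lowpass; filtfilt gives zero-phase filtering
    b, a = butter(4, fc / nyquist, btype="low")
    filtered[fc] = filtfilt(b, a, signal)
```

Lower cutoffs remove more of the white-noise band, so the filtered signals' spread shrinks as the cutoff decreases, which is what the diagnostic outputs let the user compare across levels.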

@SimoneMartino98 (Collaborator) left a comment

Ok, here is the first iteration of changes, containing the main ones.
I'm leaving aside everything that regards AutoFilInsight, because we will discuss it after this first revision.

Taking into account the complexity of this PR, I would say this is a pretty good starting point, but there are things to do before merging. In addition, at the end it is required to:

  • make pytests.
  • improve the documentation.
  • add example files.

from dynsight.trajectory import Insight

# Type alias for 64-bit float numpy arrays
ArrayF64: TypeAlias = NDArray[np.float64]

Let's keep the native version of the typings, without using aliases.

A good flow while reading the code is better than jumping up and down to see what a TypeAlias stands for.
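The suggestion amounts to writing the annotation inline instead of hiding it behind an alias. A minimal sketch (the `normalize` function is a made-up example, not part of the PR):

```python
import numpy as np
from numpy.typing import NDArray


def normalize(signal: NDArray[np.float64]) -> NDArray[np.float64]:
    """Annotate with NDArray[np.float64] directly, no TypeAlias needed."""
    return (signal - signal.mean()) / signal.std()
```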

import imageio.v2 as imageio
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns # type: ignore[import-untyped]

Why this ignore?

What's the error behind it?
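For context: seaborn most likely does not ship type information (no py.typed marker), so mypy reports an `import-untyped` error on the import, and the inline comment silences it. An alternative to scattering inline ignores is a per-module override in the mypy configuration; a sketch, assuming configuration lives in pyproject.toml:

```toml
[tool.mypy]
strict = true

# silence the untyped-import error only for seaborn
[[tool.mypy.overrides]]
module = "seaborn.*"
ignore_missing_imports = true
```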

MIN_FRAMES_TO_DROP = 2 # Minimum frames needed to drop first frame

# Initialize logger for this module
logger = logging.getLogger(__name__)

Use the new built-in dynsight logger:

from dynsight.logs import logger

instead of:

logger = logging.getLogger(__name__)


# --------------------------- Result container ---------------------------

remove

# --------------------------- Helper Functions ---------------------------


def _resolve_dataset_path(user_path: str | os.PathLike[str]) -> Path:

You take os.PathLike[str] as input and return a Path.

For paths, always use Path and never os.


Once you convert it into a Path, the earlier type is a useless intermediate step.

Start directly from Path.

levels: int = 50,
out_dir: str | Path | None = None,
reuse_existing: bool = True,
frames_to_remove: int = DEFAULT_FRAMES_TO_REMOVE,

Put 0 here and let the user optimize the parameter.

self.reuse_existing = reuse_existing

# Validate parameters and compute derived values
self.dt, self.fs = self._validate_params()

Maybe check_and_unify() is more descriptive?

ds_path = _resolve_dataset_path(path)
signals = _load_array_any(ds_path)

if signals.ndim != NDIM_EXPECTED:

An example of how to remove this GLOBAL:

ndim_expected = 2
if signals.ndim != ndim_expected:
    msg = f"Expected 2D array (series x frames), got {signals.shape}"
    raise ValueError(msg)

It does not need to be global.

return np.asarray(ins.dataset)


def _make_dir_safe(directory: Path) -> None:

A one-line function is simply useless (it generates more lines of code rather than simplifying them); just use:

directory.mkdir(parents=True, exist_ok=True)

where you need it.

pipeline.save_fft_plots()

if save_cutoff_folders:
    pipeline.save_cutoff_folders()

remove
