Conversation

@Dominosauro
@SimoneMartino98

The auto-filtering pipeline is a Python module designed to automatically apply multi-level Butterworth lowpass filtering to time-series data (such as molecular dynamics trajectories or experimental signals) to systematically remove high-frequency noise while preserving meaningful dynamics. It works by computing the Fast Fourier Transform (FFT) to analyze frequency content, intelligently selecting multiple cutoff frequencies biased toward low frequencies (since slow dynamics are often more physically relevant), and then applying filters at each cutoff level. The pipeline generates comprehensive diagnostic outputs including filtered signal arrays, FFT plots, KDE distributions, comparison visualizations, and evolution videos that show how signals change across filter levels, enabling users to identify the optimal filtering strength for their specific analysis needs without manual trial-and-error.
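The multi-level idea described above can be sketched roughly as follows, assuming a scipy-based implementation; the filter order, the number of levels, and the geometric cutoff spacing here are illustrative choices, not the pipeline's actual values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
fs = 100.0                       # sampling frequency (Hz)
t = np.arange(0, 10, 1 / fs)     # 10 s of data, 1000 samples
# slow oscillation (0.5 Hz) buried in high-frequency noise
signal = np.sin(2 * np.pi * 0.5 * t) + 0.5 * rng.standard_normal(t.size)

# FFT of the signal: the frequency content used to pick cutoffs
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2

nyquist = fs / 2
# geometric spacing concentrates filter levels at low frequencies
cutoffs = np.geomspace(5.0, 20.0, num=5)  # Hz, hypothetical range

filtered = {}
for fc in cutoffs:
    # 4th-order Butterworth lowpass; filtfilt gives zero-phase filtering
    b, a = butter(4, fc / nyquist, btype="low")
    filtered[fc] = filtfilt(b, a, signal)
```

Lower cutoffs remove more of the white-noise band, so the filtered signals' spread shrinks as the cutoff decreases, which is what the diagnostic outputs let the user compare across levels.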

@SimoneMartino98 (Collaborator) left a comment

Ok, here is the first iteration of changes, containing the main ones.
I'm leaving aside everything that regards AutoFilInsight, because we will discuss it after this first revision.

Taking into account the complexity of this PR, I would say this is a pretty good starting point, but there are things to do before merging. In addition, at the end it is required to:

  • make pytests.
  • improve the documentation.
  • add example files.

from dynsight.trajectory import Insight

# Type alias for 64-bit float numpy arrays
ArrayF64: TypeAlias = NDArray[np.float64]

Let's keep the native version of the typings, without using aliases.

A good flow while reading the code is better than jumping up and down to see what a TypeAlias stands for.
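The suggestion amounts to writing the annotation inline instead of hiding it behind an alias. A minimal sketch (the `normalize` function is a made-up example, not part of the PR):

```python
import numpy as np
from numpy.typing import NDArray


def normalize(signal: NDArray[np.float64]) -> NDArray[np.float64]:
    """Annotate with NDArray[np.float64] directly, no TypeAlias needed."""
    return (signal - signal.mean()) / signal.std()
```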

import imageio.v2 as imageio
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns # type: ignore[import-untyped]

Why this ignore?

What's the error behind it?
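For context: seaborn most likely does not ship type information (no py.typed marker), so mypy reports an `import-untyped` error on the import, and the inline comment silences it. An alternative to scattering inline ignores is a per-module override in the mypy configuration; a sketch, assuming configuration lives in pyproject.toml:

```toml
[tool.mypy]
strict = true

# silence the untyped-import error only for seaborn
[[tool.mypy.overrides]]
module = "seaborn.*"
ignore_missing_imports = true
```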

MIN_FRAMES_TO_DROP = 2 # Minimum frames needed to drop first frame

# Initialize logger for this module
logger = logging.getLogger(__name__)

Use the new built-in dynsight logger:

from dynsight.logs import logger

instead of:

logger = logging.getLogger(__name__)


# --------------------------- Result container ---------------------------

remove

# --------------------------- Helper Functions ---------------------------


def _resolve_dataset_path(user_path: str | os.PathLike[str]) -> Path:

You take os.PathLike[str] as input and return a Path.

For paths, always use Path and never os.


Once you convert it into a Path, the earlier type is a useless intermediate step.

Start directly from Path.

levels: int = 50,
out_dir: str | Path | None = None,
reuse_existing: bool = True,
frames_to_remove: int = DEFAULT_FRAMES_TO_REMOVE,

Put 0 here and let the user optimize the parameter.

self.reuse_existing = reuse_existing

# Validate parameters and compute derived values
self.dt, self.fs = self._validate_params()

Maybe check_and_unify() is more descriptive?

ds_path = _resolve_dataset_path(path)
signals = _load_array_any(ds_path)

if signals.ndim != NDIM_EXPECTED:

An example of how to remove this GLOBAL:

ndim_expected = 2
if signals.ndim != ndim_expected:
    msg = f"Expected 2D array (series x frames), got {signals.shape}"
    raise ValueError(msg)

It does not need to be global.

return np.asarray(ins.dataset)


def _make_dir_safe(directory: Path) -> None:

A one-line function is simply useless (it generates more lines of code rather than simplifying them); just use:

directory.mkdir(parents=True, exist_ok=True)

where you need it.

pipeline.save_fft_plots()

if save_cutoff_folders:
    pipeline.save_cutoff_folders()

remove
