@MaelleTtrt
Member

Correction of function reshape_timebin + associated tests

Collaborator

@mathieudpnt left a comment

did some ruff/syntax changes but overall looks good to me!
changed a test so it's a bit more complete
let me know


def get_timezone(df: DataFrame) -> tzoffset | list[tzoffset]:
"""Return timezone(s) from DataFrame."""
"""Return timezone(s) from DataFrame.
Collaborator

Return timezone(s) from APLOSE DataFrame.



def check_timestamp(df: DataFrame, timestamp_audio: list[Timestamp]) -> None:
"""Check if the variable timestamp_wav exists and is correctly formated.
Collaborator

"""Check if provided timestamp_audio list is correctly formatted.

    Parameters
    ----------
    df: DataFrame
        APLOSE results DataFrame.
    timestamp_audio: list[Timestamp]
        list of start timestamps of corresponding audio file for each detection.

"""

Collaborator

what do you think ?

Member Author

I prefer your following suggestion

"""
timestamp_audio: list[Timestamp]
        A list of timestamps. Each timestamp is the start datetime of the
        corresponding audio file for each detection in df.
"""

----------
df: DataFrame
An APLOSE result DataFrame.
timestamp_audio: list[Timestamp]
Collaborator

"""
timestamp_audio: list[Timestamp]
        A list of timestamps. Each timestamp is the start datetime of the
        corresponding audio file for each detection in df.
"""

if isinstance(get_timezone(df), list):
df["start_datetime"] = [to_datetime(elem, utc=True) for elem in df["start_datetime"]]
df["end_datetime"] = [to_datetime(elem, utc=True) for elem in df["end_datetime"]]
df["start_datetime"] = [to_datetime(elem, utc=True)
Collaborator

  if isinstance(get_timezone(df), list):
      df["start_datetime"] = [
          to_datetime(elem, utc=True) for elem in df["start_datetime"]
      ]
      df["end_datetime"] = [
          to_datetime(elem, utc=True) for elem in df["end_datetime"]
      ]

)

return concat(results).sort_values(by=["start_datetime", "end_datetime", "annotator", "annotation"]).reset_index(drop=True)
return concat(results).sort_values(by=["start_datetime", "end_datetime",
Collaborator

return (
    concat(results)
    .sort_values(by=["start_datetime", "end_datetime", "annotator", "annotation"])
    .reset_index(drop=True)
)


def test_get_filename_timestamp(sample_df: DataFrame, sample_yaml: Path) -> None:
tz = get_timezone(sample_df)
with open(sample_yaml, "r") as f:
Collaborator

ruff recommendation:

with sample_yaml.open(encoding="utf-8") as f:
    data_yaml = yaml.safe_load(f)
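
As a minimal sketch of what this suggestion changes: `Path.open` is called on the path object itself, and passing `encoding` explicitly avoids relying on the platform default. The example is an illustration only. It uses `json` in place of `yaml` so that it runs with the standard library alone, and `load_mapping` and the temporary file are hypothetical names that are not part of this PR.

```python
import json
import tempfile
from pathlib import Path


def load_mapping(path: Path) -> dict:
    """Read a mapping from a text file (hypothetical helper; json stands in for yaml)."""
    # Path.open with an explicit encoding, as in the suggestion above.
    with path.open(encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    # The key contains a non-ASCII character to make the encoding matter.
    sample_path.write_text(
        '{"file_\u00e9.wav": "2025-01-01T00:00:00"}', encoding="utf-8"
    )
    data = load_mapping(sample_path)
```

The default mode of `Path.open` is already `"r"`, which is why the explicit `"r"` argument can be dropped.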

tz = get_timezone(sample_df)
with open(sample_yaml, "r") as f:
data_yaml = yaml.safe_load(f)
sample_key = list(data_yaml.keys())[0]
Collaborator

ruff

sample_key = next(iter(data_yaml.keys()))
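
For context, a small sketch of why ruff prefers this form: `next(iter(...))` fetches the first key without building an intermediate list, and iterating a dict directly yields its keys, so `.keys()` is optional. The dictionary contents below are made up for illustration.

```python
# Hypothetical stand-in for the parsed YAML content.
data_yaml = {
    "file_1.wav": {"start": "2025-01-01"},
    "file_2.wav": {"start": "2025-01-02"},
}

# Builds a full list of keys only to take the first one.
first_via_list = list(data_yaml.keys())[0]

# Takes the first key lazily; dicts preserve insertion order (Python 3.7+).
first_via_iter = next(iter(data_yaml))

# On an empty mapping, next() raises StopIteration unless a default is given.
first_or_none = next(iter({}), None)
```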

check_names=False)


def test_check_timestamp_none(sample_df: DataFrame) -> None:
Collaborator

@pytest.mark.parametrize("timestamp", [None, []])
def test_check_timestamp_empty(sample_df: DataFrame, timestamp: list | None) -> None:
    with pytest.raises(ValueError, match="`timestamp_wav` is empty"):
        check_timestamp(sample_df, timestamp)

This tests for both None and an empty list. I had to change check_timestamp because it would not treat timestamp_audio = [] as an empty list.

def check_timestamp(df: DataFrame, timestamp_audio: list[Timestamp]) -> None:
    """Check if provided `timestamp_audio` list is correctly formatted.

    Parameters
    ----------
    df: DataFrame
        APLOSE results DataFrame.
    timestamp_audio: list[Timestamp]
        list of start timestamps of corresponding audio file for each detection.

    """
    if timestamp_audio in [None, []]:
        msg = "`timestamp_wav` is empty"
        raise ValueError(msg)
    if len(timestamp_audio) != len(df):
        msg = "`timestamp_wav` is not the same length as `df`"
        raise ValueError(msg)

PS: must add from __future__ import annotations to test_filtering_utils
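
One possible alternative to `timestamp_audio in [None, []]`, sketched here under assumptions: a plain truthiness check covers both `None` and an empty list in a single condition. This only holds if `timestamp_audio` is a built-in sequence such as a list; a pandas Series or NumPy array would need an explicit length check instead. `check_timestamp_sketch` and `error_message` are hypothetical names, and a plain list stands in for `df` so the example runs without pandas.

```python
from datetime import datetime
from typing import Optional, Sequence, Sized


def check_timestamp_sketch(
    df: Sized, timestamp_audio: Optional[Sequence[datetime]]
) -> None:
    """Hypothetical variant of check_timestamp using a truthiness check."""
    # `not x` is True for both None and an empty list.
    if not timestamp_audio:
        msg = "`timestamp_wav` is empty"
        raise ValueError(msg)
    if len(timestamp_audio) != len(df):
        msg = "`timestamp_wav` is not the same length as `df`"
        raise ValueError(msg)


def error_message(
    df: Sized, timestamp_audio: Optional[Sequence[datetime]]
) -> Optional[str]:
    """Return the ValueError message raised by the check, or None if it passes."""
    try:
        check_timestamp_sketch(df, timestamp_audio)
    except ValueError as err:
        return str(err)
    return None


rows = ["detection_1", "detection_2"]  # stand-in for a two-row DataFrame
stamps = [datetime(2025, 1, 1), datetime(2025, 1, 2)]
```

With this variant, `error_message(rows, None)` and `error_message(rows, [])` return the same "empty" message, which is the behaviour the parametrized test expects.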

Member Author

I don't understand: do you want me to replace the test test_check_timestamp_none with your test_check_timestamp_empty?

Comment on lines 204 to 209
def test_check_timestamp_wrong_length(sample_df: DataFrame) -> None:
len_sample_df = len(sample_df)+1
timestamps = [Timestamp("2025-01-01") + Timedelta(days=i) for i in range(len_sample_df)] # longer list

with pytest.raises(ValueError, match="`timestamp_wav` is not the same length as `df`"):
check_timestamp(sample_df, timestamps)
Collaborator

ruff

def test_check_timestamp_wrong_length(sample_df: DataFrame) -> None:
    len_sample_df = len(sample_df) + 1
    timestamps = [
        Timestamp("2025-01-01") + Timedelta(days=i)
        for i in range(len_sample_df)
    ]  # longer list
    with pytest.raises(ValueError, match="is not the same length as `df`"):
        check_timestamp(sample_df, timestamps)
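
A note on why the shortened `match` string still works: `pytest.raises(..., match=...)` tests the pattern against the string form of the exception with `re.search`, so a substring of the message is enough. The sketch below reproduces that check with the standard library only; the message text is copied from the `check_timestamp` suggestion earlier in this thread.

```python
import re

# Message raised by check_timestamp when the lengths differ.
message = "`timestamp_wav` is not the same length as `df`"

# Pattern from the shortened test; backticks have no special meaning in a regex.
pattern = "is not the same length as `df`"

# re.search matches anywhere in the string, so a partial pattern succeeds.
partial_match = re.search(pattern, message) is not None

# re.fullmatch would need the whole message, so the partial pattern fails.
full_match = re.fullmatch(pattern, message) is not None
```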

assert all(df_out["end_time"] == 86400.0)
assert df_out["start_datetime"].min() >= sample_df["start_datetime"].min().floor("D")
assert df_out["end_datetime"].max() <= sample_df["end_datetime"].max().ceil("D")

Collaborator

missing blank line

@MaelleTtrt force-pushed the correct_reshape_timebin branch from ac8c353 to 7fa5767 on December 5, 2025 11:28
@MaelleTtrt force-pushed the correct_reshape_timebin branch from 7fa5767 to 40c861d on December 16, 2025 14:02
@mathieudpnt merged commit 0265250 into Project-OSmOSE:main on Dec 16, 2025
1 check passed