@Gautzilla
Contributor

@Gautzilla Gautzilla commented Nov 25, 2025

🐳 What's new?

This PR adds a default behaviour for files that are not timestamped.

🐳 How does it work?

Basically, if the strptime_format argument of a core API dataset's from_folder() method is set to None, the first valid file is considered to begin at the new first_file_begin timestamp (which has a default value), and each following valid file begins at the end of the previous one.
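
As a rough illustration of the chaining described above, here is a minimal sketch in plain Python. It is not OSEkit's implementation: the chain_file_begins helper, the example durations and the start timestamp are all hypothetical, and the actual default value of first_file_begin is not stated in this PR description.

```python
from datetime import datetime, timedelta


def chain_file_begins(durations_s, first_file_begin):
    """Return one begin timestamp per file, each file starting where the previous one ends."""
    begins = []
    current = first_file_begin
    for duration in durations_s:
        begins.append(current)
        # The next file begins at the end of the current one.
        current += timedelta(seconds=duration)
    return begins


# Hypothetical durations (in seconds) of three valid files, in order.
durations_s = [10.0, 25.5, 7.0]
# Arbitrary start chosen for this example, not OSEkit's actual default.
first_file_begin = datetime(2020, 1, 1, 0, 0, 0)

begins = chain_file_begins(durations_s, first_file_begin)
for begin in begins:
    print(begin.isoformat())
```

With files of different durations, the begin of each file depends on the durations of all the files before it, so the order in which valid files are processed matters.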

🐳 Some more stuff

To make this work with the example audio data provided by Naturalis, I added a few small changes to this PR:

  • MP3 files are now accepted by the API
  • AudioData.get_value() now returns a 2D array even for mono files. This helps with consistency across files.
  • SpectroData values only take into account the first channel of multichannel audio files (we might dig deeper into that later)
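
The last two points can be pictured with a small NumPy sketch. It only illustrates the idea and is not OSEkit code: the as_2d and first_channel helpers are hypothetical, and the (n_samples, n_channels) array layout is an assumption.

```python
import numpy as np


def as_2d(samples):
    """Return samples as a 2D array, assuming a (n_samples, n_channels) layout."""
    samples = np.asarray(samples)
    if samples.ndim == 1:
        # Mono signal: add a channel axis so the shape becomes (n_samples, 1).
        return samples[:, np.newaxis]
    return samples


def first_channel(samples):
    """Return only the first channel, whatever the number of channels."""
    return as_2d(samples)[:, 0]


mono = np.arange(4.0)                               # shape (4,)
stereo = np.column_stack([np.arange(4.0), -np.arange(4.0)])  # shape (4, 2)

print(as_2d(mono).shape)
print(as_2d(stereo).shape)
print(first_channel(stereo))
```

Always working with a 2D array means downstream code can index the channel axis without first checking whether the file is mono.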

@Gautzilla Gautzilla self-assigned this Nov 25, 2025
@Gautzilla Gautzilla requested a review from cazaudo December 9, 2025 15:18
@Gautzilla
Contributor Author

Hey!

I think the PR is ready: I just added some documentation on the behaviour with non-timestamped audio files.

Can anyone take a look at this? If we can merge it quickly, I'll publish a pre-release version of OSEkit for us to share with Naturalis.

Member

@cazaudo cazaudo left a comment

Worked perfectly well in my tests. I just haven't examined how you addressed the need to install new libraries to make this branch work (conda install libsndfile)?

Also, your corresponding example in the doc is nice. I was wondering whether your audio files have different durations or not? I think it should be highlighted as a file specificity in the example, because these non-timestamped files will mostly be found in large heterogeneous datasets with a great diversity of file durations (at least this is the case with Naturalis).

So I think in the doc I would have started with a simpler, more basic example, i.e. just computing a spectrogram on each entire file whatever its duration.

But this can be kept for a future PR; I approve the changes already.

@Gautzilla
Contributor Author

> Worked perfectly well in my tests. I just haven't examined how you addressed the need to install new libraries to make this branch work (conda install libsndfile)?

These libraries are not Python packages. The Python soundfile module we use relies on the cross-platform libsndfile C library. We updated soundfile, but having an up-to-date libsndfile installed is up to the user. That's why we install it with conda (which manages both Python and non-Python libraries in its environments) and do not list it in the project dependencies (which are Python dependencies only).

> Also, your corresponding example in the doc is nice. I was wondering whether your audio files have different durations or not? I think it should be highlighted as a file specificity in the example, because these non-timestamped files will mostly be found in large heterogeneous datasets with a great diversity of file durations (at least this is the case with Naturalis).
>
> So I think in the doc I would have started with a simpler, more basic example, i.e. just computing a spectrogram on each entire file whatever its duration.
>
> But this can be kept for a future PR; I approve the changes already.

Good point. As of now, I simply duplicated the timestamped example audio files (which do indeed all last 10 s). I agree that a more heterogeneous example dataset, with all types of formats and different durations, would feel closer to what users will encounter on real projects!

@Gautzilla Gautzilla merged commit a9c2480 into Project-OSmOSE:main Dec 11, 2025
2 checks passed
@Gautzilla Gautzilla deleted the no-strptime-format branch December 11, 2025 09:22
@cazaudo
Member

cazaudo commented Dec 11, 2025

Thanks @Gautzilla. When testing your branch, I was wondering whether it would be useful to describe the conda installation procedure in the installation part of the doc, including how it works with uv; it did not seem very clear in this PR:

conda install uv
uv pip install -e .

I think it will be useful for our team using datarmor to have this easy to find, but also for external users with moderate technical skills; having several installation procedures with detailed descriptions should make our code more accessible.

@Gautzilla
Contributor Author

> Thanks @Gautzilla. When testing your branch, I was wondering whether it would be useful to describe the conda installation procedure in the installation part of the doc, including how it works with uv; it did not seem very clear in this PR:
>
> conda install uv
> uv pip install -e .
>
> I think it will be useful for our team using datarmor to have this easy to find, but also for external users with moderate technical skills; having several installation procedures with detailed descriptions should make our code more accessible.

Why not, it couldn't hurt anyway!
Note that uv is optional if the aim is just to have an editable installation of osekit in an existing venv (I use it within conda solely because it is faster than raw pip), so it's more a recommendation to use uv because I like it than a true necessity for using osekit!

But if the aim is to participate in OSEkit's development, then I push uv in the docs because syncing the dev venvs is essential to avoid problems!

@cazaudo
Member

cazaudo commented Dec 11, 2025

Actually, in the legacy version I liked the fact that we had two sections in the install procedure, one for users only and the other for users + devs (although it is already quite obvious with pip vs git, but the more explicit the better!).

Typically, your note above on uv should be made public so as not to scare beginner devs; I guess uv is not yet widely used among PAM users.

I will propose an update of the doc in this direction, and you can tell me what you think.

@Gautzilla
Contributor Author

> Typically, your note above on uv should be made public so as not to scare beginner devs; I guess uv is not yet widely used among PAM users.

I don't even mention uv in the "basic user" installation guide for OSEkit:

image


3 participants