Implement _get_file_stats utility for consistent file metadata retrieval#1253

Open
varma1221 wants to merge 3 commits into malariagen:master from varma1221:add-get-file-stats

@varma1221

What problem does this solve?
Currently, retrieving basic file metadata (such as size or protocol) requires repeating _init_filesystem and fsspec boilerplate across different modules. This causes two problems. First, return types are handled inconsistently, especially for the protocol attribute, which some backends return as a string and others as a list. Second, there is no centralized validation for mandatory fields such as file size.

How does it solve it?
This PR introduces a private utility _get_file_stats in util.py. It:

  • Standardizes the retrieval of size, mtime, and protocol into a single dictionary.
  • Normalizes fs.protocol into a single string regardless of the backend implementation.
  • Validates file size strictly: a missing size raises a ValueError instead of being returned as None, so downstream arithmetic fails early with a clear
    error rather than crashing later in a hard-to-debug way.
  • Simplifies I/O logic for future modules by providing a clean, protocol-agnostic metadata interface.
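
For readers who want a concrete picture of the behaviour listed above, here is a minimal sketch. It is not the code in this PR: the names `get_file_stats_sketch`, `normalize_protocol`, and `FakeFS` are hypothetical, the function takes an already-constructed filesystem object rather than calling _init_filesystem, and reading the modification time from an `"mtime"` key is an assumption, since backends may expose it under a different name.

```python
from typing import Any, Dict


def normalize_protocol(protocol: Any) -> str:
    """Collapse a protocol given as a str or a list/tuple into a single string."""
    if isinstance(protocol, (list, tuple)):
        if not protocol:
            raise ValueError("filesystem reports no protocol")
        # Assumption: the first entry is treated as the canonical name.
        return str(protocol[0])
    return str(protocol)


def get_file_stats_sketch(fs: Any, path: str) -> Dict[str, Any]:
    """Return size, mtime, protocol and path for `path` as one dictionary.

    `fs` is any object with an fsspec-style `info(path)` method and a
    `protocol` attribute. A FileNotFoundError from `info` is not caught.
    """
    info = fs.info(path)
    size = info.get("size")
    if size is None:
        # Fail early instead of handing None to downstream arithmetic.
        raise ValueError(f"no size metadata available for {path!r}")
    return {
        "size": int(size),
        "mtime": info.get("mtime"),  # assumed key; may be absent on some backends
        "protocol": normalize_protocol(fs.protocol),
        "path": path,
    }


class FakeFS:
    """Tiny in-memory stand-in for a remote filesystem (illustration only)."""

    protocol = ("gs", "gcs")

    def __init__(self, entries: Dict[str, Dict[str, Any]]) -> None:
        self._entries = entries

    def info(self, path: str) -> Dict[str, Any]:
        if path not in self._entries:
            raise FileNotFoundError(path)
        return self._entries[path]


fs = FakeFS({"bucket/data.bin": {"size": 42, "mtime": 1700000000.0}})
stats = get_file_stats_sketch(fs, "bucket/data.bin")
print(stats["protocol"], stats["size"])  # gs 42
```

Taking the first entry of a multi-valued protocol is only one possible normalization rule; the point of the sketch is that callers always receive a plain string.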

Relevant issue numbers
Part of ongoing improvements to internal I/O utilities.

Testing done

  • Added comprehensive unit tests in tests/test_util.py.
  • Verified local file metadata retrieval (size, mtime, path).
  • Verified mock behavior for remote protocols (GCS/S3) and protocol normalization.
  • Verified that missing size metadata correctly triggers a ValueError.
  • Confirmed that standard FileNotFoundError propagates correctly.
  • Ran mypy and pre-commit to ensure type safety and code quality.
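
The mock-based checks above could look roughly like the following. This is an illustrative sketch using the standard library's unittest and unittest.mock, not the contents of tests/test_util.py; `stats_stand_in` is a hypothetical, simplified stand-in for the utility under test.

```python
import unittest
from unittest.mock import MagicMock


def stats_stand_in(fs, path):
    """Hypothetical stand-in for the utility under test (not the PR's code)."""
    info = fs.info(path)
    if info.get("size") is None:
        raise ValueError(f"no size metadata available for {path!r}")
    protocol = fs.protocol
    if isinstance(protocol, (list, tuple)):
        protocol = protocol[0]
    return {"size": info["size"], "mtime": info.get("mtime"), "protocol": protocol}


class StatsTests(unittest.TestCase):
    def test_remote_protocol_is_normalized(self):
        # A mocked remote backend that reports its protocol as a list.
        fs = MagicMock()
        fs.protocol = ["s3", "s3a"]
        fs.info.return_value = {"size": 10}
        self.assertEqual(stats_stand_in(fs, "bucket/key")["protocol"], "s3")

    def test_missing_size_raises_value_error(self):
        fs = MagicMock()
        fs.protocol = "gs"
        fs.info.return_value = {"size": None}
        with self.assertRaises(ValueError):
            stats_stand_in(fs, "bucket/key")

    def test_file_not_found_propagates(self):
        # The filesystem's own FileNotFoundError should reach the caller.
        fs = MagicMock()
        fs.protocol = "file"
        fs.info.side_effect = FileNotFoundError("missing")
        with self.assertRaises(FileNotFoundError):
            stats_stand_in(fs, "missing")
```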

Breaking changes
None. This is an internal utility addition.
