feat!: filter values from Layout folders #4

Merged: robin-cls merged 9 commits into main from info_from_folders on May 5, 2026

@robin-cls
Collaborator

This PR introduces the FilesDatabase.filter_values method and FilesDatabase.subsets property.

  • filter_values lists the unique values that can be passed to a filter. When possible, it reads them from the folder names, which is fast. It falls back to a full scan of the files if the layouts (the description of the folder and file hierarchy) are disabled, or if the actual file system does not match the expected layouts. A warning is emitted whenever the full scan is used, except when the files are not organized at all (flat case, no folders), which we treat as nominal behavior.
  • subsets builds on filter_values to list the combinations of SubsetUnmixer.partition_keys that are present. This helps the user understand which datasets are mixed in the product.
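The folder-first lookup with a full-scan fallback can be sketched as follows. This is a minimal illustration, not the library's implementation: FOLDER_PATTERN, FILE_PATTERN, the use_layout flag and the standalone filter_values function are all hypothetical stand-ins, and a plain ValueError replaces the library's own error handling.

```python
import re
import tempfile
import warnings
from pathlib import Path

# Hypothetical conventions, for illustration only (not the library's layouts).
FOLDER_PATTERN = re.compile(r"cycle_(?P<cycle>\d+)")
FILE_PATTERN = re.compile(r"data_c(?P<cycle>\d+)_p(?P<pass_number>\d+)\.nc")


def filter_values(root: Path, key: str, use_layout: bool = True) -> list[str]:
    """List the unique values of `key`, preferring folder names over a full scan."""
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    is_flat = not subdirs

    # Fast path: the key is encoded in the folder names and every folder matches.
    if use_layout and not is_flat and key in FOLDER_PATTERN.groupindex:
        matches = [FOLDER_PATTERN.fullmatch(p.name) for p in subdirs]
        if all(matches):
            return sorted({m.group(key) for m in matches})

    # Slow path: scan every file. The flat case is nominal, so it stays silent.
    if not is_flat:
        warnings.warn(f"Falling back to a full scan to list '{key}' values")
    values = set()
    for path in root.rglob("*"):
        if path.is_file():
            match = FILE_PATTERN.fullmatch(path.name)
            if match is None:
                raise ValueError(f"{path.name} does not match the file name convention")
            values.add(match.group(key))
    return sorted(values)


# Small demonstration on a temporary tree with one folder per cycle.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cycle, pass_number in [("001", "010"), ("002", "011")]:
        folder = root / f"cycle_{cycle}"
        folder.mkdir()
        (folder / f"data_c{cycle}_p{pass_number}.nc").touch()

    cycles = filter_values(root, "cycle")  # read from the folder names
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        passes = filter_values(root, "pass_number")  # not in folders: full scan
```

In this sketch the cycle values come straight from the folder names, while pass_number is only encoded in the file names, so listing it triggers the full scan and its warning.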

There are two motivations for this PR:

  1. Help the user: knowing the possible values of a filter for the query, list_files and map methods makes it easier to build queries.
  2. Open the way for more specialized features, namely the HalfOrbitMixin, which is meant to report the half-orbit range and the half-orbit holes.

The subsets property can also be reused to refactor how we handle the mandatory keys in the query methods.
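As a rough idea of how listing the subsets could drive the mandatory-key check, here is a sketch under stated assumptions. The records, PARTITION_KEYS and both helper functions are hypothetical and only mimic the concept; they are not the FilesDatabase or SubsetUnmixer API.

```python
# Hypothetical per-file metadata, as a scan of the files might produce it.
records = [
    {"mission": "A", "level": "L2", "cycle": 1},
    {"mission": "A", "level": "L2", "cycle": 2},
    {"mission": "B", "level": "L3", "cycle": 1},
]
PARTITION_KEYS = ("mission", "level")  # stand-in for SubsetUnmixer.partition_keys


def subsets(records, partition_keys):
    """Return the distinct combinations of partition keys present in the records."""
    return sorted({tuple(r[k] for k in partition_keys) for r in records})


def missing_mandatory_keys(query, records, partition_keys):
    """Return the partition keys the query must still set to select a single subset."""
    remaining = [r for r in records if all(r[k] == v for k, v in query.items())]
    present = subsets(remaining, partition_keys)
    return [
        key
        for i, key in enumerate(partition_keys)
        if key not in query and len({combo[i] for combo in present}) > 1
    ]


all_subsets = subsets(records, PARTITION_KEYS)
unfiltered = missing_mandatory_keys({}, records, PARTITION_KEYS)
filtered = missing_mandatory_keys({"mission": "A"}, records, PARTITION_KEYS)
```

With these records, an empty query is ambiguous on both keys, whereas fixing mission to "A" leaves a single subset and no key is missing.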

Finally, this PR introduces a breaking change: a file that does not match the file name convention was previously ignored; it now raises a LayoutMismatchError. This change is needed so that an exception is raised when we try to get the filter values from files that are not organized in folders. The chosen implementation crops the existing layouts and removes the flat layout, the one containing only the file name convention. In the flat case, the metadata collection must therefore raise an error, so that we can handle it properly and fall back silently to a full scan. The alternative would have been to configure the LayoutVisitor policy more finely, but the new behavior also encourages users to keep one folder per product (datasets can be mixed, but not products).
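The difference between the old and new behavior can be illustrated with a small sketch. The file name pattern and both parsing functions are hypothetical, and the exception class below is a local stand-in that only shares its name with the library's LayoutMismatchError.

```python
import re

FILE_PATTERN = re.compile(r"data_c(?P<cycle>\d+)\.nc")  # hypothetical convention


class LayoutMismatchError(Exception):
    """Local stand-in for the library's exception of the same name."""


def parse_old(names):
    """Previous behavior: silently skip names that do not match the convention."""
    return [m.groupdict() for m in map(FILE_PATTERN.fullmatch, names) if m]


def parse_new(names):
    """New behavior: the first non-matching name raises an error."""
    parsed = []
    for name in names:
        match = FILE_PATTERN.fullmatch(name)
        if match is None:
            raise LayoutMismatchError(f"{name!r} does not match the file name convention")
        parsed.append(match.groupdict())
    return parsed


names = ["data_c001.nc", "README.txt"]
old_result = parse_old(names)  # the stray README.txt is dropped without notice

try:
    parse_new(names)
    raised = False
except LayoutMismatchError:
    # A caller can catch the error here and decide how to recover,
    # for instance by falling back to a full scan.
    raised = True
```

Code that relied on stray files being skipped would need to either clean up the folder or catch the exception explicitly.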

This last point shows that the flat case should not be the nominal case: files are best organized in folders. @annesophie-cls, this affects the AVISO client, which should keep the remote layout if possible. At the very least, each product should have its own output folder.

@robin-cls changed the title from "Info from folders" to "feat!: filter values from Layout folders" on May 5, 2026
@robin-cls merged commit ac8ab63 into main on May 5, 2026
7 checks passed
@robin-cls deleted the info_from_folders branch on May 6, 2026, 12:44