Conversation

@lbesnard
Contributor

No description provided.

@leonardolaiolo
Contributor

@lbesnard should we just leave this open until data-services is fixed? I don't think we can merge while the checks are still failing.

@evacougnon
Contributor

@leonardolaiolo do we actually need this for SRS scatterometry in the new infra? We won't be creating manifest files, am I right?

If we don't need this bit of code, note that all of the satellites are already listed in the Prefect pipeline (https://github.com/aodn/dataflow-orchestration/blob/989a9e5597dbe8436fb5f2e566c36afee3434bd9/projects/ingestion/classify/srs/srs_surface_waves.py#L42)

@leonardolaiolo
Contributor

We migrated this pipeline, but we did not enable it. That said, I have a task this iteration to enable it. So yes, if there are changes here and @lbesnard wants them in prod, we should migrate this to our repo as well.
@mhidas can you double-check that what I'm saying makes sense?

@evacougnon
Contributor

Yes, it hasn't been enabled yet, but it should be near the top of the list: it was impossible to enable new satellites in the legacy pipeline config (aodndata) for most of the surface waves dataset collections (including scatterometry). The facility is aware that we were unable to update the legacy pipeline, and they were happy to wait for us to fully migrate and enable the surface waves pipeline in the new infra when convenient for us.

FYI @mhidas and @bpasquer

@mhidas
Contributor

commented Nov 12, 2025

> @mhidas can you double-check that what I'm saying makes sense?

I think so... though I don't know the full story of this dataset or the changes being made.

As @evacougnon mentioned, we don't need manifest files in the new framework (at least for now I don't think there will be a need for them). The scripts in data-services that download files into temporary storage and create manifest files (running as cron jobs) will be replaced by scheduled "pull" flows in Prefect. We haven't created any yet, but the general idea is that they can get the files, put them in the processing bucket, and call ingest directly from there.
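For illustration, a minimal sketch of what such a scheduled pull flow could look like, assuming Prefect 2.x. All the names here (the flow, the task helpers, the source URL, the bucket) are hypothetical placeholders, not the actual dataflow-orchestration code:

```python
from prefect import flow, task

@task
def list_remote_files(source_url: str) -> list[str]:
    # Hypothetical stand-in for discovering new files at the remote source.
    return [f"{source_url}/file_{i}.nc" for i in range(3)]

@task
def upload_to_processing_bucket(remote_path: str, bucket: str) -> str:
    # Hypothetical stand-in for copying one file into the processing bucket.
    key = remote_path.rsplit("/", 1)[-1]
    print(f"uploaded {remote_path} -> s3://{bucket}/{key}")
    return key

@task
def ingest(object_key: str) -> None:
    # Hypothetical stand-in for calling the ingest step directly,
    # with no manifest file in between.
    print(f"ingesting {object_key}")

@flow(log_prints=True)
def srs_scatterometry_pull(source_url: str = "sftp://example.org/srs",
                           bucket: str = "processing-bucket") -> None:
    # The whole pull flow: discover, stage, then ingest each file.
    for remote_path in list_remote_files(source_url):
        key = upload_to_processing_bucket(remote_path, bucket)
        ingest(key)

if __name__ == "__main__":
    srs_scatterometry_pull()
```

Run on a schedule (e.g. a Prefect deployment with a cron schedule), something along these lines would replace the data-services cron job end to end.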

@leonardolaiolo
Contributor

@evacougnon
here is the PR to enable this pipeline, keep an eye on it 👍

@lbesnard
Contributor Author

> we don't need manifest files in the new framework

I'm wondering how the system will cope with 100K+ files, as is the case with this dataset. Is the processing going to take weeks, for example?

@leonardolaiolo
Contributor

commented Nov 12, 2025

> I'm wondering how the system will cope with 100K+ files, as is the case with this dataset. Is the processing going to take weeks, for example?

Are they all coming at once? I think the batch handling should deal with these cases. @craigrose, we probably need to keep an eye on this once we flip the switch.
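Just to make the batching idea concrete, here is a hypothetical sketch (not the actual batch-handling code) of how a backlog of 100K+ files could be chunked into bounded groups, so work is submitted per batch rather than per file:

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    # Yield successive fixed-size batches from a (possibly huge) iterable,
    # so a 100K+ file backlog is handled in bounded chunks.
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical usage: a 100K-file backlog becomes 200 batches of 500.
backlog = (f"file_{i:06d}.nc" for i in range(100_000))
for batch in batched(backlog, size=500):
    print(f"submitting batch of {len(batch)} files")  # stand-in for the real ingest call
```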

@lbesnard
Contributor Author

> Are they all coming at once?

Yep, they're currently scp'd. Could be way more than 100K files.
