Add override mechanism for algorithm containers, transfer explicit .sif images with HTCondor#464

Open
jhiemstrawisc wants to merge 5 commits into Reed-CompBio:main from jhiemstrawisc:explicit-sif-transfer


@jhiemstrawisc
Collaborator

This PR aims to accomplish a few things:

  1. It adds new valid keys to config.yaml that let a user say "For reconstruction algorithm foo, I want you to use container image bar". In the general case, I'm not sure how much this is needed, but @agitter and I have floated the concept in the past, and it helps me solve 2. This implements a 4-tier hierarchy for how much of the image name is overridden (with special handling for the .sif extension), e.g.:
containers:
  registry:
    base_url: docker.io
    owner: reedcompbio
    
    images:
      omicsintegrator1: "oi1:latest"
      omicsintegrator2: "jhiemstra/oi2:latest"
      mincostflow: "hub.opensciencegrid.org/jhiemstra/mincostflow:latest"
      allpairs: "images/allpairs.sif"

would result in container URIs of

  • omicsintegrator1 --> docker.io/reedcompbio/oi1:latest
  • omicsintegrator2 --> docker.io/jhiemstra/oi2:latest
  • mincostflow --> hub.opensciencegrid.org/jhiemstra/mincostflow:latest
  • allpairs --> images/allpairs.sif (local file)

The caveat here is that I don't believe all container registries follow this <base_url>/<owner>/<container> hierarchy, but it's at least true for Docker Hub, GHCR, and hub.opensciencegrid.org. A user who needs a registry outside those can always declare the entire URI explicitly.
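A minimal sketch of the 4-tier resolution described above, assuming the tier is picked first by a `.sif` suffix check and then by the number of `/` separators in the override. `resolve_image` and its parameters (including `default_image`, standing in for the algorithm's built-in image name) are hypothetical, not the PR's actual helper.

```python
from typing import Optional


def resolve_image(default_image: str, base_url: str, owner: str,
                  override: Optional[str] = None) -> str:
    """Return the container reference (or local .sif path) for one algorithm.

    Hypothetical helper illustrating the override hierarchy; not SPRAS's API.
    """
    if not override:
        # No override: the default image lives under base_url/owner.
        return f"{base_url}/{owner}/{default_image}"
    if override.endswith(".sif"):
        # Local .sif file: a filesystem path, never prefixed with a registry.
        # Checked before counting "/" because paths like "images/x.sif" contain one.
        return override
    slashes = override.count("/")
    if slashes == 0:
        # Image name only, e.g. "oi1:latest": prepend base_url and owner.
        return f"{base_url}/{owner}/{override}"
    if slashes == 1:
        # owner/image, e.g. "jhiemstra/oi2:latest": prepend base_url only.
        return f"{base_url}/{override}"
    # Two or more "/": treat as a full registry reference and use it as-is.
    return override
```

Counting separators is why the caveat matters: an override with two or more `/` is passed through untouched, so any registry works as long as the user writes the full reference.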

  2. When a container override has the .sif extension, the file is added to the reconstruct rule's htcondor_transfer_input_files resource key. This tells HTCondor to transfer the .sif image as part of the job's input sandbox.
  3. When a .sif override is present and apptainer/singularity is the configured container framework, SPRAS uses the local file instead of pulling/building from a remote registry. Combined with 2), this is the key fix for #462 (Provide guidance on working around docker rate limiting for large CHTC runs).
  4. Some of the container resolution logic had become deeply nested with if/else branches, so I split it out into a helper function that is easier to test. This is where most of the new test code comes from.
  5. Finally, I fixed a small, inconsequential bug where we treated the singularity and apptainer container frameworks differently in some cases. I added a helper function that makes it easier to treat them as synonyms.
Note that I haven't yet documented this guidance, as #462 requests. I'd rather get this over the finish line first and then add the documentation to #459 (so I don't create conflicts for myself).

jhiemstrawisc and others added 5 commits March 11, 2026 17:42
…th HTCondor

This accomplishes two main things:
1. Users can explicitly state what container they want a given PRM
to run in via the configuration file, using the PRM name (as defined
in the config file) as the key.
2. When users specify an override `.sif` image, that image is added
to an HTCondor transfer list such that Condor moves the file to the
EP for execution (to avoid pulling during the job). Explicitly moving
required input files is a "best practice" in HTCondor, because failure
to resolve inputs at runtime squanders capacity.

When no override is provided or the HTCondor Snakemake executor isn't
available, the new Snakefile resource rule should be a no-op.
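A hypothetical sketch of how the transfer list could be built so that it stays a no-op when there is no `.sif` override. It assumes the `htcondor_transfer_input_files` value is a comma-separated string, matching HTCondor's `transfer_input_files` submit command; `transfer_input_files` and the `overrides` mapping (algorithm name to override string) are illustrative names.

```python
from typing import Dict, Optional


def transfer_input_files(algorithm: str, overrides: Dict[str, str],
                         existing: Optional[str] = None) -> str:
    """Append an algorithm's .sif override to a comma-separated transfer list.

    Hypothetical helper. With no .sif override, the existing entries are
    returned with nothing added (an empty string if there were none).
    """
    files = [f.strip() for f in (existing or "").split(",") if f.strip()]
    image = overrides.get(algorithm)
    if image and image.endswith(".sif") and image not in files:
        files.append(image)
    return ",".join(files)
```

Registry overrides such as `ghcr.io/...` are skipped, because only local `.sif` files need to travel in the job's input sandbox.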

In addition to adding the features, I tried to split up some other
functions in and around container resolution to make them more testable.
This adds a more robust way to check whether the container framework
is apptainer/singularity, which for the purposes of our codebase should
be treated as synonyms.

I decided to do this after noticing an issue in the test logs: the container
framework was set to apptainer and `unpack_singularity` was true. The
unpacking happened correctly, but a logged warning claimed it wouldn't,
because the warning only checked for singularity.
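The warning bug reduces to a string comparison. A hypothetical sketch of the synonym check and the corrected warning condition (function names and message text are illustrative, not taken from the PR):

```python
from typing import Optional


def is_singularity_like(framework: str) -> bool:
    """Treat "apptainer" and "singularity" as synonyms (case-insensitive)."""
    return framework.strip().lower() in {"apptainer", "singularity"}


def unpack_warning(framework: str, unpack_singularity: bool) -> Optional[str]:
    """Return a warning only if unpacking is requested but will be skipped.

    Hypothetical sketch. A check like `framework != "singularity"` would also
    warn for apptainer, which produces the misleading log message.
    """
    if unpack_singularity and not is_singularity_like(framework):
        return (f"unpack_singularity is set, but the container framework "
                f"{framework!r} is not apptainer/singularity; it will be ignored")
    return None
```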

I believe this diff makes that type of mistake a little harder.

As I started writing the PR message, I realized things weren't quite the way I wanted them to be w.r.t. this hierarchy. This should fix it.
Collaborator

@agitter agitter left a comment


These changes make sense to me overall.

#
# Local .sif file path (e.g., "images/pathlinker_v2.sif"):
# Apptainer/Singularity only. Skips pulling from registry and uses the
# pre-built .sif directly. When running via HTCondor with shared-fs-usage: none,
Collaborator


Because shared-fs-usage: none isn't in this config file, it could help to state where it is set (the spras_profile config).

Comment on lines +60 to +65
# Example (one of each type):
# images:
# omicsintegrator1: "images/omics-integrator-1_v2.sif" # local .sif (Apptainer only)
# pathlinker: "pathlinker:v1234" # image name only (base_url/owner prepended)
# omicsintegrator2: "some-other-owner/oi2:latest" # owner/image (base_url prepended)
# mincostflow: "ghcr.io/reed-compbio/mincostflow:v2" # full registry reference (used as-is)
Collaborator


This syntax makes sense to me.

Collaborator


Upon further consideration, would it make more sense to nest these image overrides under each algorithm below? Then the key would be image instead of , which may be less typo-prone.

Collaborator Author


I considered that, but to me it felt more appropriate that the container overrides be defined under the container section of configuration. It could just as easily go the other way if you disagree.

)
else:
print(f'Container image override (local .sif): {image_override}', flush=True)
elif image_override:
Collaborator


I'm wondering whether this block needs to be more robust to malformed overrides. What if I provide hello.world or a/b/c/d/e/f? Do we want to pass that through until there is an error later?
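A purely syntactic check can only go so far here. This hypothetical `check_override` validates non-`.sif` overrides against a simplified subset of the Docker/OCI image reference grammar (digests are not handled). Both `hello.world` and `a/b/c/d/e/f` are syntactically valid references, so they would pass and fail only later, at pull time; catching those would need a registry lookup or an arbitrary depth limit.

```python
import re

# Simplified subset of the Docker/OCI image reference grammar (no digests).
# Illustrative sketch only, not SPRAS code.
PATH_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")
DOMAIN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
                    r"(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)*(?::[0-9]+)?")
TAG = re.compile(r"\w[\w.-]{0,127}")


def check_override(override: str) -> None:
    """Raise ValueError if a non-.sif override is not a well-formed reference."""
    if override.endswith(".sif"):
        return  # local .sif paths are not image references; skip them
    name = override
    if ":" in override.rsplit("/", 1)[-1]:
        # A ":" in the last path component separates the tag.
        name, tag = override.rsplit(":", 1)
        if not TAG.fullmatch(tag):
            raise ValueError(f"invalid tag in {override!r}")
    parts = name.split("/")
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        # Leading component looks like a registry host (optionally with a port).
        if not DOMAIN.fullmatch(first):
            raise ValueError(f"invalid registry host in {override!r}")
        parts = parts[1:]
    for part in parts:
        if not PATH_COMPONENT.fullmatch(part):
            raise ValueError(f"invalid path component {part!r} in {override!r}")
```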


run_container(CONTAINER_SUFFIX, DUMMY_COMMAND, DUMMY_VOLUMES, DUMMY_WORKDIR, DUMMY_OUTDIR, settings)
container_arg = mock_singularity.call_args[0][0]
# The actual .sif is used inside run_container_singularity; run_container itself
Collaborator


Do we have a way to test that the actual .sif is used there?

@tristan-f-r tristan-f-r added the enhancement New feature or request label Mar 18, 2026