Summary
Workflow.orbit_dir defaults to Path("orbits") and is resolved via Path.resolve() against the process CWD. In containerized environments (Kubernetes pods, Docker containers) where CWD may be / or another read-only path, this resolves to /orbits, causing:
PermissionError: [Errno 13] Permission denied: '/orbits'
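The CWD dependence can be seen with the standard library alone, without sweets installed. The sketch below uses a hypothetical helper, resolve_from, that temporarily changes directory and resolves the same relative path; it illustrates the behaviour described above and is not code from sweets or COMPASS.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(cwd: str, relative: str = "orbits") -> Path:
    """Resolve `relative` as if the process were running in `cwd`."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        # Path.resolve() anchors a relative path at the current working directory.
        return Path(relative).resolve()
    finally:
        # Always restore the original CWD, even if resolution fails.
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    tmp_root = Path(tmp).resolve()
    resolved_in_tmp = resolve_from(tmp)

# With CWD set to the filesystem root, the same relative path lands at the root.
resolved_in_root = resolve_from("/")

print(resolved_in_tmp)
print(resolved_in_root)
```

The same relative default therefore points at a writable location in one process and at the filesystem root in another, depending only on where the process was started.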
Related Upstream Issue
COMPASS: opera-adt/COMPASS#241
The root cause spans both sweets (which sets the default) and COMPASS (which creates the directory). A fix in either project would resolve the issue.
Reproduction
# On a Kubernetes worker pod (CWD=/)
from sweets.core import Workflow
from sweets.download import BurstSearch
wf = Workflow(work_dir="/tmp/insar/event", bbox=(...), search=BurstSearch(...))
print(wf.orbit_dir) # -> /orbits (non-writable)
wf.run()
# -> PermissionError: [Errno 13] Permission denied: '/orbits'
Environment
- sweets version: installed via sweets-dask-noml-cpu:develop.latest container
- Platform: EASI (Kubernetes/EKS), Dask Gateway worker pods
- Observed on: Batch processing of 85 Australian earthquake events via Dask Gateway
Proposed Fix
Resolve orbit_dir relative to work_dir rather than CWD when no explicit path is provided:
# In Workflow model or __init__
if self.orbit_dir is None or self.orbit_dir == Path("orbits"):
    self.orbit_dir = self.work_dir / "orbits"
Alternatively, support an environment variable override:
import os
self.orbit_dir = Path(os.environ.get("SWEETS_ORBIT_DIR", self.work_dir / "orbits"))
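The two proposals can be combined into a single precedence rule. The following is a minimal standalone sketch, not the sweets implementation: the function name resolve_orbit_dir, its parameters, and the choice to anchor relative paths at work_dir are all assumptions made for illustration. SWEETS_ORBIT_DIR is the variable name proposed above and does not exist in sweets today.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

PathLike = Union[str, Path]


def resolve_orbit_dir(
    work_dir: PathLike,
    orbit_dir: Optional[PathLike] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick an orbit directory without consulting the process CWD.

    Precedence (an assumption of this sketch):
    explicit argument, then SWEETS_ORBIT_DIR, then <work_dir>/orbits.
    """
    env = os.environ if env is None else env
    if orbit_dir is not None:
        chosen = Path(orbit_dir)
    elif env.get("SWEETS_ORBIT_DIR"):
        chosen = Path(env["SWEETS_ORBIT_DIR"])
    else:
        chosen = Path("orbits")
    # Joining onto work_dir anchors relative paths there; an absolute
    # `chosen` replaces the base entirely, so it passes through unchanged.
    return (Path(work_dir) / chosen).resolve()


print(resolve_orbit_dir("/tmp/insar/event", env={}))
```

Because the base is always work_dir, the result is the same regardless of which directory the worker process was started in.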
Workaround
Call os.chdir(work_dir) before creating the Workflow so that Path("orbits").resolve() points to {work_dir}/orbits. This is fragile in multi-task/concurrent environments (e.g., Dask workers processing multiple events), because the CWD is shared by every task in the process.
Impact
During a batch run of 85 earthquake events on Kubernetes, 5 of the 21 events that reached geocoding failed with this error. The remaining 64 events were lost to scheduler timeout before reaching this stage.