Feature: User-Controlled STAC Collection IDs
Background
Currently, STAC collection IDs are auto-assigned by DpsStacItemGenerator using a
deterministic formula derived from the DPS job's .met.json metadata file:
{username}__{algorithm_name}__{algorithm_version}__{tag}
This value is slugified (special characters replaced) and then unconditionally written
into the collection field of every STAC item before publishing to the ingestor queue —
regardless of what collection the user's catalog.json specifies. Users have requested
the ability to control the collection ID so that outputs from related jobs and algorithm
runs can be organized into a single, meaningfully named collection.
This ticket proposes an initial implementation using admin-mediated collection creation,
designed to extend toward self-service and algorithm-level authorization.
How the current pipeline works
DpsStacItemGenerator (link) is triggered by S3 event notifications when a DPS job writes a
catalog.json to the output bucket. For each event:
- The DPS output prefix is extracted from the S3 key path using a timestamp pattern
- A .met.json file is loaded from that prefix; this is the authoritative source of
job context, containing at minimum: username, algorithm_name,
algorithm_version, and tag
- A deterministic collection ID is constructed from those fields and slugified
- The catalog.json is read via pystac; every item's collection field is
overwritten with the deterministic ID before publishing to the ingestor SNS topic
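The ID construction step above can be sketched roughly as follows. This is an illustration, not the Lambda's actual code: the function names are hypothetical, and the exact slugification rule (here, runs of characters outside letters, digits, hyphens, and underscores become a single hyphen) is an assumption.

```python
import re


def slugify(value: str) -> str:
    """Assumed rule: collapse runs of other characters into one hyphen."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", value).strip("-")


def deterministic_collection_id(met: dict) -> str:
    """Join the four .met.json fields with '__' and slugify the result."""
    fields = ("username", "algorithm_name", "algorithm_version", "tag")
    raw = "__".join(str(met[name]) for name in fields)
    return slugify(raw)


# Hypothetical .met.json content for illustration
example_met = {
    "username": "jsmith",
    "algorithm_name": "my-flood-detector",
    "algorithm_version": "v1.2.0",
    "tag": "run 1",
}
example_id = deterministic_collection_id(example_met)
```

Under the assumed rule, the dots in the version and the space in the tag become hyphens, while the double-underscore delimiters are preserved.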
Some users are already setting the collection field in their STAC items, but the
current code silently overwrites it. This feature stops that overwrite and makes the
item-provided collection ID the primary routing mechanism.
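As a rough sketch of the intended behavior change, the resolution could look like the following, working on plain STAC item dictionaries. The function name is hypothetical, and the authorization check that would follow is left out.

```python
def resolve_collection_id(item: dict, fallback_id: str) -> str:
    """Prefer the item's own collection field; otherwise use the deterministic ID."""
    requested = item.get("collection")
    # An absent or empty collection field falls back to the auto-assigned ID
    return requested if requested else fallback_id


# Hypothetical items for illustration
with_collection = {"id": "scene-1", "collection": "jsmith--flood-catalog-2025"}
without_collection = {"id": "scene-2"}
fallback = "jsmith__my-flood-detector__v1-2-0__run-1"
```

Here `resolve_collection_id(with_collection, fallback)` keeps the user's value, while `resolve_collection_id(without_collection, fallback)` returns the deterministic ID.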
Proposed Approach
Phase 1: Manual registry of collection ID/prefix + user allowed combinations
To start, we will manually manage the list of users allowed to contribute to each collection, matched either by specific collection IDs or by a prefix with a wildcard. This registry will be deployed in the maap-eoapi CloudFormation stack.
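One possible shape for such a registry and its lookup is sketched below. The data layout (a mapping from username to allowed patterns), the names, and the use of `*` as the wildcard are all assumptions for illustration; the real registry format is not yet defined.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: username -> exact collection IDs or wildcard prefixes
REGISTRY = {
    "jsmith": ["jsmith--*", "flood-catalog-2025"],
    "adoe": ["flood-catalog-2025"],
}


def is_authorized(username: str, collection_id: str, registry: dict = REGISTRY) -> bool:
    """Return True if any pattern registered for the user matches the collection ID."""
    patterns = registry.get(username, [])
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch
    return any(fnmatchcase(collection_id, pattern) for pattern in patterns)
```

With this layout, `jsmith` may write to anything under the `jsmith--` prefix and to the shared `flood-catalog-2025` collection, while users missing from the registry are never authorized.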
Phase 2: Self-Service Collection Creation (future)
The Phase 1 design is structured so that self-service collection creation slots in
without changing the DpsStacItemGenerator authorization logic. The Lambda already
handles all authorization outcomes. Future work will mostly be managed by JPL in the MAAP Console, where collection/user assignments will be tracked. We will add token-gated transaction endpoints for collections in the MAAP DPS STAC.
Algorithm Authoring Convention
The collection ID should be treated as a runtime parameter, not a hardcoded value
inside algorithm code. DPS supports arbitrary named input parameters, and algorithms
should declare a collection_id input parameter that is passed through to the STAC
item outputs at job runtime:
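    # Recommended pattern in algorithm code
    from typing import Optional

    def run(collection_id: Optional[str] = None, **kwargs):
        # generate_stac_items and write_catalog stand in for the algorithm's own helpers
        items = generate_stac_items(...)
        for item in items:
            # Only set the collection when the caller supplied one
            if collection_id:
                item.collection_id = collection_id
        write_catalog(items)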
When a user submits a DPS job, they can then pass their target collection ID as a job
input parameter without modifying the algorithm itself:
    algorithm: my-flood-detector v1.2.0
    inputs:
      collection_id: jsmith--flood-catalog-2025
      ...
This convention should be documented as a best practice in the MAAP algorithm
authoring guide. Its benefits are:
- The same algorithm version can route to different collections (dev, staging,
production; personal vs. shared)
- Collection governance decisions are separated from algorithm logic — the algorithm
doesn't need to know or care about catalog organization
- Users who don't specify a collection_id parameter get the deterministic fallback
behavior automatically, so the convention is opt-in and backward compatible
Algorithms that hardcode a collection ID in their output items will still work — the
authorization check applies regardless of how the collection ID got into the item —
but hardcoding is discouraged because it couples a specific catalog governance decision
to algorithm code that may be shared or reused by others.
Naming Rules
- Lowercase alphanumeric characters, hyphens, and underscores only
- 3–64 characters; no leading or trailing hyphens or underscores
- Case-insensitive uniqueness (my-collection and My-Collection are the same)
- Reserved names blocked: api, admin, system, search, conformance,
queryables, and any existing system collection patterns
Collection IDs are immutable after creation. The current deterministic ID formula
uses __ (double underscore) as a delimiter — user-specified IDs should avoid this
pattern to remain visually distinguishable from auto-assigned IDs.
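The rules above could be checked with something like the sketch below. The function name, the problem messages, and the `existing` argument are hypothetical; "existing system collection patterns" are not modeled because they are not specified here, and the double-underscore guidance is left out because it is advisory.

```python
import re
from typing import Iterable, List

RESERVED = frozenset({"api", "admin", "system", "search", "conformance", "queryables"})

# 3-64 characters, alphanumeric at both ends, hyphens and underscores allowed inside
_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]{1,62}[a-z0-9]")


def check_collection_id(raw: str, existing: Iterable[str] = ()) -> List[str]:
    """Return a list of rule violations; an empty list means the ID is acceptable."""
    cid = raw.lower()  # uniqueness is case-insensitive
    problems = []
    if raw != cid:
        problems.append("must be lowercase")
    if not _ID_PATTERN.fullmatch(cid):
        problems.append("invalid characters, length, or leading/trailing separator")
    if cid in RESERVED:
        problems.append("reserved name")
    if cid in {name.lower() for name in existing}:
        problems.append("already exists")
    return problems
```

Returning every violation, rather than stopping at the first, gives a user everything they need to fix in one pass.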
Error Surfacing
DpsStacItemGenerator currently has no feedback channel back to the user after a DPS
job completes. Collection governance introduces new async failure modes — collection not
found, user not authorized, algorithm version not approved — that users need visibility
into. At minimum, the Lambda will emit structured CloudWatch log events for every
governance decision. A user-facing feedback mechanism (DPS job callback or ingestion
status dashboard) is a dependency that should be resolved before this feature ships.
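A structured log event for a governance decision might look like the following sketch. The logger name, event name, field names, and outcome strings are assumptions rather than an agreed schema; emitting one JSON object per line should keep the events filterable by field in CloudWatch Logs.

```python
import json
import logging
from typing import Optional

# Hypothetical logger name
logger = logging.getLogger("dps_stac_item_generator.governance")


def log_governance_decision(
    *, username: str, collection_id: str, outcome: str, reason: Optional[str] = None
) -> str:
    """Emit one JSON log line describing a governance decision and return it."""
    event = {
        "event": "collection_governance_decision",
        "username": username,
        "collection_id": collection_id,
        "outcome": outcome,  # e.g. "authorized", "collection_not_found", "user_not_authorized"
        "reason": reason,
    }
    line = json.dumps(event, sort_keys=True)
    # Rejections are logged at WARNING so they stand out from routine decisions
    level = logging.INFO if outcome == "authorized" else logging.WARNING
    logger.log(level, line)
    return line
```

For example, a rejection could be recorded with `log_governance_decision(username="adoe", collection_id="jsmith--flood-catalog-2025", outcome="user_not_authorized")`.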