Feature: User-Controlled Collection IDs for DPS STAC #91

# Feature: User-Controlled STAC Collection IDs
 
## Background
 
Currently, STAC collection IDs are auto-assigned by `DpsStacItemGenerator` using a
deterministic formula derived from the DPS job's `.met.json` metadata file:
 
```
{username}__{algorithm_name}__{algorithm_version}__{tag}
```
 
This value is slugified (special characters replaced) and then unconditionally written
into the `collection` field of every STAC item before publishing to the ingestor queue —
regardless of what collection the user's `catalog.json` specifies. Users have requested
the ability to control the collection ID so that outputs from related jobs and algorithm
runs can be organized into a single, meaningfully named collection.
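
As a rough illustration of the formula above, the sketch below joins the four `.met.json` fields with `__` and slugifies the result. The exact slugification rule used by `DpsStacItemGenerator` is not stated here, so the `slugify` helper and its character set are assumptions, not the deployed implementation.

```python
import re


def slugify(value: str) -> str:
    """Assumed rule: replace each run of characters outside
    [A-Za-z0-9_.-] with a single hyphen, then trim stray hyphens."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "-", value).strip("-")


def deterministic_collection_id(met: dict) -> str:
    """Build {username}__{algorithm_name}__{algorithm_version}__{tag}
    from the job metadata and slugify the joined value."""
    fields = ("username", "algorithm_name", "algorithm_version", "tag")
    return slugify("__".join(met[name] for name in fields))


# Hypothetical metadata, for illustration only
example_met = {
    "username": "jsmith",
    "algorithm_name": "flood detector",
    "algorithm_version": "v1.2.0",
    "tag": "run 1",
}
example_id = deterministic_collection_id(example_met)
```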
 
This ticket proposes an initial implementation using admin-mediated collection creation,
designed to extend toward self-service and algorithm-level authorization.
 
---
 
## How the current pipeline works
 
`DpsStacItemGenerator` ([link](https://github.com/MAAP-Project/maap-eoapi/blob/6b76ac3fe65a3cce827e3320b46c310ff7a2df35/cdk/constructs/DpsStacItemGenerator/runtime/src/dps_stac_item_generator/item.py)) is triggered by S3 event notifications when a DPS job writes a
`catalog.json` to the output bucket. For each event:
 
1. The DPS output prefix is extracted from the S3 key path using a timestamp pattern
2. A `.met.json` file is loaded from that prefix — this is the authoritative source of
   job context, containing at minimum: `username`, `algorithm_name`,
   `algorithm_version`, and `tag`
3. A deterministic collection ID is constructed from those fields and slugified
4. The `catalog.json` is read via pystac; every item's `collection` field is
   **overwritten** with the deterministic ID before publishing to the ingestor SNS topic
 
Some users are already setting the `collection` field in their STAC items, but the
current code silently overwrites it. This feature stops that overwrite and makes the
item-provided collection ID the primary routing mechanism.
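
The intended change in behavior can be sketched as follows, using plain dictionaries in place of pystac items to keep the example self-contained. The function names are hypothetical, and authorization of the item-provided ID is assumed to happen in a separate step.

```python
from typing import Optional


def resolve_collection_id(item: dict, fallback_id: str) -> str:
    """Prefer the collection ID already set on the item; use the
    deterministic ID only when the item does not provide one."""
    provided: Optional[str] = item.get("collection")
    return provided if provided else fallback_id


def apply_collection(item: dict, fallback_id: str) -> dict:
    """Return a copy of the item with its collection resolved,
    leaving the caller's dictionary untouched."""
    updated = dict(item)
    updated["collection"] = resolve_collection_id(item, fallback_id)
    return updated


# Hypothetical items, for illustration only
fallback = "jsmith__flood-detector__v1.2.0__run-1"
with_id = apply_collection(
    {"id": "scene-1", "collection": "jsmith--flood-catalog-2025"}, fallback
)
without_id = apply_collection({"id": "scene-2"}, fallback)
```

Under the current behavior both items would end up in the deterministic collection; under the proposed behavior only the second one does.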
 
---
 
## Proposed Approach

### Phase 1: Manual registry of collection ID/prefix + user allowed combinations

To start, we will manually manage the list of users allowed to contribute to a collection, keyed by either a specific collection ID or a collection ID prefix with a wildcard. This list will be deployed as part of the maap-eoapi CloudFormation stack.
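
A minimal sketch of how such a registry could be checked is shown below. The registry shape (pattern mapped to a set of usernames), the trailing `*` wildcard syntax, and all names and entries are assumptions for illustration; the actual format deployed in the stack may differ.

```python
# Hypothetical registry: collection ID or prefix pattern -> allowed usernames
REGISTRY = {
    "jsmith--flood-catalog-2025": {"jsmith"},
    "flood-team-*": {"jsmith", "adoe"},
}


def pattern_matches(pattern: str, collection_id: str) -> bool:
    """A trailing '*' marks a prefix pattern; anything else must match exactly."""
    if pattern.endswith("*"):
        return collection_id.startswith(pattern[:-1])
    return collection_id == pattern


def is_authorized(username: str, collection_id: str, registry: dict = REGISTRY) -> bool:
    """True if any registry entry covers this collection ID and lists the user."""
    return any(
        username in users and pattern_matches(pattern, collection_id)
        for pattern, users in registry.items()
    )
```

With the entries above, `adoe` could publish to `flood-team-2025` through the prefix entry but not to `jsmith--flood-catalog-2025`.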

---
 
### Phase 2: Self-Service Collection Creation (future)
 
The Phase 1 design is structured so that self-service collection creation slots in
without changing the `DpsStacItemGenerator` authorization logic. The Lambda already
handles all authorization outcomes. Future work will mostly be managed by JPL in the
MAAP Console, where collection/user assignments will be tracked. We will add
token-gated transaction endpoints for collections in the MAAP DPS STAC.
 
---
 
## Algorithm Authoring Convention
 
The collection ID should be treated as a runtime parameter, not a hardcoded value
inside algorithm code. DPS supports arbitrary named input parameters, and algorithms
should declare a `collection_id` input parameter that is passed through to the STAC
item outputs at job runtime:
 
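```python
# Recommended pattern in algorithm code.
# generate_stac_items and write_catalog stand in for the algorithm's own helpers;
# items are assumed to be pystac.Item objects.
from typing import Optional


def run(collection_id: Optional[str] = None, **kwargs):
    items = generate_stac_items(...)
    if collection_id:
        # Only set the collection when the job supplied one; otherwise leave
        # items unset so the deterministic fallback ID applies.
        for item in items:
            item.collection_id = collection_id
    write_catalog(items)
```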
 
When a user submits a DPS job, they can then pass their target collection ID as a job
input parameter without modifying the algorithm itself:
 
```
algorithm: my-flood-detector v1.2.0
inputs:
  collection_id: jsmith--flood-catalog-2025
  ...
```
 
This convention should be documented as a best practice in the MAAP algorithm
authoring guide. Its benefits are:
 
- The same algorithm version can route to different collections (dev, staging,
  production; personal vs. shared)
- Collection governance decisions are separated from algorithm logic — the algorithm
  doesn't need to know or care about catalog organization
- Users who don't specify a `collection_id` parameter get the deterministic fallback
  behavior automatically, so the convention is opt-in and backward compatible
 
Algorithms that hardcode a collection ID in their output items will still work — the
authorization check applies regardless of how the collection ID got into the item —
but hardcoding is discouraged because it couples a specific catalog governance decision
to algorithm code that may be shared or reused by others.
 
---
 
## Naming Rules
  
- Lowercase alphanumeric characters, hyphens, and underscores only
- 3–64 characters; no leading or trailing hyphens or underscores
- Case-insensitive uniqueness (`my-collection` and `My-Collection` are the same)
- Reserved names blocked: `api`, `admin`, `system`, `search`, `conformance`,
  `queryables`, and any existing system collection patterns
 
Collection IDs are **immutable after creation**. The current deterministic ID formula
uses `__` (double underscore) as a delimiter — user-specified IDs should avoid this
pattern to remain visually distinguishable from auto-assigned IDs.
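
The rules above could be checked with something like the following sketch. The function name, error messages, and the `existing_ids` argument are hypothetical, the reserved set covers only the names listed explicitly (not the "existing system collection patterns"), and the `__` recommendation is not enforced here.

```python
import re

# 3-64 characters of [a-z0-9_-], with an alphanumeric first and last character
ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]{1,62}[a-z0-9]")
RESERVED = frozenset(
    {"api", "admin", "system", "search", "conformance", "queryables"}
)


def validate_collection_id(candidate: str, existing_ids=()) -> list:
    """Return a list of rule violations; an empty list means the ID is acceptable."""
    errors = []
    if not ID_PATTERN.fullmatch(candidate):
        errors.append(
            "must be 3-64 characters of [a-z0-9_-] and start and end "
            "with a letter or digit"
        )
    if candidate.lower() in RESERVED:
        errors.append("reserved name")
    # Uniqueness is case-insensitive, so compare lowercased forms
    if candidate.lower() in {existing.lower() for existing in existing_ids}:
        errors.append("already exists (case-insensitive)")
    return errors
```

For example, `validate_collection_id("jsmith--flood-catalog-2025")` returns an empty list, while `validate_collection_id("api")` reports the reserved name.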
 
---
 
## Error Surfacing
 
`DpsStacItemGenerator` currently has no feedback channel back to the user after a DPS
job completes. Collection governance introduces new async failure modes — collection not
found, user not authorized, algorithm version not approved — that users need visibility
into. At minimum, the Lambda will emit structured CloudWatch log events for every
governance decision. A user-facing feedback mechanism (DPS job callback or ingestion
status dashboard) is a dependency that should be resolved before this feature ships.
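
One possible shape for those structured log events is sketched below as a single JSON object per governance decision. The logger name, event name, and field names are assumptions chosen for illustration, not an agreed schema.

```python
import json
import logging

# Hypothetical logger name
logger = logging.getLogger("dps_stac_item_generator.governance")


def governance_event(decision: str, reason: str, username: str,
                     collection_id: str, item_id: str) -> dict:
    """Assemble one governance decision as a flat, JSON-serializable dict."""
    return {
        "event": "collection_governance_decision",
        "decision": decision,  # e.g. "accepted" or "rejected"
        "reason": reason,      # e.g. "user_not_authorized"
        "username": username,
        "collection_id": collection_id,
        "item_id": item_id,
    }


def log_governance_decision(**fields) -> str:
    """Emit the event as a single JSON line and return the payload."""
    payload = json.dumps(governance_event(**fields), sort_keys=True)
    logger.info(payload)
    return payload


# Hypothetical rejection, for illustration only
example_payload = log_governance_decision(
    decision="rejected",
    reason="user_not_authorized",
    username="adoe",
    collection_id="jsmith--flood-catalog-2025",
    item_id="scene-1",
)
```

Emitting one JSON object per line keeps the events filterable by field in CloudWatch, which would also give a later user-facing feedback mechanism something concrete to read from.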
 
---
 

