
Conversation

@j616 (Contributor) commented Nov 14, 2025

Details

This PR includes an ADR that presents multiple options for the signalling and implementation of retention management, alongside the relevant tags and an Application Note for the chosen options. This PR also adds the tag currently used by the AWS store implementation to the listing.

Jira Issue (if relevant)

Jira URL: https://jira.dev.bbc.co.uk/browse/CLOUDFIT-5483

Related PRs

Where appropriate. Indicate order to be merged.

Submitter PR Checks

(tick as appropriate)

  • PR completes task/fixes bug
  • API version has been incremented if necessary
  • ADR status has been updated, and ADR implementation has been recorded
  • Documentation updated (README, etc.)
  • PR added to Jira Issue (if relevant)
  • Follow-up stories added to Jira

Reviewer PR Checks

(tick as appropriate)

  • PR completes task/fixes bug
  • Design makes sense, and fits with our current code base
  • Code is easy to follow
  • PR size is sensible
  • Commit history is sensible and tidy

Info on PRs

The checks above are guidelines. They don't all have to be ticked, but they should all have been considered.

@j616 j616 requested a review from a team as a code owner November 14, 2025 16:44
@j616 j616 force-pushed the jamessa-loopRecord branch from 5bcdf4a to 06aeb82 on November 14, 2025 16:46
@samdbmg (Member) left a comment


LGTM - one thought inlined about using tags, but it's just more reasoning; I agree with the conclusions.

@iSchluff (Contributor) commented

Could this application note also provide clarification on media objects that were allocated but not used in any flow segments? The spec so far just says that implementations have to handle this, but there is no guidance on how long a client can expect the objects to be valid for.

Would it be possible e.g. for the implementation to define a latest time point at which flow segments must have been registered for allocated objects?

@j616 (Contributor, Author) commented Nov 25, 2025

Could this application note also provide clarification on media objects that were allocated but not used in any flow segments? The spec so far just says that implementations have to handle this, but there is no guidance on how long a client can expect the objects to be valid for.

Would it be possible e.g. for the implementation to define a latest time point at which flow segments must have been registered for allocated objects?

I shall have a think. But we've avoided being specific on this in the past as it's hard for us to make universally useful recommendations. Different workflows and deployments will have different requirements in this space. A large organisation-wide installation may see significant cost impacts from retaining unused content longer than necessary, whereas a transfer over a poor-quality connection may require a larger grace period. I think the desire to have well-defined expectations on the part of writing clients is reasonable. But my fear is that whatever number/mechanism we choose would be wrong in a large number of cases, or at least would have quite real and significant consequences for organisations using TAMS. I feel the best we can do in this case is call out that this is something implementations should consider, and leave it to them to make an informed decision on the best approach based on the needs of themselves and their customers.

@himslm01 (Contributor) commented

Over and above the retention of Flow segments and the objects that they reference, a thing that worries me slightly is "orphaned" objects - chunks of media which have been stored in the storage back-end (probably S3 objects) but have never been registered in a segment on a flow.

For S3, an implementation could use S3 object-created event notifications to register each new object in a database table; registering the object against a segment would then remove it from that table, and objects remaining in the table beyond a defined age would be deleted.
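A minimal sketch of that pattern, assuming a DynamoDB tracking table called `unregistered_objects` and a bucket that only holds TAMS media objects (the table, names and grace period here are illustrative, not anything the spec defines):

```python
import time
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
tracking_table = dynamodb.Table("unregistered_objects")  # hypothetical table name

GRACE_PERIOD_SECONDS = 24 * 60 * 60  # deployment-specific choice


def on_object_created(event, context):
    """Lambda handler for S3 ObjectCreated notifications: track the new object."""
    for record in event["Records"]:
        tracking_table.put_item(Item={
            "object_key": record["s3"]["object"]["key"],
            "bucket": record["s3"]["bucket"]["name"],
            "created_at": int(time.time()),
        })


def on_segment_registered(object_key):
    """Called when an object is first registered against a flow segment."""
    tracking_table.delete_item(Key={"object_key": object_key})


def delete_orphans():
    """Periodic job: delete objects never registered within the grace period."""
    cutoff = int(time.time()) - GRACE_PERIOD_SECONDS
    response = tracking_table.scan()  # paginate in a real deployment
    for item in response.get("Items", []):
        if item["created_at"] < cutoff:
            s3.delete_object(Bucket=item["bucket"], Key=item["object_key"])
            tracking_table.delete_item(Key={"object_key": item["object_key"]})
```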

I wonder whether there is guidance anywhere for keeping a clean and orphan-free storage layer?

@j616 (Contributor, Author) commented Nov 27, 2025

Over and above the retention of Flow segments and the objects that they reference, a thing that worries me slightly is "orphaned" objects - chunks of media which have been stored in the storage back-end (probably S3 objects) but have never been registered in a segment on a flow.

For S3, an implementation could use S3 object-created event notifications to register each new object in a database table; registering the object against a segment would then remove it from that table, and objects remaining in the table beyond a defined age would be deleted.

I wonder whether there is guidance anywhere for keeping a clean and orphan-free storage layer?

I think that's similar to the comment above. I guess the bit worth adding is that those Objects have to be tracked by the TAMS service from the point at which they are created via the API anyway. We require that the first registration of an Object against a Segment MUST be against the Flow the storage was allocated against. This requirement is particularly aimed at implementations that support fine-grained auth, to make sure permissions can be derived at all points in the lifecycle. It also avoids weird edge cases where storage is assigned against one flow and registered against another, or where a malicious actor might "steal" an object and register it against a flow they have permissions for in the window between media being uploaded and it being associated with the legitimate Flow. But it's also there to facilitate this sort of handling of objects which are never registered against segments. As I say above, I think there's a bunch of reasons we can only go so far with recommendations in this space. But perhaps we need to make the current language clearer.
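To illustrate the check that rule implies, a sketch along these lines (the data model, field names and helpers here are hypothetical, not part of the TAMS API):

```python
def check_first_registration(object_id: str, target_flow_id: str,
                             allocations: dict, registrations: set) -> None:
    """Reject the first registration of an object unless it targets the flow
    the storage was originally allocated against (hypothetical data model)."""
    allocation = allocations.get(object_id)
    if allocation is None:
        raise ValueError(f"Object {object_id} was not allocated via the API")
    already_registered = object_id in registrations
    if not already_registered and allocation["flow_id"] != target_flow_id:
        raise PermissionError(
            f"First registration of {object_id} must be against flow "
            f"{allocation['flow_id']}, not {target_flow_id}"
        )
```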

@GeorginaShippey (Contributor) commented

As mentioned on Friday, I think we need a better understanding of what is going on at the Source level here - whether anything needs to be signalled, or what queries can be made to discover how long a particular source will stick around for. I think users will be most interested in understanding and managing retention on a source rather than on individual flows. We also need to understand any complexities around flows that have segments stored at the multi- and mono-essence layers, and around flows that end up being collected by multiple source multis, to ensure the underlying flows are not deleted before any user expects.

@iSchluff (Contributor) commented Dec 1, 2025

I shall have a think. But we've avoided being specific on this in the past as it's hard for us to make universally useful recommendations. Different workflows and deployments will have different requirements in this space.

I think it is reasonable to have different deadlines based on the usage, but I would strongly suggest having a consistent way to signal this expectation to clients, especially when clients and the service implementation come from different vendors. For example, an optional field in the object allocation response could signal the deadline/timeout for registration. This would then guarantee to the client that presigned URLs and uploaded objects are valid for a certain amount of time. Otherwise a client might unexpectedly get an auth error on object upload, or an error about nonexistent objects on flow segment registration.
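Purely for illustration, such a response might look like the sketch below; the register_by field and the overall shape are invented here, not something the spec currently defines:

```python
from datetime import datetime, timezone

# Hypothetical media object allocation response carrying a registration deadline.
allocation_response = {
    "object_id": "example-object-id",
    "put_url": {
        "url": "https://example-bucket.s3.amazonaws.com/example-object-id",
        "content-type": "video/mp2t",
    },
    "register_by": "2025-12-01T12:05:00Z",  # invented field: deadline for segment registration
}


def seconds_until_deadline(response: dict) -> float:
    """How long the client has left to upload and register this object."""
    deadline = datetime.fromisoformat(response["register_by"].replace("Z", "+00:00"))
    return (deadline - datetime.now(timezone.utc)).total_seconds()
```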

@samdbmg (Member) commented Dec 10, 2025

Could this application note also provide clarification on media objects that were allocated but not used in any flow segments? The spec so far just says that implementations have to handle this, but there is no guidance on how long a client can expect the objects to be valid for.

Would it be possible e.g. for the implementation to define a latest time point at which flow segments must have been registered for allocated objects?

Spent a little while talking about this with @j616 this morning, and I think he's going to try and capture some of the options somewhere.

For writing, I find myself wondering whether it's enough to stipulate a minimum validity time (say, 5min?) and expect clients to request URLs, upload objects and register them in a timely manner to meet that deadline. As in, "you may complete upload and registration any time within 5min of making this request: beyond that it may fail". Beyond that time, the upload URL will expire, and any object uploaded to it but not registered should probably be deleted (e.g. using the same reference counting/garbage collection as deleting the last Flow Segment that references an object).
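As a rough sketch of what that contract would mean for a writing client (the 5-minute figure, the batch size and the callables below are placeholders, not spec):

```python
import time

VALIDITY_WINDOW_SECONDS = 5 * 60  # the stipulated minimum validity time


def write_segments(segments, request_upload_urls, upload, register, batch_size=10):
    """Upload and register segments in small batches so each batch completes
    well inside the validity window of the URLs it was issued.
    request_upload_urls/upload/register stand in for a real TAMS client."""
    for batch in chunked(segments, batch_size):
        issued_at = time.monotonic()
        urls = request_upload_urls(len(batch))
        for segment, url in zip(batch, urls):
            if time.monotonic() - issued_at > VALIDITY_WINDOW_SECONDS:
                # URLs from this page may no longer be valid; request a fresh page
                raise RuntimeError("Validity window exceeded; re-request upload URLs")
            upload(url, segment)
            register(segment, url)


def chunked(items, size):
    """Yield fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```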

I agree on the value of being flexible in general, but in this case part of me would rather not make this a config option because then writing clients have to include logic to handle different values. Instead, I think clients should be requesting a new page of URLs frequently enough to never run into that deadline (but calling out what the deadline is could be useful).

I suspect read is a different case: depending on your reader you might need much more time. That one probably deserves more thought: should that also be fixed? Should that be a request parameter? Or a rubric like "2x the duration of the page"?

@j616 j616 force-pushed the jamessa-loopRecord branch from c63e409 to 7fe2a37 on January 5, 2026 14:04
@j616 (Contributor, Author) commented Jan 5, 2026

I've made a first pass at ADR options for signalling garbage collection and presigned URL timeouts in 7fe2a37. I haven't selected any options yet. I personally favour 8a and 9a for the reasons outlined in the document, but I'd appreciate feedback.
