Describe mechanism for automated retention management #157
base: main
Conversation
Force-pushed 5bcdf4a to 06aeb82
samdbmg left a comment:
LGTM - one thought inlined about using tags, but it's just more reasoning; I agree with the conclusions.
Could this application note also provide clarification on media objects that were allocated but not used in any flow segments? The spec so far just says that implementations have to handle this, but there is no guidance on how long a client can expect the objects to be valid for. Would it be possible, e.g., for the implementation to define a latest time point by which flow segments must have been registered for allocated objects?
I shall have a think. But we've avoided being specific on this in the past as it's hard for us to make universally useful recommendations. Different workflows and deployments will have different requirements in this space. A large organisation-wide installation may see significant cost impacts from retaining unused content longer than necessary, but a transfer over a poor-quality connection may require a larger grace period. I think the desire to have well-defined expectations on the part of writing clients is reasonable. But my fear is that whatever number/mechanism we choose would be wrong in a large number of cases, or at least would have quite real and significant consequences for organisations using TAMS. I feel the best we can do here is call out that this is something implementations should consider, and leave it to them to make an informed decision on the best approach based on their own needs and those of their customers.
Over and above the retention of Flow Segments and the objects they reference, a thing that worries me slightly is "orphaned" objects - chunks of media which have been stored in the storage back-end (probably S3 objects) but have never been registered in a segment on a flow. For S3, implementations could use S3 object-create event notifications to register the object into a database table, with registering the object against a segment removing it from that table. Objects in the table older than a defined age are deleted. I wonder whether there is guidance anywhere for keeping a clean and orphan-free storage layer?
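A minimal sketch of the pattern described above, assuming an AWS Lambda handler subscribed to S3 `ObjectCreated` events and a hypothetical DynamoDB table named `pending_objects` (all names and the grace period are illustrative, not part of the TAMS spec):

```python
import time
import boto3

# Hypothetical table tracking objects uploaded but not yet registered
# against a Flow Segment.
dynamodb = boto3.resource("dynamodb")
pending = dynamodb.Table("pending_objects")
s3 = boto3.client("s3")

ORPHAN_AGE_SECONDS = 24 * 60 * 60  # deployment-specific grace period


def on_object_created(event, context):
    """Lambda handler for S3 ObjectCreated notifications: record the object."""
    for record in event["Records"]:
        pending.put_item(Item={
            "object_key": record["s3"]["object"]["key"],
            "bucket": record["s3"]["bucket"]["name"],
            "created_at": int(time.time()),
        })


def on_segment_registered(object_key):
    """Called from the TAMS API path when a segment references the object."""
    pending.delete_item(Key={"object_key": object_key})


def sweep_orphans():
    """Periodic job: delete objects never registered within the grace period."""
    cutoff = int(time.time()) - ORPHAN_AGE_SECONDS
    scan = pending.scan()  # fine for a sketch; use an index or TTL in practice
    for item in scan.get("Items", []):
        if item["created_at"] < cutoff:
            s3.delete_object(Bucket=item["bucket"], Key=item["object_key"])
            pending.delete_item(Key={"object_key": item["object_key"]})
```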
I think that's similar to the comment above. I guess the bit worth adding is that those objects have to be tracked by the TAMS service from the point at which they are created via the API anyway. We require that the first registration of an object against a Segment MUST be against the Flow the storage was allocated against. This is a requirement particularly aimed at implementations that support fine-grained auth, to make sure permissions can be derived at all points in the lifecycle, and to avoid any weird edge cases where storage is assigned against one flow and registered against another, or where a malicious actor might "steal" an object and register it against a flow they have permissions for between media being uploaded and it being associated with the legitimate Flow. But it's also to facilitate this sort of handling of objects which are never registered against segments. As I say above, I think there's a bunch of reasons we can only go so far with recommendations in this space. But perhaps we need to make the current language clearer.
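As an illustration only (not spec text), the first-registration rule amounts to a check like this on the segment-registration path, with all names hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectRecord:
    """Hypothetical server-side record of an allocated media object."""
    object_id: str
    allocated_flow_id: str
    first_registered_flow: Optional[str] = None


class RegistrationError(Exception):
    pass


def register_first_segment(record: ObjectRecord, flow_id: str) -> None:
    """Enforce that an object's first Segment registration is against the
    Flow its storage was allocated for."""
    if record.first_registered_flow is None:
        if record.allocated_flow_id != flow_id:
            # Blocks the "stolen object" edge case: an object uploaded for
            # one Flow cannot first be registered against a different Flow.
            raise RegistrationError(
                "first registration must be against the allocating Flow"
            )
        record.first_registered_flow = flow_id
    # ...continue with normal Flow Segment registration...
```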
As mentioned on Friday, I think we need to have a better understanding of what is going on at the Source level here - whether anything needs to be signalled, and what queries can be made to discover how long a particular source will stick around for. I think users will be most interested in understanding and managing retention on a source rather than on individual flows. We also need to understand any complexities around flows that have segments stored at the multi- and mono-essence layers, and where flows end up being collected by multiple multi-essence sources, ensuring the underlying flows are not deleted before any user expects.
I think it is reasonable to have different deadlines based on the usage, but I would strongly suggest having a consistent way to signal this expectation to clients, especially when the client and service implementation come from different vendors - e.g. an optional field signalling the deadline/timeout for registration in the object allocation response. This would then guarantee to the client that presigned URLs and uploaded objects are valid for a certain amount of time.
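For example, if the allocation response carried an optional expiry field, a client could plan its work against it. This is a sketch under the assumption of an `expires` ISO 8601 field, which is not currently in the spec:

```python
from datetime import datetime, timezone


def registration_deadline(allocation: dict):
    """Return the registration deadline from an allocation response, if the
    (hypothetical) 'expires' field is present; otherwise None."""
    expires = allocation.get("expires")
    # Assumes an ISO 8601 timestamp; exact format would need specifying.
    return datetime.fromisoformat(expires) if expires else None


def should_reallocate(allocation: dict, safety_margin_s: float = 30) -> bool:
    """True if the client should request fresh URLs rather than risk
    uploading/registering past the signalled deadline."""
    deadline = registration_deadline(allocation)
    if deadline is None:
        return False  # no signal: fall back to spec/deployment defaults
    remaining = (deadline - datetime.now(timezone.utc)).total_seconds()
    return remaining < safety_margin_s
```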
Spent a little while talking about this with @j616 this morning, and I think he's going to try and capture some of the options somewhere.

For writing, I find myself wondering whether it's enough to stipulate a minimum validity time (say, 5 min?) and expect clients to request URLs, upload objects and register them in a timely manner to meet that deadline. As in, "you may complete upload and registration any time within 5 min of making this request: beyond that it may fail". Beyond that time, the upload URL will expire, and any object uploaded to it but not registered should probably be deleted (e.g. using the same reference counting/garbage collection as deleting the last Flow Segment that references an object).

I agree on the value of being flexible in general, but in this case part of me would rather not make this a config option, because then writing clients have to include logic to handle different values. Instead, I think clients should be requesting a new page of URLs frequently enough to never run into that deadline (but calling out what the deadline is could be useful).

I suspect read is a different case: depending on your reader you might need much more time. That one probably deserves more thought: should it also be fixed? Should it be a request parameter? Or a rubric like "2x the duration of the page"?
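A sketch of the writing-client behaviour suggested above, assuming a fixed validity window (the 5 minutes is illustrative) and hypothetical `request_upload_urls`/`upload`/`register` helpers:

```python
import time

VALIDITY_WINDOW_S = 5 * 60  # illustrative fixed minimum, not a spec value


def write_segments(segments, request_upload_urls, upload, register):
    """Upload and register segments, requesting a fresh page of presigned
    URLs whenever the current page is close to its validity deadline."""
    urls, issued_at = request_upload_urls(), time.monotonic()
    for segment in segments:
        # Refresh early rather than risk an expired URL or failed
        # registration near the deadline.
        if not urls or time.monotonic() - issued_at > VALIDITY_WINDOW_S * 0.8:
            urls, issued_at = request_upload_urls(), time.monotonic()
        url = urls.pop(0)
        upload(url, segment)
        register(segment, url)
```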
Force-pushed c63e409 to 7fe2a37
I've made a first pass at ADR options for signalling garbage collection and presigned URL timeouts in 7fe2a37. I haven't selected any options yet. I personally favour 8a and 9a for the reasons outlined in the document, but I'd appreciate feedback.
Details
This PR includes an ADR that presents multiple options for the signalling and implementation of retention management, alongside the relevant tags and an Application Note for the chosen options. It also adds the tag currently used by the AWS store implementation to the listing.
Jira Issue (if relevant)
Jira URL: https://jira.dev.bbc.co.uk/browse/CLOUDFIT-5483
Related PRs
Where appropriate, indicate the order in which they should be merged.
Submitter PR Checks
(tick as appropriate)
Reviewer PR Checks
(tick as appropriate)
Info on PRs
The checks above are guidelines. They don't all have to be ticked, but they should all have been considered.