Allow users with delete permissions to drop replica tables#453
Conversation
Replica tables are more for DR. Dropping them should not be easy for users. Can we consider having them soft deleted instead ? This would unblock users and also ensure DR is available if required. |
|
Soft deletion will not work because soft deletion keeps them on the same file path without the catalog reference, we don't move files to a .trash path. We could consider that as part of the soft delete implementation but it'll add to storage overhead. I think we need to decide on a stance with replica tables. Do we want to make them self serve or fully managed, because if we want to make them fully managed we need to invest in more automation around the drop patterns. I'm more in favor of self serve as I've yet to see a use case where users need to backfill from destination cluster. |
|
This PR significantly reduces operational burden and addresses what is likely a top-5 production issue. I believe it’s important to merge. The main concern is the safety implication. Like many changes, this is a tradeoff between safety and operability. Here’s why I still think merging is the right call, along with some mitigations and context:
For reference, LinkedIn’s implementation does the following, which is typically sufficient in practice:
|
teamurko
left a comment
There was a problem hiding this comment.
- how many times users get blocked on this + oh on call load vs 2) drop by mistake + recovery overhead, we kind of have a sense of 1 being a problem, but no data on 2. i'd suggest trying 2 and if it causes more problems than it solves, can we revert
Most of the requests are because primary got recreated. So having soft delete + 1 day retention for these should work. If same uuid table needs to be recreated, that indicates an underlying issues that needs to be fixed, like multi schema updates. |
2 could be a problem as well and that is the motivation for soft delete. If an incorrect flow is deployed on multiple instances that leads to DR table also getting deleted, then enabling drop defeats the purpose of having DR in place. soft delete should also help bring down oh oncall load and we can have it just for a day |
|
Merging this PR based on internal alignment as this provides a self-serve way for users to fix broken replication. This will be revisited once we've source controlled replication. |
The existing test only covered delete scenarios after linkedin#453. This update adds update regression coverage and combines all assertions into a single test that verifies: - Update on replica is denied without SYSTEM_ADMIN - Delete on replica is denied without DELETE_TABLE - Delete on replica succeeds with DELETE_TABLE (even without SYSTEM_ADMIN) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary - Improve e2e test coverage for replica table drop permission change introduced in #453 - The existing test lost update regression coverage — this restores it and combines all permission scenarios into a single test ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [ ] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [X] Tests For all the boxes checked, please include additional details of the changes made in this pull request. ## Testing Done <!--- Check any relevant boxes with "x" --> - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [X] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request. # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [ ] Large PR broken into smaller PRs, and PR plan linked in the description. For all the boxes checked, include additional details of the changes made in this pull request. Co-authored-by: Dushyant Kumar <dukumar@linkedin.biz> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Issue] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.
Replica tables have limited permissions compared to source tables. This prevents users from updating their replica tables manually as it would lead to table corruption.
However, being unable to drop replica tables easily causes a lot of operational toil. This is because users who want to perform backwards incompatible schema changes or want to drop their table for another reason must reach out to system administrators for them to delete.
Additionally, since most replica tables are copied on a schedule, they generally can be backfilled if deletions were accidentally done.
This PR relaxes the drop restriction on replica tables to anyone with the drop capability for the db (generally just table owners).
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.