Summary
Add a --redact (or similar) flag to infrahub-backup that anonymises all sensitive data values before export. This would allow customers to share database backups for debugging without exposing proprietary data.
Proposed approach
- Customer uses infrahub-backup to back up their production environment
- Restores the backup to a staging or local environment (customers already do this)
- Runs infrahub-backup again with the --redact flag, which executes a Cypher query to replace the value property on every AttributeValue node with a UUID
- The resulting dump contains the full graph structure (nodes, relationships, schema, hierarchy) but no real data values
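The redaction step above can be sketched as follows. This is only an illustration under stated assumptions, not the query infrahub-backup would ship: the Cypher statement in REDACT_QUERY is a guess at the simplest form (it relies on Neo4j's built-in randomUUID() function), and redact_nodes, along with the dict-based node representation, is a hypothetical in-memory stand-in that mirrors what that statement would do to the graph.

```python
import uuid

# Assumed Cypher statement for the redaction step (not taken from infrahub-backup).
# randomUUID() is a built-in Neo4j Cypher function that returns a new UUID string.
REDACT_QUERY = "MATCH (av:AttributeValue) SET av.value = randomUUID()"


def redact_nodes(nodes):
    """Return copies of the nodes with `value` replaced on AttributeValue nodes.

    Hypothetical in-memory equivalent of REDACT_QUERY: every other node and
    every other property is left untouched, so the structure is preserved.
    """
    redacted = []
    for node in nodes:
        copy = dict(node)  # shallow copy so the input is not mutated
        if "AttributeValue" in copy.get("labels", ()) and "value" in copy:
            copy["value"] = str(uuid.uuid4())
        redacted.append(copy)
    return redacted


# Illustrative data only; the labels and properties here are made up.
sample = [
    {"id": 1, "labels": ["AttributeValue"], "value": "core-router-01"},
    {"id": 2, "labels": ["Node"], "kind": "InfraDevice"},
]
result = redact_nodes(sample)
```

Note that this form assigns an independent random UUID to each AttributeValue node, so two nodes that originally held the same value would no longer match after redaction. Whether that matters for reproducing a given problem would need to be decided alongside the open questions below.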
Why this matters
Having access to a customer's full graph structure (with anonymised values) would dramatically improve our ability to debug performance issues, merge corruption, and other problems that are difficult to reproduce without real-world scale and topology.
Open questions
- Exact naming of the flag (--redact, --anonymize, etc.)
- Whether the Cypher query should be bundled into infrahub-backup directly or provided as a separate utility
- Whether any non-AttributeValue data also needs redaction (e.g. node names, labels)