-
Notifications
You must be signed in to change notification settings - Fork 35
CMEK: Add encryption support for event and schema store data at rest #3955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Add encryption manager for transparent data encryption/decryption - Implement AES-256-CTR and AES-256-GCM cipher algorithms - Add encryption format with header for version and key ID - Integrate encryption with event store and schema store - Add mock KMS and TiKV clients for testing - Support graceful degradation on encryption errors - Add configuration options for encryption settings Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @tenfyzhong, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant security enhancement by integrating Customer-Managed Encryption Keys (CMEK) encryption at rest for TiCDC's EventStore and SchemaStore components. The primary objective is to safeguard persisted data on disk by encrypting it with keys managed by customers through their chosen Key Management Service (KMS) providers. This foundational work establishes a robust and flexible encryption architecture, ensuring data confidentiality while maintaining seamless operation and compatibility with existing data structures. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive framework for Customer-Managed Encryption Keys (CMEK) to encrypt data at rest for TiCDC's EventStore and SchemaStore. The changes are extensive, including a new encryption package with support for multiple ciphers, KMS integration, and a layered key model. The implementation also considers backward compatibility and graceful degradation. My review focuses on improving code clarity, robustness, and performance by addressing issues such as non-idiomatic Go patterns for optional dependencies, inconsistent error handling, potential race conditions in caching, and inefficient data access patterns. I've also pointed out some minor issues in comments and mock data to enhance maintainability.
logservice/eventstore/event_store.go
Outdated
| if encryption.IsEncrypted(value) { | ||
| // Try to get encryptionManager from appcontext | ||
| var encMgr encryption.EncryptionManager | ||
| defer func() { | ||
| if r := recover(); r != nil { | ||
| // EncryptionManager not registered, skip decryption | ||
| encMgr = nil | ||
| } | ||
| }() | ||
| encMgr = appcontext.GetService[encryption.EncryptionManager]("EncryptionManager") | ||
| if encMgr != nil { | ||
| // TODO: Get keyspaceID from dispatcher/subscription metadata | ||
| // For now, use default keyspaceID (0) | ||
| keyspaceID := uint32(0) | ||
| decryptedValue, err := encMgr.DecryptData(context.Background(), keyspaceID, value) | ||
| if err != nil { | ||
| log.Panic("failed to decrypt value", zap.Error(err)) | ||
| } | ||
| value = decryptedValue | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the comment here suggests, fetching the encryptionManager from the global appcontext on every Next() call is inefficient and not idiomatic. It also uses a defer/recover pattern which is best avoided for control flow.
The encryptionManager should be passed to eventStoreIter upon its creation and stored as a field.
- Add
encryptionManager encryption.EncryptionManagerto theeventStoreIterstruct. - In
GetIterator, initialize this new field frome.encryptionManager. - Then, simplify this
Nextmethod to useiter.encryptionManagerdirectly.
This would improve performance and make the code cleaner and more maintainable.
if iter.encryptionManager != nil && encryption.IsEncrypted(value) {
// TODO: Get keyspaceID from dispatcher/subscription metadata
// For now, use default keyspaceID (0)
keyspaceID := uint32(0)
decryptedValue, err := iter.encryptionManager.DecryptData(context.Background(), keyspaceID, value)
if err != nil {
log.Panic("failed to decrypt value", zap.Error(err))
}
value = decryptedValue
}| if err != nil { | ||
| log.Fatal("decrypt db info failed", zap.Error(err)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using log.Fatal can lead to abrupt program termination without proper cleanup or stack unwinding. It's generally better to return an error from this function and let the caller decide on the appropriate action. If the error is considered truly unrecoverable at this level, panic(err) would be a better choice as it allows for a top-level recovery mechanism to perform a more graceful shutdown. This applies to other log.Fatal calls in this file as well.
| if err != nil { | |
| log.Fatal("decrypt db info failed", zap.Error(err)) | |
| } | |
| if err != nil { | |
| return nil, errors.Trace(err) | |
| } |
| // DecryptMasterKey decrypts the master key using mock KMS | ||
| // In a real implementation, this would call the actual KMS service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataKeyID is defined to be 3 bytes long throughout the encryption framework (e.g., in format.go and types.go). However, these mock data key IDs are 8 bytes long. This will cause EncodeEncryptedData to fail. The mock data should be updated to use 3-byte strings for data key IDs to be consistent with the implementation.
mockDataKeyID1 := "001"
mockDataKeyID2 := "002"| if cached, ok := m.metaCache[keyspaceID]; ok { | ||
| // Check if cache is still valid | ||
| if time.Since(cached.timestamp) < m.ttl { | ||
| meta := cached.meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logservice/eventstore/event_store.go
Outdated
| defer func() { | ||
| if r := recover(); r != nil { | ||
| // EncryptionManager not registered, use nil | ||
| encMgr = nil | ||
| } | ||
| }() | ||
| encMgr = appcontext.GetService[encryption.EncryptionManager]("EncryptionManager") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using defer and recover for control flow to handle an optional dependency is not idiomatic Go. It can obscure the program's control flow and is generally reserved for handling unexpected panics. A better approach would be to have a TryGetService function in appcontext that returns a boolean indicating whether the service was found, for example: encMgr, ok := appcontext.TryGetService[...](...). This would make the code clearer and more robust.
logservice/eventstore/event_store.go
Outdated
| // TODO: Get keyspaceID from dispatcher/subscription metadata | ||
| // For now, use default keyspaceID (0) for classic mode | ||
| keyspaceID := uint32(0) | ||
| encryptedValue, err := e.encryptionManager.EncryptData(context.Background(), keyspaceID, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using context.Background() here means this encryption operation will not be cancelled if the parent context (from the write task pool) is cancelled. It's better to pass the context from writeTaskPool.run down to writeEvents and use it here. This ensures that long-running operations can be properly cancelled.
pkg/encryption/cipher.go
Outdated
| return &AES256GCMCipher{} | ||
| } | ||
|
|
||
| // IVSize returns the IV size for AES-256-GCM (12 bytes recommended, but we use 16 for compatibility) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states that 16 bytes are used for compatibility, but the function returns 12. This is confusing. The standard and recommended nonce size for GCM is 12 bytes, which the code correctly returns. The comment should be updated to reflect this and remove the mention of 16 bytes to avoid confusion.
| // IVSize returns the IV size for AES-256-GCM (12 bytes recommended, but we use 16 for compatibility) | |
| // IVSize returns the IV size for AES-256-GCM (12 bytes). |
pkg/encryption/types.go
Outdated
| // DataKeyID represents a 3-byte data key identifier | ||
| type DataKeyID [3]byte | ||
|
|
||
| // ToString converts DataKeyID to hex string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says ToString converts the DataKeyID to a hex string, but the implementation performs a direct byte-to-string conversion. The comment should be corrected to reflect the actual behavior.
| // ToString converts DataKeyID to hex string | |
| // ToString converts DataKeyID to a string. |
- Add unit tests for cipher functionality including unsupported algorithm detection and AES256CTR encryption/decryption - Add unit tests for data format encoding/decoding with encrypted and unencrypted data - Add unit tests for mock TiKV client behavior including keyspace encryption meta retrieval Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
- Replace panic-prone appcontext.GetService calls with TryGetService for optional encryption manager - Pass encryption manager to eventStoreIter to avoid repeated appcontext lookups - Update encryption format to support version-based detection from TiKV metadata - Add GetEncryptionVersion method to encryption meta manager interface - Improve backward compatibility for legacy unencrypted data formats - Add comprehensive tests for encryption format handling Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
- Add keyspaceID field to eventWithCallback struct to carry keyspace information - Store keyspaceID in dispatcher statistics during registration - Pass keyspaceID to encryption manager for both encryption and decryption operations - Update encryption manager to return errors instead of graceful degradation when configured - Add test to verify keyspaceID is correctly used in encryption/decryption flow Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
- Add test for encryption degrade on error with allow flag enabled - Add test for encryption degrade on error with allow flag disabled - Add test for encryption disabled scenario - Include mock meta manager for testing encryption manager behavior Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
- Remove redundant comments and build tags from test files - Use sync.Once for safe stop channel closure in encryption manager - Clear meta cache entries before refresh to ensure fresh data - Standardize formatting and remove outdated compatibility comments Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
What problem does this PR solve?
Issue Number: close #3943
What is changed and how it works?
This PR introduces CMEK (Customer-Managed Encryption Keys) encryption at rest for TiCDC's EventStore and SchemaStore components. The primary goal is to enhance data security by encrypting persisted data on disk using encryption keys managed by customers through their preferred KMS (Key Management Service) providers.
Key changes include:
The encryption works by:
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note