Reliability improvements for the Keylime agent #1164
Merged
ansasaki merged 4 commits into keylime:master on Jan 5, 2026
sergio-correia (Contributor) commented on Dec 17, 2025
- Add operational context to TPM mutex errors for better debugging
- Remove unused session request code (89 lines of dead code)
- Fix panic on missing ek_handle configuration
Updated from 08bd045 to 13fd5e8.
Replace panic with proper error handling for ek_handle configuration.
Instead of using .expect(), which panics on configuration errors,
use .map_err() to return a helpful error message that tells users
to verify their ek_handle configuration.
Changes:
- Replace .expect("failed to get ek_handle") with .map_err()
- Add descriptive error message for configuration issues
- Remove completed TODO comment
- Add test to verify error handling behavior
This improves reliability by ensuring configuration errors result in
graceful failures with actionable error messages instead of panics.
Assisted-by: Claude Sonnet 4.5
Signed-off-by: Sergio Correia <scorreia@redhat.com>
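The .expect() to .map_err() change can be sketched in Rust as follows. AgentError, parse_handle, and get_ek_handle are hypothetical names, and treating ek_handle as a hex handle string is an assumption made only for illustration; this is not the agent's actual code.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type standing in for the agent's real error enum.
#[derive(Debug)]
enum AgentError {
    Configuration(String),
}

impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AgentError::Configuration(msg) => write!(f, "configuration error: {msg}"),
        }
    }
}

impl std::error::Error for AgentError {}

/// Hypothetical parser: assumes the setting holds a hex handle like "0x81000000".
fn parse_handle(raw: &str) -> Result<u32, ParseIntError> {
    let raw = raw.trim();
    let hex = raw
        .strip_prefix("0x")
        .or_else(|| raw.strip_prefix("0X"))
        .unwrap_or(raw);
    u32::from_str_radix(hex, 16)
}

/// Before: parse_handle(raw).expect("failed to get ek_handle") panicked.
/// After: the parse failure becomes an error the caller can report.
fn get_ek_handle(raw: &str) -> Result<u32, AgentError> {
    parse_handle(raw).map_err(|e| {
        AgentError::Configuration(format!(
            "failed to get ek_handle from {raw:?} ({e}); \
             verify the ek_handle setting in the agent configuration"
        ))
    })
}

fn main() {
    for raw in ["0x81000000", "not-a-handle"] {
        match get_ek_handle(raw) {
            Ok(handle) => println!("using EK handle {handle:#x}"),
            Err(e) => eprintln!("{e}"),
        }
    }
}
```

Returning the error instead of panicking leaves the decision of how to report the misconfiguration to the caller.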
PoP (Proof of Possession) authentication is now handled by
keylime::auth::AuthenticationClient via middleware in ResilientClient.
The get_session_request() trait method and the get_session_request_final()
implementation were never used in production code, only in tests. This
commit removes this dead code to eliminate confusion and reduce
technical debt.
Assisted-by: Claude Sonnet 4.5
Signed-off-by: Sergio Correia <scorreia@redhat.com>
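To illustrate why a per-call session request method becomes redundant once authentication lives in middleware, here is a minimal, std-only Rust sketch. Request, Middleware, AuthMiddleware, and Client are hypothetical names, not the keylime::auth or ResilientClient API, and the header name and token format are placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an outgoing HTTP request.
#[derive(Debug, Default)]
struct Request {
    path: String,
    headers: HashMap<String, String>,
}

/// Hypothetical middleware hook, run on every request before it is sent.
trait Middleware {
    fn before_send(&self, req: &mut Request);
}

/// Hypothetical auth middleware that owns the session token, playing the
/// role this commit assigns to AuthenticationClient.
struct AuthMiddleware {
    token: String,
}

impl Middleware for AuthMiddleware {
    fn before_send(&self, req: &mut Request) {
        // Placeholder header name and token format.
        req.headers
            .insert("Authorization".to_string(), format!("Bearer {}", self.token));
    }
}

/// Hypothetical client that applies every middleware to each request.
struct Client {
    middlewares: Vec<Box<dyn Middleware>>,
}

impl Client {
    /// Prepares a request; a real client would then perform network I/O.
    fn send(&self, path: &str) -> Request {
        let mut req = Request {
            path: path.to_string(),
            ..Default::default()
        };
        for m in &self.middlewares {
            m.before_send(&mut req);
        }
        req
    }
}

fn main() {
    let auth: Box<dyn Middleware> = Box::new(AuthMiddleware {
        token: "example-token".to_string(),
    });
    let client = Client { middlewares: vec![auth] };

    // The caller never builds a session request; the middleware adds auth.
    let req = client.send("/example");
    println!("{} -> {:?}", req.path, req.headers.get("Authorization"));
}
```

Because every request passes through the middleware chain, individual call sites have no need for their own session request helpers.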
Replace all 10 mutex unwraps in production code with proper error
handling, eliminating the //#[allow_ci] bypass markers in the shared
TPM library.
Changes:
- Replace MutexPoisoned with a MutexPoisonedDuringOperation error variant
- Include the operation name (e.g., "create_ek", "quote") in each error
- Make error messages state which TPM operation failed
- Update all 10 mutex lock sites with operation-specific error handling
Benefits:
- Eliminates unexpected panics in favor of graceful error propagation
- Easier debugging: error messages identify the exact operation that failed
- Better observability: error messages explain the issue and the required action
- Low overhead: operation names are &'static str, so no allocation is needed
Example error message:
"TPM context mutex was poisoned during 'quote' operation. This indicates a critical bug where a thread panicked while holding the TPM lock. The agent must be restarted."
Assisted-by: Claude Sonnet 4.5
Signed-off-by: Sergio Correia <scorreia@redhat.com>
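A minimal, std-only Rust sketch of the operation-tagged poisoning error. TpmContext, TpmError, lock_for, and quote are hypothetical stand-ins; only the MutexPoisonedDuringOperation variant name and the message text come from this commit, and the variant's exact shape in the real library is an assumption.

```rust
use std::fmt;
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Hypothetical stand-in for the shared TPM context.
#[derive(Debug, Default)]
struct TpmContext {
    quotes_taken: u32,
}

#[derive(Debug)]
enum TpmError {
    /// Carries the name of the operation that found the mutex poisoned.
    MutexPoisonedDuringOperation(&'static str),
}

impl fmt::Display for TpmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TpmError::MutexPoisonedDuringOperation(op) => write!(
                f,
                "TPM context mutex was poisoned during '{op}' operation. \
                 This indicates a critical bug where a thread panicked while \
                 holding the TPM lock. The agent must be restarted."
            ),
        }
    }
}

impl std::error::Error for TpmError {}

/// Locks the context, mapping a PoisonError to an operation-tagged error
/// instead of calling unwrap().
fn lock_for<'a>(
    ctx: &'a Mutex<TpmContext>,
    op: &'static str,
) -> Result<MutexGuard<'a, TpmContext>, TpmError> {
    ctx.lock().map_err(|_| TpmError::MutexPoisonedDuringOperation(op))
}

/// Hypothetical TPM operation using the helper.
fn quote(ctx: &Mutex<TpmContext>) -> Result<u32, TpmError> {
    let mut guard = lock_for(ctx, "quote")?;
    guard.quotes_taken += 1;
    Ok(guard.quotes_taken)
}

fn main() {
    let ctx = Arc::new(Mutex::new(TpmContext::default()));
    match quote(&ctx) {
        Ok(n) => println!("quote #{n} succeeded"),
        Err(e) => eprintln!("{e}"),
    }

    // Poison the mutex: a thread panics while holding the lock.
    let c = Arc::clone(&ctx);
    let _ = thread::spawn(move || {
        let _guard = c.lock().unwrap();
        panic!("simulated panic while holding the TPM lock");
    })
    .join();

    // The next operation now gets an error naming 'quote' instead of panicking.
    match quote(&ctx) {
        Ok(n) => println!("quote #{n} succeeded"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Since the operation name is a &'static str, building the error copies only a reference to string data in the binary; nothing is allocated on the failure path.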
Signed-off-by: Sergio Correia <scorreia@redhat.com>
ansasaki approved these changes on Jan 5, 2026