Skip to content

fix(xtest): improve audit log collection latency and reliability#408

Draft
dmihalcik-virtru wants to merge 2 commits intoopentdf:mainfrom
dmihalcik-virtru:fix/xtest-audit-framework
Draft

fix(xtest): improve audit log collection latency and reliability#408
dmihalcik-virtru wants to merge 2 commits intoopentdf:mainfrom
dmihalcik-virtru:fix/xtest-audit-framework

Conversation

@dmihalcik-virtru
Copy link
Member

Summary

  • Replace busy-wait polling with condition-based wakeup using threading.Condition
  • Use frozenset constants for audit types instead of literal types
  • Add wait_for_new_data() method and batch-read in _tail_file
  • Improve clock skew handling with explicit timezone support
  • Add attribute_default_rsa fixture for integration tests
  • Update test assertions and add TestAuditConstants class

Parent PRs

Test plan

  • cd xtest && uv run ruff check . && uv run pyright
  • uv run pytest test_audit_logs.py -v

🤖 Generated with Claude Code

Part of stacked PR series decomposing chore/the-claudiest-day-tmux

dmihalcik-virtru and others added 2 commits February 12, 2026 20:12
Bump pyright from 1.1.380 to 1.1.408 and alphabetize dev
dependency ordering in pyproject.toml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: David Mihalcik <dmihalcik@virtru.com>
Replace busy-wait polling with condition-based wakeup using
threading.Condition. Improve clock skew handling with explicit
timezone support. Add frozenset constants for audit types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: David Mihalcik <dmihalcik@virtru.com>
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @dmihalcik-virtru, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability, performance, and testability of the audit log collection system. By transitioning from inefficient busy-wait polling to condition-based wakeups, the system can now react more promptly to new log data, reducing latency. The refactoring of audit event constants to frozensets enhances clarity and potentially runtime efficiency. Furthermore, explicit timezone handling addresses potential clock skew issues, and a new test fixture simplifies the setup for specific integration test scenarios.

Highlights

  • Improved Audit Log Collection Efficiency: Replaced busy-wait polling with threading.Condition for more efficient and responsive waiting for new log data, reducing latency in audit log processing.
  • Refactored Audit Event Constants: Migrated audit event type definitions (Object, Action, Result) from Literal types to frozenset constants and introduced explicit string constants for audit verbs, enhancing clarity and runtime performance.
  • Enhanced Clock Skew Handling: Improved the handling of clock skew by explicitly checking for timezone information in datetime objects before converting to UTC, leading to more robust time comparisons.
  • New Test Fixture for RSA Wrapping: Added an attribute_default_rsa pytest fixture to simplify integration tests that specifically require explicit RSA key wrapping, improving test setup and reliability.
  • Updated Test Assertions and Coverage: Updated various test assertions, including removing pytest.approx for exact comparisons, and introduced a TestAuditConstants class to validate the newly defined audit type constants.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • xtest/audit_logs.py
    • Improved timezone handling for collection_time by explicitly checking tzinfo before converting to UTC.
    • Replaced Literal type definitions for ObjectType, ActionType, and ActionResult with frozenset constants (OBJECT_TYPES, ACTION_TYPES, ACTION_RESULTS).
    • Introduced string constants (VERB_DECISION, VERB_POLICY_CRUD, VERB_REWRAP) for audit log message verbs, replacing hardcoded strings.
    • Added _new_data as a threading.Condition to AuditLogCollector for efficient inter-thread communication.
    • Implemented wait_for_new_data() method in AuditLogCollector to allow threads to block until new log data is available.
    • Modified _tail_file to batch-read all available lines before notifying waiting threads, improving efficiency.
    • Updated start() and stop() methods to use the _new_data condition for thread management.
    • Added an is_enabled property to AuditLogAsserter to check if log collection is active.
    • Updated assert_contains, get_parsed_audit_logs, assert_rewrap, assert_policy_crud, and assert_decision_v2 methods to use wait_for_new_data() instead of time.sleep().
    • Replaced hardcoded audit verb strings with the new VERB_ constants in matches_rewrap, matches_policy_crud, matches_decision, and parse_audit_log.
  • xtest/fixtures/attributes.py
    • Added attribute_default_rsa pytest fixture to create an attribute with RSA key mapping on the default KAS, specifically for tests requiring explicit RSA wrapping.
  • xtest/pyproject.toml
    • Updated the pyright development dependency to version 1.1.408.
  • xtest/test_audit_logs.py
    • Imported the new frozenset and verb constants from audit_logs.
    • Added TestAuditConstants class with tests to verify the content and existence of the new audit log constants.
    • Removed pytest.approx from assertions related to ClockSkewEstimate properties, making comparisons exact.
  • xtest/test_audit_logs_integration.py
    • Added an autouse pytest fixture skip_if_audit_disabled to automatically skip integration tests if audit log collection is not enabled.
    • Modified test_rewrap_success_fields, test_audit_logs_on_tampered_file, and test_audit_under_sequential_load to accept and utilize the new attribute_default_rsa fixture, ensuring RSA wrapping.
    • Renamed test_rewrap_success_with_attributes to test_rewrap_failure_access_denied and adjusted its logic to test rewrap failure due to policy denial.
    • Renamed test_namespace_create_audit to test_namespace_crud_audit and test_attribute_create_audit to test_attribute_crud_audit, reflecting broader CRUD testing.
    • Adjusted assertions in test_attribute_crud_audit to correctly verify attribute definition creation, noting that values are embedded.
    • Renamed test_subject_condition_set_create_audit to test_subject_mapping_audit.
    • Added a try-except block around assert_contains for decision logs, acknowledging that their presence can vary by platform configuration.
    • Imported base64 within tamper_policy_binding function.
  • xtest/uv.lock
    • Updated the locked version of the pyright dependency to 1.1.408.
Activity
  • The pull request was generated using Claude Code, indicating an AI-assisted development process.
  • It is part of a larger 'stacked PR series' aimed at decomposing a chore/the-claudiest-day-tmux task.
  • No human review comments or specific progress updates are available in the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant improvements to the audit log collection mechanism by replacing busy-wait polling with threading.Condition, which enhances both latency and reliability. The use of frozenset for constants and batch reading of logs are also good changes.

I've found a few areas for improvement:

  • There's a critical regression in the log collector's start method that could prevent some log files from being tailed.
  • Some redundant code was introduced for timezone conversions.
  • Several tests have misleading names and docstrings that don't match their implementation.
  • There's a minor style issue with local imports.

Overall, the core changes are solid, but the identified issues, especially the bug in the start method, should be addressed.

Comment on lines +663 to +678
existing_files = {
service: path for service, path in self.log_files.items() if path.exists()
}

if not existing_files:
logger.warning(
f"None of the log files exist yet: {list(self.log_files.values())}. "
f"Will wait for them to be created..."
)
existing_files = self.log_files

logger.debug(
f"Starting file-based log collection for: {list(self.log_files.keys())}"
f"Starting file-based log collection for: {list(existing_files.keys())}"
)

for service, log_path in self.log_files.items():
for service, log_path in existing_files.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This new logic introduces a critical regression. If some, but not all, log files exist when start() is called, threads will only be started for the files that already exist because the loop at line 678 iterates over existing_files. The previous implementation correctly started threads for all configured log files, and the _tail_file method would wait for the file to be created. This could lead to missed logs from services whose log files are created after startup.

assert event.key_id is not None or event.algorithm is not None

def test_rewrap_success_with_attributes(
def test_rewrap_failure_access_denied(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The test name test_rewrap_failure_access_denied and its docstring suggest it's testing a failure case. However, the implementation performs a successful decryption and asserts for a rewrap_success audit event. This is misleading and should be corrected. Either the test logic should be changed to test a failure, or the name and docstring should be reverted to reflect a success test.

Comment on lines +198 to +202
if collection_time.tzinfo is None:
# Assume local time, convert to UTC
collection_utc = collection_time.astimezone(UTC)
else:
collection_utc = collection_time.astimezone(UTC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This if/else block is redundant as collection_time.astimezone(UTC) is called in both branches. The previous implementation was more concise and correct, as astimezone() handles both naive and timezone-aware datetime objects.

        collection_utc = collection_time.astimezone(UTC)

Comment on lines +365 to +368
if collection_t.tzinfo is None:
collection_utc = collection_t.astimezone(UTC)
else:
collection_utc = collection_t.astimezone(UTC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This if/else block is redundant. astimezone() handles both naive (assuming local timezone) and aware datetimes, so a single call is sufficient and cleaner.

        collection_utc = collection_t.astimezone(UTC)


logger.info(
f"Audit log collection started for: {', '.join(self.log_files.keys())}"
f"Audit log collection started for: {', '.join(existing_files.keys())}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Related to the issue above, this log message can be misleading. It will only report the services for which log files existed at startup, not all services for which log collection is being attempted. It should report all services that are being monitored.

            f"Audit log collection started for: {', '.join(self.log_files.keys())}"

return OpentdfCommandLineTool()

def test_namespace_create_audit(
def test_namespace_crud_audit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test is named test_namespace_crud_audit and the docstring says it tests 'create/update/delete', but the implementation only tests the 'create' operation. To avoid confusion, please either expand the test to cover update and delete, or rename it to test_namespace_create_audit.

assert events[0].action_type == "create"

def test_attribute_create_audit(
def test_attribute_crud_audit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the test above, test_attribute_crud_audit only tests creation. Please consider expanding it to cover other CRUD operations or renaming it to be more specific.

assert len(value_events) >= 2

def test_subject_condition_set_create_audit(
def test_subject_mapping_audit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test is named test_subject_mapping_audit and the docstring mentions 'SCS and subject mapping audit trail', but it only creates and asserts a condition_set. It doesn't test subject mapping creation. Please align the test implementation with its name and description.

Comment on lines +385 to +391
import base64

h = pb.hash
altered = base64.b64encode(b"tampered" + base64.b64decode(h)[:8])
pb.hash = str(altered)
else:
import base64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The base64 module is imported locally in two places within this function. It's better practice to have a single top-level import at the beginning of the file for clarity and to avoid repeated imports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant