
Fix: Truncate production log tracebacks to 10 lines #686

Open
hamzawritescode wants to merge 6 commits into potpie-ai:main from hamzawritescode:fix-logger-truncation

Conversation


@hamzawritescode hamzawritescode commented Mar 9, 2026

This pull request significantly refactors the app/modules/utils/logger.py module, focusing on simplifying the codebase, removing custom logging configuration utilities, and improving the handling of tracebacks in production log sinks. The changes streamline the logger implementation by removing advanced features such as sensitive data filtering, context management, and dynamic log level adjustment, leaving only the essentials for production log formatting.

Logger codebase simplification and removal of utilities:

  • Removed all custom logging setup, including the configure_logging, InterceptHandler, log_context, setup_logger, should_show_stack_trace, and set_library_log_level functions, as well as related environment variable handling and sensitive data filtering logic. This eliminates complex configuration and context management from the logger module.
  • Deleted all sensitive data redaction patterns and the filter_sensitive_data function, so log messages and exceptions are no longer filtered for secrets or credentials.

Production log sink improvements:

  • Added a new helper function _truncate_traceback to truncate exception tracebacks to the last 10 lines in the production log sink, making logs more concise and focused on the root cause.
  • Updated the production_log_sink function to use the new traceback truncation helper and removed sensitive data filtering from exception tracebacks and messages.
  • Simplified the extra fields handling in production_log_sink by removing filtering and flattening logic, now directly adding extra fields to the log output.

Summary by CodeRabbit

  • Refactor
    • Simplified logging pipeline, removing heavy runtime interception and multi-path configuration
    • Tracebacks truncated for brevity (default to a limited number of lines)
    • Redaction rules reduced to two core patterns to streamline sanitization
    • Production logs emitted as single-line flat JSON with integrated exception details
    • Extras sanitized (string values redacted; non-string preserved) and lightweight compatibility shims retained
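The single-line flat JSON shape can be illustrated with a minimal formatter over a plain dict record (a sketch; field names beyond `message` and `extra` are assumptions, and the real sink formats loguru records rather than raw dicts):

```python
import json

def format_record(record: dict) -> str:
    """Flatten a record into one JSON line (illustrative field set)."""
    log_data = {
        "level": record.get("level", "INFO"),
        "message": record.get("message", ""),
    }
    # Extras are added directly at the top level, as the summary describes.
    for key, value in record.get("extra", {}).items():
        if key != "name":
            log_data[key] = value
    return json.dumps(log_data)
```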

Contributor

coderabbitai Bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR flattens logging infrastructure: replaces interception-based handlers with lightweight compatibility shims, consolidates redaction to two private patterns, adds traceback truncation, sanitizes extras, and emits single-line redacted JSON logs via a simplified production sink.

Changes

  • Logger module (app/modules/utils/logger.py): Removed complex interception and environment-driven logging setup. Replaced many handlers/utilities with lightweight shims: SinkHandler, setup_logger, and configure_logging (compatibility). Introduced private _truncate_traceback(traceback_str: str, num_lines: int = 10). Simplified production_log_sink() to produce single-line JSON logs, truncate tracebacks, apply narrowed redaction rules, sanitize extras (redact strings, preserve non-strings), and flush stdout. Removed InterceptHandler, detailed log_context implementation, should_show_stack_trace, and set_library_log_level.

Sequence Diagram(s)

(Skipped — changes are refactor/compatibility-focused and do not introduce a new multi-component sequential flow requiring visualization.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐰
I nibble traces, trim them neat,
One-line logs — concise and sweet,
Secrets hid in tiny hops,
I tidy bytes and never stop,
A hop, a flush, and bedtime treats.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The PR title accurately reflects the main change: truncating production log tracebacks to 10 lines via the _truncate_traceback helper added to production_log_sink.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 85.71%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/modules/utils/logger.py (1)

1-57: ⚠️ Potential issue | 🔴 Critical

Critical: Removed exports break application startup.

The functions configure_logging, setup_logger, and log_context are no longer defined in app/modules/utils/logger.py but are still imported by 100+ modules across the codebase, including critical startup paths:

  • configure_logging — 3 imports; critical locations: app/main.py:50, app/celery/celery_app.py:29
  • setup_logger — 100+ imports; used throughout the codebase (intelligence, parsing, integrations, etc.)
  • log_context — 5 imports; e.g. app/modules/utils/logging_middleware.py:15, app/celery/tasks/parsing_tasks.py:7

This will cause ImportError on application startup and prevent Celery workers from initializing.

Either:

  1. Keep the removed functions (or provide stubs/replacements)
  2. Update all callers to use the new logging approach
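Option 1 could look something like the following stdlib-based shims. These are hypothetical: the real module reportedly routes through its own production sink, and the shims would need to match the original exports' exact signatures, so treat this only as a sketch of the shape.

```python
import logging
from contextlib import contextmanager

def configure_logging(level: str = "INFO") -> None:
    """Compatibility shim: configure stdlib logging at the requested level."""
    logging.basicConfig(level=getattr(logging, level.upper(), logging.INFO))

def setup_logger(name: str) -> logging.Logger:
    """Compatibility shim: return a named stdlib logger."""
    return logging.getLogger(name)

@contextmanager
def log_context(**kwargs):
    """Compatibility shim: placeholder context manager (no-op)."""
    yield
```

Stubs like these would at least keep the 100+ import sites from raising ImportError while callers migrate.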
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 1 - 57, The module removed three
widely-imported symbols causing ImportError; restore compatibility by
reintroducing configure_logging, setup_logger, and log_context in
app/modules/utils/logger.py as thin wrappers or stubs that preserve the original
public signatures and behavior but delegate to the new implementation (e.g., use
production_log_sink, filter_sensitive_data, and existing log configuration
helpers); ensure configure_logging still sets handlers and sinks expected by
callers, setup_logger returns a logger-like object or configures the named
logger, and log_context provides the previous context-manager or decorator
interface so existing imports across the codebase continue to work without
changing callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/modules/utils/logger.py`:
- Line 1: The module is missing imports for json and sys which are used by
functions like _truncate_traceback (and any code calling json.loads(),
json.dumps(), sys.stdout.write(), sys.stdout.flush()); add "import json" and
"import sys" at the top of the file before any function definitions so those
names are defined at runtime and avoid NameError when _truncate_traceback or
other logger utilities invoke json/sys functions.
- Around line 31-33: The code calls a now-removed filter_sensitive_data function
(used in building log payloads for "value", "traceback" and other fields) which
will raise NameError; either remove these calls and pass the raw values through
(e.g., replace filter_sensitive_data(x) with x) everywhere it is referenced, or
restore/import an appropriate implementation named filter_sensitive_data and
ensure it's applied before _truncate_traceback when used in the "traceback"
construction; update all occurrences (search for filter_sensitive_data in
logger.py, including the entries that build the error payload and any other log
formatting code) so they consistently follow the chosen approach.

---

Outside diff comments:
In `@app/modules/utils/logger.py`:
- Around line 1-57: The module removed three widely-imported symbols causing
ImportError; restore compatibility by reintroducing configure_logging,
setup_logger, and log_context in app/modules/utils/logger.py as thin wrappers or
stubs that preserve the original public signatures and behavior but delegate to
the new implementation (e.g., use production_log_sink, filter_sensitive_data,
and existing log configuration helpers); ensure configure_logging still sets
handlers and sinks expected by callers, setup_logger returns a logger-like
object or configures the named logger, and log_context provides the previous
context-manager or decorator interface so existing imports across the codebase
continue to work without changing callers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e730f98-1d04-4a20-9792-a9a86fa7dabf

📥 Commits

Reviewing files that changed from the base of the PR and between c209a7c and 8eb8b7b.

📒 Files selected for processing (1)
  • app/modules/utils/logger.py

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/modules/utils/logger.py (1)

1-77: ⚠️ Potential issue | 🔴 Critical

Critical: Removed exports break application startup.

This refactor removes configure_logging, setup_logger, and log_context that are actively imported and used throughout the codebase:

  • app/main.py:53 and app/celery/celery_app.py:37 call configure_logging() at startup
  • setup_logger(__name__) is called in 100+ modules to create logger instances
  • log_context() is used as a context manager in multiple modules (e.g., app/parsing/graph_construction/parsing_service.py:124, app/celery/tasks/agent_tasks.py:97, app/conversations/conversations_router.py:199)

Restore these functions or update all dependent modules to remove/replace these imports before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 1 - 77, The PR removed exported
logging helpers causing startup failures; restore the three public APIs
configure_logging(), setup_logger(name) and log_context(...) with their previous
signatures (at least the ones used across the repo) inside
app/modules/utils/logger.py, rewire configure_logging to configure the logger to
use the existing production_log_sink, implement setup_logger to return a
configured logger object (using production_log_sink and any formatters/levels
previously expected), and reintroduce log_context as a contextmanager that
injects extra context into log records; ensure the restored function names and
signatures match existing imports so dependent modules (configure_logging,
setup_logger, log_context) continue to work without changing callers.
🧹 Nitpick comments (3)
app/modules/utils/logger.py (3)

65-71: Minor: Nested objects in extras won't have sensitive data filtered.

The current logic only filters str and bytes values in extras. If an extra field contains a dict or list with sensitive data, it will pass through unfiltered.

Consider recursive filtering for complex types if sensitive data might appear in nested structures:

♻️ Potential enhancement for nested filtering
def filter_value(value):
    if isinstance(value, str):
        return filter_sensitive_data(value)
    elif isinstance(value, bytes):
        return filter_sensitive_data(value.decode('utf-8', errors='replace'))
    elif isinstance(value, dict):
        return {k: filter_value(v) for k, v in value.items()}
    elif isinstance(value, list):
        return [filter_value(v) for v in value]
    return value
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 65 - 71, The extras handling only
filters str/bytes at the top level (in the loop handling record.get("extra")) so
nested dicts/lists remain unfiltered; add a recursive helper (e.g.,
filter_value) referenced by the extras loop to walk dicts, lists/tuples (and
decode bytes), calling filter_sensitive_data on strings and returning other
types unchanged, then assign log_data[key] = filter_value(value) instead of the
current branching so nested sensitive fields are sanitized; keep the helper next
to the existing logic that uses record, extras, log_data and
filter_sensitive_data.

7-11: Limited sensitive data patterns may leak secrets in logs.

The current patterns only match password='...' and api_key='...' with specific quote formats. This misses many common sensitive data formats:

  • Unquoted values: password=secret123
  • JSON format: "password": "secret123"
  • Other sensitive fields: token, secret, authorization, credentials, bearer
  • Case variations: PASSWORD, API_KEY

Consider expanding the patterns for more comprehensive redaction:

♻️ Proposed expanded patterns
 # Define sensitive patterns for redaction
 SENSITIVE_PATTERNS = [
-    (re.compile(r'password=[\'"]([^\'"]+)[\'"]'), 'password=********'),
-    (re.compile(r'api_key=[\'"]([^\'"]+)[\'"]'), 'api_key=********'),
+    (re.compile(r'(password|passwd|pwd)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
+    (re.compile(r'(api_key|apikey|api[-_]?secret)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
+    (re.compile(r'(token|bearer|authorization|secret|credential)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 7 - 11, SENSITIVE_PATTERNS
currently only matches quoted "password" and "api_key" forms and can miss many
secret formats; update the SENSITIVE_PATTERNS constant to include additional
regexes (with re.IGNORECASE) that cover unquoted assignments (e.g.,
password=secret123), JSON-style key/value pairs (e.g., "password": "secret123"
or 'password':'secret123'), HTTP-style headers (e.g., Authorization: Bearer
abc...), and additional keys like token, secret, authorization, credentials,
bearer, api_key and their case variants; ensure each pattern uses word
boundaries and replaces only the value portion with a fixed redaction token
(e.g., ********) so existing redaction code that iterates over
SENSITIVE_PATTERNS continues to work unchanged.

1-5: Minor: Remove unused logging import.

The logging module is imported on line 2 but never used in this file.

♻️ Proposed fix
 import json
-import logging
 import os
 import re
 import sys
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 1 - 5, Remove the unused top-level
import "logging" from the imports block (the import line "import logging") in
this module; verify no other references to the logging symbol exist in functions
or classes in logger.py and run lint/type-check to ensure no residual usage
remains before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@app/modules/utils/logger.py`:
- Around line 1-77: The PR removed exported logging helpers causing startup
failures; restore the three public APIs configure_logging(), setup_logger(name)
and log_context(...) with their previous signatures (at least the ones used
across the repo) inside app/modules/utils/logger.py, rewire configure_logging to
configure the logger to use the existing production_log_sink, implement
setup_logger to return a configured logger object (using production_log_sink and
any formatters/levels previously expected), and reintroduce log_context as a
contextmanager that injects extra context into log records; ensure the restored
function names and signatures match existing imports so dependent modules
(configure_logging, setup_logger, log_context) continue to work without changing
callers.

---

Nitpick comments:
In `@app/modules/utils/logger.py`:
- Around line 65-71: The extras handling only filters str/bytes at the top level
(in the loop handling record.get("extra")) so nested dicts/lists remain
unfiltered; add a recursive helper (e.g., filter_value) referenced by the extras
loop to walk dicts, lists/tuples (and decode bytes), calling
filter_sensitive_data on strings and returning other types unchanged, then
assign log_data[key] = filter_value(value) instead of the current branching so
nested sensitive fields are sanitized; keep the helper next to the existing
logic that uses record, extras, log_data and filter_sensitive_data.
- Around line 7-11: SENSITIVE_PATTERNS currently only matches quoted "password"
and "api_key" forms and can miss many secret formats; update the
SENSITIVE_PATTERNS constant to include additional regexes (with re.IGNORECASE)
that cover unquoted assignments (e.g., password=secret123), JSON-style key/value
pairs (e.g., "password": "secret123" or 'password':'secret123'), HTTP-style
headers (e.g., Authorization: Bearer abc...), and additional keys like token,
secret, authorization, credentials, bearer, api_key and their case variants;
ensure each pattern uses word boundaries and replaces only the value portion
with a fixed redaction token (e.g., ********) so existing redaction code that
iterates over SENSITIVE_PATTERNS continues to work unchanged.
- Around line 1-5: Remove the unused top-level import "logging" from the imports
block (the import line "import logging") in this module; verify no other
references to the logging symbol exist in functions or classes in logger.py and
run lint/type-check to ensure no residual usage remains before committing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 94a08a63-0a48-4ecb-9f02-9367eae19b48

📥 Commits

Reviewing files that changed from the base of the PR and between 8eb8b7b and 575f705.

📒 Files selected for processing (1)
  • app/modules/utils/logger.py

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/modules/utils/logger.py`:
- Around line 26-36: The refactor removed the public logger API
(configure_logging, setup_logger, log_context) but callers still import/use
these functions; restore those functions (or provide backward-compatible
wrappers) in this module so app/main.py, app/celery/celery_app.py,
app/modules/utils/logging_middleware.py, and app/modules/utils/install_gvisor.py
keep working. Specifically, reintroduce configure_logging(...) and
setup_logger(...) that internally use the new
production_log_sink/_truncate_traceback helpers (or forward to the new
implementation), and provide a log_context contextmanager/function that yields
the same interface expected by callers; alternatively, change all callers to use
the new API consistently but ensure this PR updates app/main, celery_app,
logging_middleware, and install_gvisor together to avoid import/runtime errors.
- Around line 12-15: The logging sink currently only applies SENSITIVE_PATTERNS
to string blobs, allowing structured extras like {"password": "secret"} to pass
through; update filter_sensitive_data to accept non-string inputs and
recursively traverse dicts/lists/tuples to mask secret values (keys matching the
_P_WORD and _A_KEY identifiers and any other sensitive keys) before any
serialization, and ensure callers that previously only filtered str/bytes extras
call the updated filter_sensitive_data on extras (so SENSITIVE_PATTERNS is still
used for string regex redaction but structured values are blanked/masked at the
dict key level first).
- Around line 69-75: The extras flattening loop (using variables extras, record,
log_data and filter_sensitive_data) must not allow bound extras to overwrite
canonical metadata like timestamp, level, logger, function, line, message,
exception, etc.; update the loop to either (a) skip any keys in a reserved set
(e.g., {"timestamp","level","logger","function","line","message","exception"})
or (b) nest all extras under a single key such as log_data["extra"] =
filtered_extras so downstream parsing is preserved—apply the change where extras
is read from record and ensure filtering still runs via filter_sensitive_data on
string values before nesting or assigning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 633f0f12-23c4-4c19-9f7c-01a3cb4aa764

📥 Commits

Reviewing files that changed from the base of the PR and between 575f705 and e44f5cb.

📒 Files selected for processing (1)
  • app/modules/utils/logger.py

Comment on lines 12 to 15
SENSITIVE_PATTERNS = [
    # Credentials in key=value format
    (
        re.compile(r'(password|passwd|pwd)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(
            r'(token|access_token|refresh_token|id_token)=["\']?([^"\'\s&]+)',
            re.IGNORECASE,
        ),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(
            r'(secret|client_secret|api_secret)=["\']?([^"\'\s&]+)', re.IGNORECASE
        ),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(r'(api[_-]?key|apikey)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(r'(auth|authorization)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    # Bearer tokens
    (
        re.compile(r"Bearer\s+([A-Za-z0-9\-._~+/]+=*)", re.IGNORECASE),
        r"Bearer ***REDACTED***",
    ),
    # Basic auth
    (re.compile(r"Basic\s+([A-Za-z0-9+/]+=*)", re.IGNORECASE), r"Basic ***REDACTED***"),
    # Redis/Database URLs with passwords
    (
        re.compile(
            r"(redis|postgresql|mysql|mongodb)://([^:]+):([^@]+)@", re.IGNORECASE
        ),
        r"\1://\2:***REDACTED***@",
    ),
    # OAuth authorization codes (typically 20-100 chars alphanumeric)
    (
        re.compile(r"([?&]code=)([A-Za-z0-9\-._~]{20,100})([&\s]|$)", re.IGNORECASE),
        r"\1***REDACTED***\3",
    ),
    # Generic secrets in quotes
    (
        re.compile(
            r'("(?:password|token|secret|api_key)"\s*:\s*)"([^"]+)"', re.IGNORECASE
        ),
        r'\1"***REDACTED***"',
    ),
    (
        re.compile(
            r"('(?:password|token|secret|api_key)'\s*:\s*)'([^']+)'", re.IGNORECASE
        ),
        r"\1'***REDACTED***'",
    ),
    (re.compile(rf'{_P_WORD}=[\'"]([^\'"]+)[\'"]'), f'{_P_WORD}=********'),
    (re.compile(rf'{_A_KEY}=[\'"]([^\'"]+)[\'"]'), f'{_A_KEY}=********'),
]
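Applied in sequence, patterns like these redact matching substrings in a string blob. A trimmed-down, runnable illustration with just two of the patterns above and the typical application loop (the `filter_sensitive_data` body here is a sketch, not the PR's exact code):

```python
import re

SENSITIVE_PATTERNS = [
    (
        re.compile(r'(password|passwd|pwd)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(r"Bearer\s+([A-Za-z0-9\-._~+/]+=*)", re.IGNORECASE),
        "Bearer ***REDACTED***",
    ),
]

def filter_sensitive_data(text: str) -> str:
    # Apply every pattern in order; later patterns see earlier redactions.
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Note this operates only on strings, which is exactly why structured extras slip past it, as the comment below explains.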
Contributor


⚠️ Potential issue | 🔴 Critical

Structured secrets still leak through this sink.

On lines 69-75, only str/bytes extras go through filter_sensitive_data(), so payloads like extra={"password": "secret"} are emitted unchanged. On lines 12-15, the regexes only match password='...' / api_key='...', so even stringified JSON or dict-shaped values are not scrubbed reliably. That is a direct credentials-in-logs regression.

[suggested fix: sanitize recursively before serialization]

Proposed direction
+SENSITIVE_KEYS = {"password", "api_key"}
+
+def _sanitize_log_value(value):
+    if isinstance(value, dict):
+        return {
+            key: ("********" if key.lower() in SENSITIVE_KEYS else _sanitize_log_value(item))
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_log_value(item) for item in value]
+    if isinstance(value, (str, bytes)):
+        return filter_sensitive_data(str(value))
+    return value
+
 def production_log_sink(message):
@@
-            "value": filter_sensitive_data(str(exc.get("value", ""))),
+            "value": _sanitize_log_value(exc.get("value", "")),
             "traceback": filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)),
@@
-        if key != "name":
-            if isinstance(value, (str, bytes)):
-                log_data[key] = filter_sensitive_data(str(value))
-            else:
-                log_data[key] = value
+        if key != "name":
+            log_data[key] = _sanitize_log_value(value)

Also applies to: 55-57, 69-75

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 12 - 15, The logging sink currently
only applies SENSITIVE_PATTERNS to string blobs, allowing structured extras like
{"password": "secret"} to pass through; update filter_sensitive_data to accept
non-string inputs and recursively traverse dicts/lists/tuples to mask secret
values (keys matching the _P_WORD and _A_KEY identifiers and any other sensitive
keys) before any serialization, and ensure callers that previously only filtered
str/bytes extras call the updated filter_sensitive_data on extras (so
SENSITIVE_PATTERNS is still used for string regex redaction but structured
values are blanked/masked at the dict key level first).

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
app/modules/utils/logger.py (1)

67-77: ⚠️ Potential issue | 🟠 Major

Structured secrets in extras still leak through without redaction.

Non-string values (dicts, lists) in extras are passed through unfiltered on line 74. If a caller logs logger.bind(credentials={"password": "secret"}).info(...), the nested password will appear in the log output.

This was flagged in a previous review. Consider recursively sanitizing nested structures:

🛡️ Proposed recursive sanitization
+SENSITIVE_KEYS = {_P_WORD, _A_KEY, "secret", "token", "auth"}
+
+def _sanitize_value(value):
+    """Recursively sanitize sensitive data in nested structures."""
+    if isinstance(value, dict):
+        return {
+            k: "********" if k.lower() in SENSITIVE_KEYS else _sanitize_value(v)
+            for k, v in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_value(item) for item in value]
+    if isinstance(value, (str, bytes)):
+        return filter_sensitive_data(str(value))
+    return value
+
 raw_extras = record.get("extra", {})
 sanitized_extras = {}
 for key, value in raw_extras.items():
     if key != "name":
-        if isinstance(value, (str, bytes)):
-            sanitized_extras[key] = filter_sensitive_data(str(value))
-        else:
-            sanitized_extras[key] = value
+        if key.lower() in SENSITIVE_KEYS:
+            sanitized_extras[key] = "********"
+        else:
+            sanitized_extras[key] = _sanitize_value(value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 67 - 77, The current extras
sanitization only filters string/bytes at top-level and passes through nested
dicts/lists, so update the logic around raw_extras -> sanitized_extras to
recursively traverse and sanitize nested structures: for any dict/list/tuple/set
encountered (referencing raw_extras and sanitized_extras), walk each
element/key/value and apply filter_sensitive_data to all str/bytes values (and
convert bytes to str first), preserving container types and non-string
primitives unchanged, still skipping the "name" key; finally assign the fully
sanitized_extras into log_data["extra"] as before.
🧹 Nitpick comments (3)
app/modules/utils/logger.py (3)

85-86: Move imports to the top of the file.

Imports are conventionally placed at the beginning of Python modules. Having them mid-file after function definitions can make dependencies harder to track.

♻️ Suggested reordering
 import json
 import re
 import sys
+from contextlib import contextmanager
+import logging
 
 # Define sensitive patterns for redaction

Then remove lines 85-86.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 85 - 86, Move the two import
statements "from contextlib import contextmanager" and "import logging" to the
top of the module (with the other imports) and remove the duplicate mid-file
declarations (the current lines containing those imports); update any imports
ordering to follow existing style and verify there are no circular import issues
caused by relocating them.

54-55: Consider removing the "FIX:" prefix from the comment.

The implementation is correct—truncation is applied before filtering. The "FIX:" prefix suggests this was a correction from a prior issue and can now be simplified to a regular explanatory comment.

-            # FIX: Apply truncation before sensitive data filtering
+            # Apply truncation before sensitive data filtering
             "traceback": filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 54 - 55, Remove the "FIX:" prefix
from the inline comment that precedes the "traceback" field so it reads as a
normal explanatory comment; update the comment near the "traceback":
filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)) line to
a plain description (e.g., "Apply truncation before sensitive data filtering")
without the "FIX:" token, leaving the logic using _truncate_traceback and
filter_sensitive_data unchanged.

10-13: Limited redaction patterns may miss common credential formats.

The patterns only match password='...' and api_key='...' with single or double quotes. Common variations like password=secret (unquoted), "password": "secret" (JSON format), or URL query strings ?password=secret& will not be redacted.

Consider expanding patterns to cover additional formats if credentials may appear in logs in other forms.

♻️ Example expanded patterns
 SENSITIVE_PATTERNS = [
     (re.compile(rf'{_P_WORD}=[\'"]([^\'"]+)[\'"]'), f'{_P_WORD}=********'),
     (re.compile(rf'{_A_KEY}=[\'"]([^\'"]+)[\'"]'), f'{_A_KEY}=********'),
+    # Unquoted values (word characters until whitespace/delimiter)
+    (re.compile(rf'{_P_WORD}=(\w+)'), f'{_P_WORD}=********'),
+    (re.compile(rf'{_A_KEY}=(\w+)'), f'{_A_KEY}=********'),
+    # JSON format
+    (re.compile(rf'"{_P_WORD}"\s*:\s*"([^"]+)"'), f'"{_P_WORD}": "********"'),
+    (re.compile(rf'"{_A_KEY}"\s*:\s*"([^"]+)"'), f'"{_A_KEY}": "********"'),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 10 - 13, SENSITIVE_PATTERNS
currently only matches quoted forms of {_P_WORD} and {_A_KEY}; extend the
patterns in the SENSITIVE_PATTERNS list to also match unquoted assignments
(e.g., password=secret), JSON-style keys (e.g., "password": "secret" or
'password': 'secret'), and URL/query-style parameters (e.g., ?password=secret&),
using word boundaries and case-insensitive regex flags to avoid overmatching;
update the entries that reference _P_WORD and _A_KEY to include these
alternative regexes (and consider a generic key:value and key=value pattern) so
all common credential formats are redacted consistently while preserving the
existing quoted-match patterns.
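A self-contained sketch of the expanded redaction, assuming `_P_WORD` and `_A_KEY` expand to the literal key names `password` and `api_key` (they are not shown in full in this diff):

```python
import re

# Assumed expansions of the module's _P_WORD / _A_KEY constants
_P_WORD = "password"
_A_KEY = "api_key"

SENSITIVE_PATTERNS = [
    # Quoted values (original patterns)
    (re.compile(rf'{_P_WORD}=[\'"]([^\'"]+)[\'"]'), f'{_P_WORD}=********'),
    (re.compile(rf'{_A_KEY}=[\'"]([^\'"]+)[\'"]'), f'{_A_KEY}=********'),
    # Unquoted values (word characters until whitespace/delimiter)
    (re.compile(rf'{_P_WORD}=(\w+)'), f'{_P_WORD}=********'),
    (re.compile(rf'{_A_KEY}=(\w+)'), f'{_A_KEY}=********'),
    # JSON format
    (re.compile(rf'"{_P_WORD}"\s*:\s*"([^"]+)"'), f'"{_P_WORD}": "********"'),
    (re.compile(rf'"{_A_KEY}"\s*:\s*"([^"]+)"'), f'"{_A_KEY}": "********"'),
]

def filter_sensitive_data(message: str) -> str:
    """Apply every redaction pattern in turn to a log message."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        message = pattern.sub(replacement, message)
    return message
```

The quoted patterns run first, so an already-masked `password=********` is not re-matched by the unquoted pattern (`*` is not a word character).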
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/modules/utils/logger.py`:
- Around line 96-99: The current log_context function is a no-op and must bind
the passed kwargs into the module-level logging context so request-scoped fields
appear in logs: implement log_context as a contextmanager that updates a
contextvars-backed store (use the existing get_log_context()/module contextvar
if present), merge the incoming kwargs into that store for the duration of the
context and restore the previous state on exit (so LoggingContextMiddleware's
with log_context(**context): actually binds request_id/path/method/user_id),
ensure kwargs is referenced to avoid Ruff ARG001, and update production_log_sink
to merge get_log_context() into sanitized_extras before emitting logs so the
bound fields are included in produced log records.
- Around line 88-94: The stubs configure_logging() and setup_logger() must be
implemented so callers receive a configured logger: update
configure_logging(level=...) to accept the level parameter, set the root logger
level, attach the production_log_sink handler (and any redaction/truncation
wrapper used elsewhere), and remove/replace default handlers so logs route
through the production sink; update setup_logger(name) to return
logging.getLogger(name) but ensure it inherits the configured handlers (i.e., do
not create a bare logger with no handlers or set propagate=False). Use the
existing production_log_sink and any redactor/truncator utilities when wiring
handlers so app/main.py and app/celery/celery_app.py emit through the production
pipeline.

---

Duplicate comments:
In `@app/modules/utils/logger.py`:
- Around line 67-77: The current extras sanitization only filters string/bytes
at top-level and passes through nested dicts/lists, so update the logic around
raw_extras -> sanitized_extras to recursively traverse and sanitize nested
structures: for any dict/list/tuple/set encountered (referencing raw_extras and
sanitized_extras), walk each element/key/value and apply filter_sensitive_data
to all str/bytes values (and convert bytes to str first), preserving container
types and non-string primitives unchanged, still skipping the "name" key;
finally assign the fully sanitized_extras into log_data["extra"] as before.

---

Nitpick comments:
In `@app/modules/utils/logger.py`:
- Around line 85-86: Move the two import statements "from contextlib import
contextmanager" and "import logging" to the top of the module (with the other
imports) and remove the duplicate mid-file declarations (the current lines
containing those imports); update any imports ordering to follow existing style
and verify there are no circular import issues caused by relocating them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d8943ed-c3f4-47f6-96e9-4ed8b2f43f38

📥 Commits

Reviewing files that changed from the base of the PR and between e44f5cb and 388f8fb.

📒 Files selected for processing (1)
  • app/modules/utils/logger.py

Comment on lines +96 to +99
@contextmanager
def log_context(**kwargs):
    """
    Context manager to add domain IDs to all logs within the context.

    Uses Loguru's native contextualize() which is thread-safe and async-safe.

    Usage:
        with log_context(conversation_id=conv_id, user_id=user_id):
            logger.info("This log will include conversation_id and user_id")
            logger.exception("Error occurred")  # Also includes context
    """
    with _logger.contextualize(**kwargs):
        yield


def setup_logger(name: str):
    """
    Setup a logger with the given name.

    Standard practice: Use this for YOUR application code.
    For third-party libraries, let the interception handle it.

    For context, use log_context() context manager or add context as kwargs:
        logger.info("Message", user_id=user_id, conversation_id=conv_id)
        logger.exception("Error", user_id=user_id)  # Includes stack trace + context
    """
    if not _LOGGING_CONFIGURED:
        configure_logging()

    return _logger.bind(name=name)


def should_show_stack_trace() -> bool:
    """Check if stack traces should be shown in logs.

    Controlled by LOG_STACK_TRACES environment variable.
    Set LOG_STACK_TRACES=false to disable stack traces.

    Usage in logging calls:
        logger.warning("Error occurred", exc_info=should_show_stack_trace())
    """
    return SHOW_STACK_TRACES


# Convenience function for dynamic level adjustment
def set_library_log_level(library_name: str, level: str):
    """
    Dynamically adjust log level for a specific library.

    Usage:
        set_library_log_level("sqlalchemy.engine", "DEBUG")  # Temporarily debug SQL
        set_library_log_level("sqlalchemy.engine", "WARNING")  # Back to normal
    """
    lib_logger = logging.getLogger(library_name)
    lib_logger.setLevel(level)
    _logger.info(f"Set log level for '{library_name}' to {level}")
"""Stub to keep existing code working."""
yield

⚠️ Potential issue | 🟠 Major

log_context stub breaks request tracing in middleware.

The LoggingContextMiddleware (see app/modules/utils/logging_middleware.py:53) uses with log_context(**context): to bind request_id, path, method, and user_id to all logs within a request scope. This stub yields without binding anything, so logs will lack these context fields.

This breaks correlation of logs by request, which is critical for production debugging.

🔧 Minimal implementation using contextvars
+import contextvars
+
+_log_context_var: contextvars.ContextVar[dict] = contextvars.ContextVar("log_context", default={})
+
+def get_log_context() -> dict:
+    """Retrieve current log context for use in handlers."""
+    return _log_context_var.get()
+
 @contextmanager
 def log_context(**kwargs):
-    """Stub to keep existing code working."""
-    yield
+    """Bind context variables to all logs within this scope."""
+    current = _log_context_var.get().copy()
+    current.update(kwargs)
+    token = _log_context_var.set(current)
+    try:
+        yield
+    finally:
+        _log_context_var.reset(token)

Then update production_log_sink to merge get_log_context() into sanitized_extras.

Additionally, Ruff flags kwargs as unused (ARG001), which is symptomatic of this issue.

🧰 Tools
🪛 Ruff (0.15.4)

[warning] 97-97: Unused function argument: kwargs

(ARG001)

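Put together, the suggested direction amounts to something like this self-contained sketch (names follow the diff; the middleware wiring and sink merge are out of scope here):

```python
import contextvars
from contextlib import contextmanager

# Contextvar-backed store: thread-safe and async-safe per execution context
_log_context_var: contextvars.ContextVar[dict] = contextvars.ContextVar(
    "log_context", default={}
)

def get_log_context() -> dict:
    """Snapshot of the fields currently bound to the logging context."""
    return _log_context_var.get()

@contextmanager
def log_context(**kwargs):
    """Bind fields to all logs within this scope; restore prior state on exit."""
    current = _log_context_var.get().copy()
    current.update(kwargs)
    token = _log_context_var.set(current)
    try:
        yield
    finally:
        _log_context_var.reset(token)
```

`LoggingContextMiddleware`'s `with log_context(**context):` would then bind `request_id`, `path`, `method`, and `user_id` for the request's duration, and `production_log_sink` can merge `get_log_context()` into its extras before emitting.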


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (3)
app/modules/utils/logger.py (3)

105-108: ⚠️ Potential issue | 🟠 Major

log_context() is still dropping request-scoped fields.

app/modules/utils/logging_middleware.py:33-60 expects this context manager to bind request_id, path, method, and user_id for every log in the request. Right now it just yields, so correlation data never reaches the sink.

Suggested direction
+import contextvars
+
+_log_context: contextvars.ContextVar[dict] = contextvars.ContextVar("log_context", default={})
+
+def get_log_context():
+    return _log_context.get()
+
 @contextmanager
 def log_context(**kwargs):
-    """Stub to keep existing code working."""
-    yield
+    current = get_log_context().copy()
+    current.update(kwargs)
+    token = _log_context.set(current)
+    try:
+        yield
+    finally:
+        _log_context.reset(token)

Then merge get_log_context() into sanitized_extras before writing log_data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 105 - 108, The current
log_context() context manager yields without binding request-scoped fields, so
request_id, path, method, and user_id never propagate to logs; update
log_context to push/attach the dict returned by get_log_context() into the
logging context (or threadlocal/structlog context) for the duration of the
context, and ensure the code that writes logs merges get_log_context() into
sanitized_extras before composing log_data so those fields are included (refer
to log_context, get_log_context, and sanitized_extras to locate and implement
the change).

83-97: ⚠️ Potential issue | 🟠 Major

Build a structured payload from LogRecord before calling the sink.

emit() currently passes self.format(record), but production_log_sink() only does the JSON/extras/traceback path after json.loads(...). With the default formatter this is just the plain message, so the sink falls back to raw stdout and skips the production formatting entirely.

Minimal fix
+import traceback
+
 class SinkHandler(logging.Handler):
     """Bridges standard logging to the production_log_sink."""
     def emit(self, record):
-        # We pass the formatted record string to the sink
-        production_log_sink(self.format(record))
+        payload = {
+            "record": {
+                "time": {"repr": record.created},
+                "level": {"name": record.levelname},
+                "name": record.name,
+                "function": record.funcName,
+                "line": record.lineno,
+                "message": record.getMessage(),
+                "exception": None,
+                "extra": {
+                    key: value
+                    for key, value in record.__dict__.items()
+                    if key not in logging.makeLogRecord({}).__dict__
+                },
+            }
+        }
+        if record.exc_info:
+            payload["record"]["exception"] = {
+                "type": {"name": record.exc_info[0].__name__},
+                "value": str(record.exc_info[1]),
+                "traceback": "".join(traceback.format_exception(*record.exc_info)),
+            }
+        production_log_sink(json.dumps(payload, default=str))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 83 - 97, The SinkHandler.emit
currently passes only self.format(record) which yields an unstructured string;
instead build a structured dict from the LogRecord in SinkHandler.emit (include
keys like timestamp (use record.created), level (record.levelname), logger
(record.name), message (record.getMessage()), pathname, lineno, funcName, and
any extras from record.__dict__), add formatted exception/traceback when
record.exc_info is present (use
logging.Formatter().formatException(record.exc_info)), then JSON-serialize that
payload and pass the JSON string to production_log_sink; update SinkHandler.emit
to construct this payload before calling production_log_sink (and leave
configure_logging unchanged).
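As a runnable illustration of the suggested bridge, with a stand-in sink that just captures the parsed payload (the real `production_log_sink` formats and writes it instead):

```python
import json
import logging
import traceback

captured = []

def production_log_sink(message: str) -> None:
    # Stand-in for the real sink: parse and store the structured payload
    captured.append(json.loads(message))

class SinkHandler(logging.Handler):
    """Bridges stdlib LogRecords to a JSON-consuming sink."""

    # Attributes every LogRecord has; anything else is a caller-supplied extra
    _BASE_ATTRS = set(logging.makeLogRecord({}).__dict__)

    def emit(self, record: logging.LogRecord) -> None:
        payload = {
            "record": {
                "time": {"repr": record.created},
                "level": {"name": record.levelname},
                "name": record.name,
                "function": record.funcName,
                "line": record.lineno,
                "message": record.getMessage(),
                "exception": None,
                "extra": {
                    key: value
                    for key, value in record.__dict__.items()
                    if key not in self._BASE_ATTRS
                },
            }
        }
        if record.exc_info:
            payload["record"]["exception"] = {
                "type": {"name": record.exc_info[0].__name__},
                "value": str(record.exc_info[1]),
                "traceback": "".join(traceback.format_exception(*record.exc_info)),
            }
        production_log_sink(json.dumps(payload, default=str))
```

Because the sink now receives the same `{"record": {...}}` shape it expects from Loguru's `serialize=True` output, the JSON/extras/traceback path is exercised instead of the raw-stdout fallback.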

11-23: ⚠️ Potential issue | 🟠 Major

Mask structured secrets before serializing extras.

Only top-level str/bytes extras are filtered here, and the regexes only catch password=... / api_key=.... Values like extra={"password": "secret"} or nested payloads still go out verbatim.

Suggested direction
+SENSITIVE_KEYS = {_P_WORD, _A_KEY}
+
+def _sanitize_log_value(value):
+    if isinstance(value, dict):
+        return {
+            key: (
+                "********"
+                if str(key).lower() in SENSITIVE_KEYS
+                else _sanitize_log_value(item)
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_log_value(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(_sanitize_log_value(item) for item in value)
+    if isinstance(value, bytes):
+        value = value.decode("utf-8", errors="replace")
+    if isinstance(value, str):
+        return filter_sensitive_data(value)
+    return value
+
     raw_extras = record.get("extra", {})
     sanitized_extras = {}
     for key, value in raw_extras.items():
         if key != "name":
-            sanitized_extras[key] = filter_sensitive_data(str(value)) if isinstance(value, (str, bytes)) else value
+            sanitized_extras[key] = _sanitize_log_value(value)

Also applies to: 67-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/utils/logger.py` around lines 11 - 23, The current
filter_sensitive_data only handles top-level strings with regexes
(SENSITIVE_PATTERNS) and misses structured/nested secrets in dicts or other
serializable extras; update filter_sensitive_data to recursively walk mappings
and iterables (handling dict, list/tuple, set) and mask values for keys matching
identifiers like _P_WORD and _A_KEY (and their variants) as well as apply regex
masking to string leaves, ensure bytes are decoded before filtering, and call
this recursive sanitizer before any serialization of extras (also update the
serialization path referenced around the extras handling currently at the other
occurrence) so nested payloads like {"password":"secret"} are masked.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 85f2c7c0-7c80-41bb-9254-dbc3ebce11bd

📥 Commits

Reviewing files that changed from the base of the PR and between 388f8fb and 2e2f1b4.

📒 Files selected for processing (1)
  • app/modules/utils/logger.py


sonarqubecloud Bot commented Mar 9, 2026

