Fix: Truncate production log tracebacks to 10 lines #686
hamzawritescode wants to merge 6 commits into potpie-ai:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Walkthrough

This PR flattens the logging infrastructure: it replaces interception-based handlers with lightweight compatibility shims, consolidates redaction to two private patterns, adds traceback truncation, sanitizes extras, and emits single-line redacted JSON logs via a simplified production sink.

Changes
Sequence Diagram(s): Skipped — changes are refactor/compatibility-focused and do not introduce a new multi-component sequential flow requiring visualization.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/utils/logger.py (1)
1-57: ⚠️ Potential issue | 🔴 Critical: Removed exports break application startup.
The functions configure_logging, setup_logger, and log_context are no longer defined in app/modules/utils/logger.py but are still imported by 100+ modules across the codebase, including critical startup paths:

| Function | Import Count | Critical Locations |
| --- | --- | --- |
| configure_logging | 3 | app/main.py:50, app/celery/celery_app.py:29 |
| setup_logger | 100+ | Throughout codebase (intelligence, parsing, integrations, etc.) |
| log_context | 5 | app/modules/utils/logging_middleware.py:15, app/celery/tasks/parsing_tasks.py:7 |

This will cause ImportError on application startup and prevent Celery workers from initializing. Either:
- Keep the removed functions (or provide stubs/replacements)
- Update all callers to use the new logging approach
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 1 - 57, The module removed three widely-imported symbols causing ImportError; restore compatibility by reintroducing configure_logging, setup_logger, and log_context in app/modules/utils/logger.py as thin wrappers or stubs that preserve the original public signatures and behavior but delegate to the new implementation (e.g., use production_log_sink, filter_sensitive_data, and existing log configuration helpers); ensure configure_logging still sets handlers and sinks expected by callers, setup_logger returns a logger-like object or configures the named logger, and log_context provides the previous context-manager or decorator interface so existing imports across the codebase continue to work without changing callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/modules/utils/logger.py`:
- Line 1: The module is missing imports for json and sys which are used by
functions like _truncate_traceback (and any code calling json.loads(),
json.dumps(), sys.stdout.write(), sys.stdout.flush()); add "import json" and
"import sys" at the top of the file before any function definitions so those
names are defined at runtime and avoid NameError when _truncate_traceback or
other logger utilities invoke json/sys functions.
- Around line 31-33: The code calls a now-removed filter_sensitive_data function
(used in building log payloads for "value", "traceback" and other fields) which
will raise NameError; either remove these calls and pass the raw values through
(e.g., replace filter_sensitive_data(x) with x) everywhere it is referenced, or
restore/import an appropriate implementation named filter_sensitive_data and
ensure it's applied before _truncate_traceback when used in the "traceback"
construction; update all occurrences (search for filter_sensitive_data in
logger.py, including the entries that build the error payload and any other log
formatting code) so they consistently follow the chosen approach.
---
Outside diff comments:
In `@app/modules/utils/logger.py`:
- Around line 1-57: The module removed three widely-imported symbols causing
ImportError; restore compatibility by reintroducing configure_logging,
setup_logger, and log_context in app/modules/utils/logger.py as thin wrappers or
stubs that preserve the original public signatures and behavior but delegate to
the new implementation (e.g., use production_log_sink, filter_sensitive_data,
and existing log configuration helpers); ensure configure_logging still sets
handlers and sinks expected by callers, setup_logger returns a logger-like
object or configures the named logger, and log_context provides the previous
context-manager or decorator interface so existing imports across the codebase
continue to work without changing callers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0e730f98-1d04-4a20-9792-a9a86fa7dabf
📒 Files selected for processing (1)
app/modules/utils/logger.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/utils/logger.py (1)
1-77: ⚠️ Potential issue | 🔴 Critical: Removed exports break application startup.
This refactor removes configure_logging, setup_logger, and log_context, which are actively imported and used throughout the codebase:

- app/main.py:53 and app/celery/celery_app.py:37 call configure_logging() at startup
- setup_logger(__name__) is called in 100+ modules to create logger instances
- log_context() is used as a context manager in multiple modules (e.g., app/parsing/graph_construction/parsing_service.py:124, app/celery/tasks/agent_tasks.py:97, app/conversations/conversations_router.py:199)

Restore these functions or update all dependent modules to remove/replace these imports before merging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 1 - 77, The PR removed exported logging helpers causing startup failures; restore the three public APIs configure_logging(), setup_logger(name) and log_context(...) with their previous signatures (at least the ones used across the repo) inside app/modules/utils/logger.py, rewire configure_logging to configure the logger to use the existing production_log_sink, implement setup_logger to return a configured logger object (using production_log_sink and any formatters/levels previously expected), and reintroduce log_context as a contextmanager that injects extra context into log records; ensure the restored function names and signatures match existing imports so dependent modules (configure_logging, setup_logger, log_context) continue to work without changing callers.
🧹 Nitpick comments (3)
app/modules/utils/logger.py (3)
65-71: Minor: Nested objects in extras won't have sensitive data filtered.
The current logic only filters str and bytes values in extras. If an extra field contains a dict or list with sensitive data, it will pass through unfiltered. Consider recursive filtering for complex types if sensitive data might appear in nested structures:

♻️ Potential enhancement for nested filtering

```python
def filter_value(value):
    if isinstance(value, str):
        return filter_sensitive_data(value)
    elif isinstance(value, bytes):
        return filter_sensitive_data(value.decode('utf-8', errors='replace'))
    elif isinstance(value, dict):
        return {k: filter_value(v) for k, v in value.items()}
    elif isinstance(value, list):
        return [filter_value(v) for v in value]
    return value
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 65 - 71, The extras handling only filters str/bytes at the top level (in the loop handling record.get("extra")) so nested dicts/lists remain unfiltered; add a recursive helper (e.g., filter_value) referenced by the extras loop to walk dicts, lists/tuples (and decode bytes), calling filter_sensitive_data on strings and returning other types unchanged, then assign log_data[key] = filter_value(value) instead of the current branching so nested sensitive fields are sanitized; keep the helper next to the existing logic that uses record, extras, log_data and filter_sensitive_data.
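The suggested recursive filter can be exercised end to end. A self-contained demo, where `filter_sensitive_data` is a stand-in redactor for illustration (the real module iterates over its own SENSITIVE_PATTERNS list):

```python
import re

# Stand-in redactor: the real module applies its SENSITIVE_PATTERNS list
_PATTERN = re.compile(r'(password|api_key)=[\'"]?([^\'"\s]+)[\'"]?', re.IGNORECASE)

def filter_sensitive_data(text: str) -> str:
    return _PATTERN.sub(r"\1=********", text)

def filter_value(value):
    """Recursively redact strings inside nested dicts/lists; decode bytes first."""
    if isinstance(value, str):
        return filter_sensitive_data(value)
    if isinstance(value, bytes):
        return filter_sensitive_data(value.decode("utf-8", errors="replace"))
    if isinstance(value, dict):
        return {k: filter_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [filter_value(v) for v in value]
    return value
```

With this, an extras payload like `{"ctx": ["password=hunter2"], "n": 3}` has the nested string redacted while the integer passes through unchanged.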
7-11: Limited sensitive data patterns may leak secrets in logs.
The current patterns only match password='...' and api_key='...' with specific quote formats. This misses many common sensitive data formats:

- Unquoted values: password=secret123
- JSON format: "password": "secret123"
- Other sensitive fields: token, secret, authorization, credentials, bearer
- Case variations: PASSWORD, API_KEY

Consider expanding the patterns for more comprehensive redaction:
♻️ Proposed expanded patterns
```diff
 # Define sensitive patterns for redaction
 SENSITIVE_PATTERNS = [
-    (re.compile(r'password=[\'"]([^\'"]+)[\'"]'), 'password=********'),
-    (re.compile(r'api_key=[\'"]([^\'"]+)[\'"]'), 'api_key=********'),
+    (re.compile(r'(password|passwd|pwd)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
+    (re.compile(r'(api_key|apikey|api[-_]?secret)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
+    (re.compile(r'(token|bearer|authorization|secret|credential)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?', re.IGNORECASE), r'\1=********'),
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 7 - 11, SENSITIVE_PATTERNS currently only matches quoted "password" and "api_key" forms and can miss many secret formats; update the SENSITIVE_PATTERNS constant to include additional regexes (with re.IGNORECASE) that cover unquoted assignments (e.g., password=secret123), JSON-style key/value pairs (e.g., "password": "secret123" or 'password':'secret123'), HTTP-style headers (e.g., Authorization: Bearer abc...), and additional keys like token, secret, authorization, credentials, bearer, api_key and their case variants; ensure each pattern uses word boundaries and replaces only the value portion with a fixed redaction token (e.g., ********) so existing redaction code that iterates over SENSITIVE_PATTERNS continues to work unchanged.
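The broadened patterns proposed in this comment can be sanity-checked with a small runnable sketch (the redactor loop and pattern wording here are illustrative, assuming the module applies patterns sequentially):

```python
import re

SENSITIVE_PATTERNS = [
    (re.compile(r'(password|passwd|pwd)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?',
                re.IGNORECASE), r"\1=********"),
    (re.compile(r'(api_key|apikey|api[-_]?secret)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?',
                re.IGNORECASE), r"\1=********"),
    (re.compile(r'(token|bearer|authorization|secret|credential)\s*[=:]\s*[\'"]?[^\s\'"]+[\'"]?',
                re.IGNORECASE), r"\1=********"),
]

def filter_sensitive_data(text: str) -> str:
    # Apply each redaction pattern in turn; later patterns see earlier output
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Note the `[=:]` alternation also catches colon-separated forms such as `token: abc.def`; fully quoted JSON keys (`"password": "..."`) still need the dedicated quoted-key patterns from the comment above.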
1-5: Minor: Remove unused logging import.
The logging module is imported on line 2 but never used in this file.

♻️ Proposed fix

```diff
 import json
-import logging
 import os
 import re
 import sys
```
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 1 - 5, Remove the unused top-level import "logging" from the imports block (the import line "import logging") in this module; verify no other references to the logging symbol exist in functions or classes in logger.py and run lint/type-check to ensure no residual usage remains before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@app/modules/utils/logger.py`:
- Around line 1-77: The PR removed exported logging helpers causing startup
failures; restore the three public APIs configure_logging(), setup_logger(name)
and log_context(...) with their previous signatures (at least the ones used
across the repo) inside app/modules/utils/logger.py, rewire configure_logging to
configure the logger to use the existing production_log_sink, implement
setup_logger to return a configured logger object (using production_log_sink and
any formatters/levels previously expected), and reintroduce log_context as a
contextmanager that injects extra context into log records; ensure the restored
function names and signatures match existing imports so dependent modules
(configure_logging, setup_logger, log_context) continue to work without changing
callers.
---
Nitpick comments:
In `@app/modules/utils/logger.py`:
- Around line 65-71: The extras handling only filters str/bytes at the top level
(in the loop handling record.get("extra")) so nested dicts/lists remain
unfiltered; add a recursive helper (e.g., filter_value) referenced by the extras
loop to walk dicts, lists/tuples (and decode bytes), calling
filter_sensitive_data on strings and returning other types unchanged, then
assign log_data[key] = filter_value(value) instead of the current branching so
nested sensitive fields are sanitized; keep the helper next to the existing
logic that uses record, extras, log_data and filter_sensitive_data.
- Around line 7-11: SENSITIVE_PATTERNS currently only matches quoted "password"
and "api_key" forms and can miss many secret formats; update the
SENSITIVE_PATTERNS constant to include additional regexes (with re.IGNORECASE)
that cover unquoted assignments (e.g., password=secret123), JSON-style key/value
pairs (e.g., "password": "secret123" or 'password':'secret123'), HTTP-style
headers (e.g., Authorization: Bearer abc...), and additional keys like token,
secret, authorization, credentials, bearer, api_key and their case variants;
ensure each pattern uses word boundaries and replaces only the value portion
with a fixed redaction token (e.g., ********) so existing redaction code that
iterates over SENSITIVE_PATTERNS continues to work unchanged.
- Around line 1-5: Remove the unused top-level import "logging" from the imports
block (the import line "import logging") in this module; verify no other
references to the logging symbol exist in functions or classes in logger.py and
run lint/type-check to ensure no residual usage remains before committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 94a08a63-0a48-4ecb-9f02-9367eae19b48
📒 Files selected for processing (1)
app/modules/utils/logger.py
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/modules/utils/logger.py`:
- Around line 26-36: The refactor removed the public logger API
(configure_logging, setup_logger, log_context) but callers still import/use
these functions; restore those functions (or provide backward-compatible
wrappers) in this module so app/main.py, app/celery/celery_app.py,
app/modules/utils/logging_middleware.py, and app/modules/utils/install_gvisor.py
keep working. Specifically, reintroduce configure_logging(...) and
setup_logger(...) that internally use the new
production_log_sink/_truncate_traceback helpers (or forward to the new
implementation), and provide a log_context contextmanager/function that yields
the same interface expected by callers; alternatively, change all callers to use
the new API consistently but ensure this PR updates app/main, celery_app,
logging_middleware, and install_gvisor together to avoid import/runtime errors.
- Around line 12-15: The logging sink currently only applies SENSITIVE_PATTERNS
to string blobs, allowing structured extras like {"password": "secret"} to pass
through; update filter_sensitive_data to accept non-string inputs and
recursively traverse dicts/lists/tuples to mask secret values (keys matching the
_P_WORD and _A_KEY identifiers and any other sensitive keys) before any
serialization, and ensure callers that previously only filtered str/bytes extras
call the updated filter_sensitive_data on extras (so SENSITIVE_PATTERNS is still
used for string regex redaction but structured values are blanked/masked at the
dict key level first).
- Around line 69-75: The extras flattening loop (using variables extras, record,
log_data and filter_sensitive_data) must not allow bound extras to overwrite
canonical metadata like timestamp, level, logger, function, line, message,
exception, etc.; update the loop to either (a) skip any keys in a reserved set
(e.g., {"timestamp","level","logger","function","line","message","exception"})
or (b) nest all extras under a single key such as log_data["extra"] =
filtered_extras so downstream parsing is preserved—apply the change where extras
is read from record and ensure filtering still runs via filter_sensitive_data on
string values before nesting or assigning.
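The reserved-key concern in the last item above can be sketched as follows. The names `RESERVED_KEYS` and `merge_extras` are hypothetical; the module's actual sink may structure this differently:

```python
# Canonical metadata fields that bound extras must never overwrite
RESERVED_KEYS = {"timestamp", "level", "logger", "function",
                 "line", "message", "exception"}

def merge_extras(log_data: dict, extras: dict) -> dict:
    """Nest sanitized extras under a single key so downstream
    parsers keep a stable top-level schema."""
    log_data["extra"] = {
        k: v
        for k, v in extras.items()
        if k not in RESERVED_KEYS and k != "name"
    }
    return log_data
```

Nesting (option b in the comment) is the safer of the two choices, since a skip-list can silently drop fields a caller expected to see.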
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 633f0f12-23c4-4c19-9f7c-01a3cb4aa764
📒 Files selected for processing (1)
app/modules/utils/logger.py
```python
SENSITIVE_PATTERNS = [
    # Credentials in key=value format
    (
        re.compile(r'(password|passwd|pwd)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(
            r'(token|access_token|refresh_token|id_token)=["\']?([^"\'\s&]+)',
            re.IGNORECASE,
        ),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(
            r'(secret|client_secret|api_secret)=["\']?([^"\'\s&]+)', re.IGNORECASE
        ),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(r'(api[_-]?key|apikey)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    (
        re.compile(r'(auth|authorization)=["\']?([^"\'\s&]+)', re.IGNORECASE),
        r"\1=***REDACTED***",
    ),
    # Bearer tokens
    (
        re.compile(r"Bearer\s+([A-Za-z0-9\-._~+/]+=*)", re.IGNORECASE),
        r"Bearer ***REDACTED***",
    ),
    # Basic auth
    (re.compile(r"Basic\s+([A-Za-z0-9+/]+=*)", re.IGNORECASE), r"Basic ***REDACTED***"),
    # Redis/Database URLs with passwords
    (
        re.compile(
            r"(redis|postgresql|mysql|mongodb)://([^:]+):([^@]+)@", re.IGNORECASE
        ),
        r"\1://\2:***REDACTED***@",
    ),
    # OAuth authorization codes (typically 20-100 chars alphanumeric)
    (
        re.compile(r"([?&]code=)([A-Za-z0-9\-._~]{20,100})([&\s]|$)", re.IGNORECASE),
        r"\1***REDACTED***\3",
    ),
    # Generic secrets in quotes
    (
        re.compile(
            r'("(?:password|token|secret|api_key)"\s*:\s*)"([^"]+)"', re.IGNORECASE
        ),
        r'\1"***REDACTED***"',
    ),
    (
        re.compile(
            r"('(?:password|token|secret|api_key)'\s*:\s*)'([^']+)'", re.IGNORECASE
        ),
        r"\1'***REDACTED***'",
    ),
    (re.compile(rf'{_P_WORD}=[\'"]([^\'"]+)[\'"]'), f'{_P_WORD}=********'),
    (re.compile(rf'{_A_KEY}=[\'"]([^\'"]+)[\'"]'), f'{_A_KEY}=********'),
]
```
Structured secrets still leak through this sink.
On Line 69-75, only str/bytes extras go through filter_sensitive_data(), so payloads like extra={"password": "secret"} are emitted unchanged. On Line 12-15, the regexes only match password='...' / api_key='...', so even stringified JSON or dict-shaped values are not scrubbed reliably. That is a direct credentials-in-logs regression.
[suggested fix: sanitize recursively before serialization]
Proposed direction
```diff
+SENSITIVE_KEYS = {"password", "api_key"}
+
+def _sanitize_log_value(value):
+    if isinstance(value, dict):
+        return {
+            key: ("********" if key.lower() in SENSITIVE_KEYS else _sanitize_log_value(item))
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_log_value(item) for item in value]
+    if isinstance(value, (str, bytes)):
+        return filter_sensitive_data(str(value))
+    return value
+
 def production_log_sink(message):
@@
-    "value": filter_sensitive_data(str(exc.get("value", ""))),
+    "value": _sanitize_log_value(exc.get("value", "")),
     "traceback": filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)),
@@
-    if key != "name":
-        if isinstance(value, (str, bytes)):
-            log_data[key] = filter_sensitive_data(str(value))
-        else:
-            log_data[key] = value
+    if key != "name":
+        log_data[key] = _sanitize_log_value(value)
```

Also applies to: 55-57, 69-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/modules/utils/logger.py` around lines 12 - 15, The logging sink currently
only applies SENSITIVE_PATTERNS to string blobs, allowing structured extras like
{"password": "secret"} to pass through; update filter_sensitive_data to accept
non-string inputs and recursively traverse dicts/lists/tuples to mask secret
values (keys matching the _P_WORD and _A_KEY identifiers and any other sensitive
keys) before any serialization, and ensure callers that previously only filtered
str/bytes extras call the updated filter_sensitive_data on extras (so
SENSITIVE_PATTERNS is still used for string regex redaction but structured
values are blanked/masked at the dict key level first).
Actionable comments posted: 2
♻️ Duplicate comments (1)
app/modules/utils/logger.py (1)
67-77: ⚠️ Potential issue | 🟠 Major: Structured secrets in extras still leak through without redaction.
Non-string values (dicts, lists) in extras are passed through unfiltered on line 74. If a caller logs logger.bind(credentials={"password": "secret"}).info(...), the nested password will appear in the log output. This was flagged in a previous review. Consider recursively sanitizing nested structures:
🛡️ Proposed recursive sanitization
```diff
+SENSITIVE_KEYS = {_P_WORD, _A_KEY, "secret", "token", "auth"}
+
+def _sanitize_value(value):
+    """Recursively sanitize sensitive data in nested structures."""
+    if isinstance(value, dict):
+        return {
+            k: "********" if k.lower() in SENSITIVE_KEYS else _sanitize_value(v)
+            for k, v in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_value(item) for item in value]
+    if isinstance(value, (str, bytes)):
+        return filter_sensitive_data(str(value))
+    return value
+
 raw_extras = record.get("extra", {})
 sanitized_extras = {}
 for key, value in raw_extras.items():
     if key != "name":
-        if isinstance(value, (str, bytes)):
-            sanitized_extras[key] = filter_sensitive_data(str(value))
-        else:
-            sanitized_extras[key] = value
+        if key.lower() in SENSITIVE_KEYS:
+            sanitized_extras[key] = "********"
+        else:
+            sanitized_extras[key] = _sanitize_value(value)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 67 - 77, The current extras sanitization only filters string/bytes at top-level and passes through nested dicts/lists, so update the logic around raw_extras -> sanitized_extras to recursively traverse and sanitize nested structures: for any dict/list/tuple/set encountered (referencing raw_extras and sanitized_extras), walk each element/key/value and apply filter_sensitive_data to all str/bytes values (and convert bytes to str first), preserving container types and non-string primitives unchanged, still skipping the "name" key; finally assign the fully sanitized_extras into log_data["extra"] as before.
🧹 Nitpick comments (3)
app/modules/utils/logger.py (3)
85-86: Move imports to the top of the file.
Imports are conventionally placed at the beginning of Python modules. Having them mid-file after function definitions can make dependencies harder to track.
♻️ Suggested reordering
```diff
 import json
 import re
 import sys
+from contextlib import contextmanager
+import logging

 # Define sensitive patterns for redaction
```

Then remove lines 85-86.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 85 - 86, Move the two import statements "from contextlib import contextmanager" and "import logging" to the top of the module (with the other imports) and remove the duplicate mid-file declarations (the current lines containing those imports); update any imports ordering to follow existing style and verify there are no circular import issues caused by relocating them.
54-55: Consider removing the "FIX:" prefix from the comment.
The implementation is correct: truncation is applied before filtering. The "FIX:" prefix suggests this was a correction from a prior issue and can now be simplified to a regular explanatory comment.
```diff
-    # FIX: Apply truncation before sensitive data filtering
+    # Apply truncation before sensitive data filtering
     "traceback": filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)),
```
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 54 - 55, Remove the "FIX:" prefix from the inline comment that precedes the "traceback" field so it reads as a normal explanatory comment; update the comment near the "traceback": filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)) line to a plain description (e.g., "Apply truncation before sensitive data filtering") without the "FIX:" token, leaving the logic using _truncate_traceback and filter_sensitive_data unchanged.
10-13: Limited redaction patterns may miss common credential formats.
The patterns only match password='...' and api_key='...' with single or double quotes. Common variations like password=secret (unquoted), "password": "secret" (JSON format), or URL query strings ?password=secret& will not be redacted. Consider expanding patterns to cover additional formats if credentials may appear in logs in other forms.
♻️ Example expanded patterns
```diff
 SENSITIVE_PATTERNS = [
     (re.compile(rf'{_P_WORD}=[\'"]([^\'"]+)[\'"]'), f'{_P_WORD}=********'),
     (re.compile(rf'{_A_KEY}=[\'"]([^\'"]+)[\'"]'), f'{_A_KEY}=********'),
+    # Unquoted values (word characters until whitespace/delimiter)
+    (re.compile(rf'{_P_WORD}=(\w+)'), f'{_P_WORD}=********'),
+    (re.compile(rf'{_A_KEY}=(\w+)'), f'{_A_KEY}=********'),
+    # JSON format
+    (re.compile(rf'"{_P_WORD}"\s*:\s*"([^"]+)"'), f'"{_P_WORD}": "********"'),
+    (re.compile(rf'"{_A_KEY}"\s*:\s*"([^"]+)"'), f'"{_A_KEY}": "********"'),
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 10 - 13, SENSITIVE_PATTERNS currently only matches quoted forms of {_P_WORD} and {_A_KEY}; extend the patterns in the SENSITIVE_PATTERNS list to also match unquoted assignments (e.g., password=secret), JSON-style keys (e.g., "password": "secret" or 'password': 'secret'), and URL/query-style parameters (e.g., ?password=secret&), using word boundaries and case-insensitive regex flags to avoid overmatching; update the entries that reference _P_WORD and _A_KEY to include these alternative regexes (and consider a generic key:value and key=value pattern) so all common credential formats are redacted consistently while preserving the existing quoted-match patterns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/modules/utils/logger.py`:
- Around line 96-99: The current log_context function is a no-op and must bind
the passed kwargs into the module-level logging context so request-scoped fields
appear in logs: implement log_context as a contextmanager that updates a
contextvars-backed store (use the existing get_log_context()/module contextvar
if present), merge the incoming kwargs into that store for the duration of the
context and restore the previous state on exit (so LoggingContextMiddleware's
with log_context(**context): actually binds request_id/path/method/user_id),
ensure kwargs is referenced to avoid Ruff ARG001, and update production_log_sink
to merge get_log_context() into sanitized_extras before emitting logs so the
bound fields are included in produced log records.
- Around line 88-94: The stubs configure_logging() and setup_logger() must be
implemented so callers receive a configured logger: update
configure_logging(level=...) to accept the level parameter, set the root logger
level, attach the production_log_sink handler (and any redaction/truncation
wrapper used elsewhere), and remove/replace default handlers so logs route
through the production sink; update setup_logger(name) to return
logging.getLogger(name) but ensure it inherits the configured handlers (i.e., do
not create a bare logger with no handlers or set propagate=False). Use the
existing production_log_sink and any redactor/truncator utilities when wiring
handlers so app/main.py and app/celery/celery_app.py emit through the production
pipeline.
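The wiring described in the last item can be sketched with stdlib logging only. This is a hedged sketch under stated assumptions: the handler name `JsonLineHandler` is hypothetical, and the real module routes through Loguru's production_log_sink rather than a logging.Handler subclass:

```python
import json
import logging
import sys

class JsonLineHandler(logging.StreamHandler):
    """Emit each record as a single JSON line, mirroring the production sink."""
    def emit(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        self.stream.write(json.dumps(payload) + "\n")
        self.flush()

def configure_logging(level: str = "INFO") -> None:
    """Replace default handlers so everything routes through the JSON sink."""
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(JsonLineHandler(sys.stdout))
    root.setLevel(level)

def setup_logger(name: str) -> logging.Logger:
    # Inherits the root handler via propagation; no per-logger handlers needed
    return logging.getLogger(name)
```

Keeping propagation enabled (rather than attaching a handler per logger) is what lets `setup_logger(__name__)` callers pick up the configured sink automatically.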
---
Duplicate comments:
In `@app/modules/utils/logger.py`:
- Around line 67-77: The current extras sanitization only filters string/bytes
at top-level and passes through nested dicts/lists, so update the logic around
raw_extras -> sanitized_extras to recursively traverse and sanitize nested
structures: for any dict/list/tuple/set encountered (referencing raw_extras and
sanitized_extras), walk each element/key/value and apply filter_sensitive_data
to all str/bytes values (and convert bytes to str first), preserving container
types and non-string primitives unchanged, still skipping the "name" key;
finally assign the fully sanitized_extras into log_data["extra"] as before.
---
Nitpick comments:
In `@app/modules/utils/logger.py`:
- Around line 85-86: Move the two import statements "from contextlib import
contextmanager" and "import logging" to the top of the module (with the other
imports) and remove the duplicate mid-file declarations (the current lines
containing those imports); update any imports ordering to follow existing style
and verify there are no circular import issues caused by relocating them.
- Around line 54-55: Remove the "FIX:" prefix from the inline comment that
precedes the "traceback" field so it reads as a normal explanatory comment;
update the comment near the "traceback":
filter_sensitive_data(_truncate_traceback(raw_traceback, num_lines=10)) line to
a plain description (e.g., "Apply truncation before sensitive data filtering")
without the "FIX:" token, leaving the logic using _truncate_traceback and
filter_sensitive_data unchanged.
- Around line 10-13: SENSITIVE_PATTERNS currently only matches quoted forms of
{_P_WORD} and {_A_KEY}; extend the patterns in the SENSITIVE_PATTERNS list to
also match unquoted assignments (e.g., password=secret), JSON-style keys (e.g.,
"password": "secret" or 'password': 'secret'), and URL/query-style parameters
(e.g., ?password=secret&), using word boundaries and case-insensitive regex
flags to avoid overmatching; update the entries that reference _P_WORD and
_A_KEY to include these alternative regexes (and consider a generic key:value
and key=value pattern) so all common credential formats are redacted
consistently while preserving the existing quoted-match patterns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d8943ed-c3f4-47f6-96e9-4ed8b2f43f38
📒 Files selected for processing (1)
app/modules/utils/logger.py
```python
@contextmanager
def log_context(**kwargs):
    """
    Context manager to add domain IDs to all logs within the context.

    Uses Loguru's native contextualize() which is thread-safe and async-safe.

    Usage:
        with log_context(conversation_id=conv_id, user_id=user_id):
            logger.info("This log will include conversation_id and user_id")
            logger.exception("Error occurred")  # Also includes context
    """
    with _logger.contextualize(**kwargs):
        yield


def setup_logger(name: str):
    """
    Setup a logger with the given name.

    Standard practice: Use this for YOUR application code.
    For third-party libraries, let the interception handle it.

    For context, use log_context() context manager or add context as kwargs:
        logger.info("Message", user_id=user_id, conversation_id=conv_id)
        logger.exception("Error", user_id=user_id)  # Includes stack trace + context
    """
    if not _LOGGING_CONFIGURED:
        configure_logging()

    return _logger.bind(name=name)


def should_show_stack_trace() -> bool:
    """Check if stack traces should be shown in logs.

    Controlled by LOG_STACK_TRACES environment variable.
    Set LOG_STACK_TRACES=false to disable stack traces.

    Usage in logging calls:
        logger.warning("Error occurred", exc_info=should_show_stack_trace())
    """
    return SHOW_STACK_TRACES


# Convenience function for dynamic level adjustment
def set_library_log_level(library_name: str, level: str):
    """
    Dynamically adjust log level for a specific library.

    Usage:
        set_library_log_level("sqlalchemy.engine", "DEBUG")  # Temporarily debug SQL
        set_library_log_level("sqlalchemy.engine", "WARNING")  # Back to normal
    """
    lib_logger = logging.getLogger(library_name)
    lib_logger.setLevel(level)
    _logger.info(f"Set log level for '{library_name}' to {level}")
    """Stub to keep existing code working."""
    yield
```
log_context stub breaks request tracing in middleware.
The LoggingContextMiddleware (see app/modules/utils/logging_middleware.py:53) uses with log_context(**context): to bind request_id, path, method, and user_id to all logs within a request scope. This stub yields without binding anything, so logs will lack these context fields.
This breaks correlation of logs by request, which is critical for production debugging.
🔧 Minimal implementation using contextvars
```diff
+import contextvars
+
+_log_context_var: contextvars.ContextVar[dict] = contextvars.ContextVar("log_context", default={})
+
+def get_log_context() -> dict:
+    """Retrieve current log context for use in handlers."""
+    return _log_context_var.get()
+
 @contextmanager
 def log_context(**kwargs):
-    """Stub to keep existing code working."""
-    yield
+    """Bind context variables to all logs within this scope."""
+    current = _log_context_var.get().copy()
+    current.update(kwargs)
+    token = _log_context_var.set(current)
+    try:
+        yield
+    finally:
+        _log_context_var.reset(token)
```

Then update `production_log_sink` to merge `get_log_context()` into `sanitized_extras`.
Additionally, Ruff flags kwargs as unused (ARG001), which is symptomatic of this issue.
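The suggested replacement can be verified in isolation. A self-contained version of the contextvars approach (names mirror the snippet above, not the PR's actual module):

```python
import contextvars
from contextlib import contextmanager

_log_context_var: contextvars.ContextVar[dict] = contextvars.ContextVar(
    "log_context", default={}
)

def get_log_context() -> dict:
    """Return the context fields currently bound for logging."""
    return _log_context_var.get()

@contextmanager
def log_context(**kwargs):
    """Bind fields for the duration of the block; restore prior state on exit."""
    # Copy before updating so the shared default dict is never mutated
    current = _log_context_var.get().copy()
    current.update(kwargs)
    token = _log_context_var.set(current)
    try:
        yield
    finally:
        _log_context_var.reset(token)
```

Because `ContextVar` is both thread-safe and asyncio-aware, nested contexts compose correctly and each request sees only its own bindings.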
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 97-97: Unused function argument: kwargs
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/modules/utils/logger.py` around lines 96 - 99, The current log_context
function is a no-op and must bind the passed kwargs into the module-level
logging context so request-scoped fields appear in logs: implement log_context
as a contextmanager that updates a contextvars-backed store (use the existing
get_log_context()/module contextvar if present), merge the incoming kwargs into
that store for the duration of the context and restore the previous state on
exit (so LoggingContextMiddleware's with log_context(**context): actually binds
request_id/path/method/user_id), ensure kwargs is referenced to avoid Ruff
ARG001, and update production_log_sink to merge get_log_context() into
sanitized_extras before emitting logs so the bound fields are included in
produced log records.
♻️ Duplicate comments (3)
app/modules/utils/logger.py (3)
105-108: ⚠️ Potential issue | 🟠 Major

`log_context()` is still dropping request-scoped fields.

`app/modules/utils/logging_middleware.py:33-60` expects this context manager to bind `request_id`, `path`, `method`, and `user_id` for every log in the request. Right now it just yields, so correlation data never reaches the sink.

Suggested direction

```diff
+import contextvars
+
+_log_context: contextvars.ContextVar[dict] = contextvars.ContextVar("log_context", default={})
+
+def get_log_context():
+    return _log_context.get()
+
 @contextmanager
 def log_context(**kwargs):
-    """Stub to keep existing code working."""
-    yield
+    current = get_log_context().copy()
+    current.update(kwargs)
+    token = _log_context.set(current)
+    try:
+        yield
+    finally:
+        _log_context.reset(token)
```

Then merge `get_log_context()` into `sanitized_extras` before writing `log_data`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 105 - 108, The current log_context() context manager yields without binding request-scoped fields, so request_id, path, method, and user_id never propagate to logs; update log_context to push/attach the dict returned by get_log_context() into the logging context (or threadlocal/structlog context) for the duration of the context, and ensure the code that writes logs merges get_log_context() into sanitized_extras before composing log_data so those fields are included (refer to log_context, get_log_context, and sanitized_extras to locate and implement the change).
83-97: ⚠️ Potential issue | 🟠 Major

Build a structured payload from `LogRecord` before calling the sink.

`emit()` currently passes `self.format(record)`, but `production_log_sink()` only does the JSON/extras/traceback path after `json.loads(...)`. With the default formatter this is just the plain message, so the sink falls back to raw stdout and skips the production formatting entirely.

Minimal fix

```diff
+import traceback
+
 class SinkHandler(logging.Handler):
     """Bridges standard logging to the production_log_sink."""
     def emit(self, record):
-        # We pass the formatted record string to the sink
-        production_log_sink(self.format(record))
+        payload = {
+            "record": {
+                "time": {"repr": record.created},
+                "level": {"name": record.levelname},
+                "name": record.name,
+                "function": record.funcName,
+                "line": record.lineno,
+                "message": record.getMessage(),
+                "exception": None,
+                "extra": {
+                    key: value
+                    for key, value in record.__dict__.items()
+                    if key not in logging.makeLogRecord({}).__dict__
+                },
+            }
+        }
+        if record.exc_info:
+            payload["record"]["exception"] = {
+                "type": {"name": record.exc_info[0].__name__},
+                "value": str(record.exc_info[1]),
+                "traceback": "".join(traceback.format_exception(*record.exc_info)),
+            }
+        production_log_sink(json.dumps(payload, default=str))
```
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 83 - 97, The SinkHandler.emit currently passes only self.format(record) which yields an unstructured string; instead build a structured dict from the LogRecord in SinkHandler.emit (include keys like timestamp (use record.created), level (record.levelname), logger (record.name), message (record.getMessage()), pathname, lineno, funcName, and any extras from record.__dict__), add formatted exception/traceback when record.exc_info is present (use logging.Formatter().formatException(record.exc_info)), then JSON-serialize that payload and pass the JSON string to production_log_sink; update SinkHandler.emit to construct this payload before calling production_log_sink (and leave configure_logging unchanged).
11-23: ⚠️ Potential issue | 🟠 Major

Mask structured secrets before serializing extras.

Only top-level `str`/`bytes` extras are filtered here, and the regexes only catch `password=...`/`api_key=...`. Values like `extra={"password": "secret"}` or nested payloads still go out verbatim.

Suggested direction

```diff
+SENSITIVE_KEYS = {_P_WORD, _A_KEY}
+
+def _sanitize_log_value(value):
+    if isinstance(value, dict):
+        return {
+            key: (
+                "********"
+                if str(key).lower() in SENSITIVE_KEYS
+                else _sanitize_log_value(item)
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [_sanitize_log_value(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(_sanitize_log_value(item) for item in value)
+    if isinstance(value, bytes):
+        value = value.decode("utf-8", errors="replace")
+    if isinstance(value, str):
+        return filter_sensitive_data(value)
+    return value
+
 raw_extras = record.get("extra", {})
 sanitized_extras = {}
 for key, value in raw_extras.items():
     if key != "name":
-        sanitized_extras[key] = filter_sensitive_data(str(value)) if isinstance(value, (str, bytes)) else value
+        sanitized_extras[key] = _sanitize_log_value(value)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/utils/logger.py` around lines 11 - 23, The current filter_sensitive_data only handles top-level strings with regexes (SENSITIVE_PATTERNS) and misses structured/nested secrets in dicts or other serializable extras; update filter_sensitive_data to recursively walk mappings and iterables (handling dict, list/tuple, set) and mask values for keys matching identifiers like _P_WORD and _A_KEY (and their variants) as well as apply regex masking to string leaves, ensure bytes are decoded before filtering, and call this recursive sanitizer before any serialization of extras (also update the serialization path referenced around the extras handling currently at the other occurrence) so nested payloads like {"password":"secret"} are masked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@app/modules/utils/logger.py`:
- Around line 105-108: The current log_context() context manager yields without
binding request-scoped fields, so request_id, path, method, and user_id never
propagate to logs; update log_context to push/attach the dict returned by
get_log_context() into the logging context (or threadlocal/structlog context)
for the duration of the context, and ensure the code that writes logs merges
get_log_context() into sanitized_extras before composing log_data so those
fields are included (refer to log_context, get_log_context, and sanitized_extras
to locate and implement the change).
- Around line 83-97: The SinkHandler.emit currently passes only
self.format(record) which yields an unstructured string; instead build a
structured dict from the LogRecord in SinkHandler.emit (include keys like
timestamp (use record.created), level (record.levelname), logger (record.name),
message (record.getMessage()), pathname, lineno, funcName, and any extras from
record.__dict__), add formatted exception/traceback when record.exc_info is
present (use logging.Formatter().formatException(record.exc_info)), then
JSON-serialize that payload and pass the JSON string to production_log_sink;
update SinkHandler.emit to construct this payload before calling
production_log_sink (and leave configure_logging unchanged).
- Around line 11-23: The current filter_sensitive_data only handles top-level
strings with regexes (SENSITIVE_PATTERNS) and misses structured/nested secrets
in dicts or other serializable extras; update filter_sensitive_data to
recursively walk mappings and iterables (handling dict, list/tuple, set) and
mask values for keys matching identifiers like _P_WORD and _A_KEY (and their
variants) as well as apply regex masking to string leaves, ensure bytes are
decoded before filtering, and call this recursive sanitizer before any
serialization of extras (also update the serialization path referenced around
the extras handling currently at the other occurrence) so nested payloads like
{"password":"secret"} are masked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 85f2c7c0-7c80-41bb-9254-dbc3ebce11bd
📒 Files selected for processing (1)
app/modules/utils/logger.py



This pull request significantly refactors the `app/modules/utils/logger.py` module, focusing on simplifying the codebase, removing custom logging configuration utilities, and improving the handling of tracebacks in production log sinks. The changes streamline the logger implementation by removing advanced features such as sensitive data filtering, context management, and dynamic log level adjustment, leaving only the essentials for production log formatting.

Logger codebase simplification and removal of utilities:

- Removed the `configure_logging`, `InterceptHandler`, `log_context`, `setup_logger`, `should_show_stack_trace`, and `set_library_log_level` functions, as well as related environment variable handling and sensitive data filtering logic. This eliminates complex configuration and context management from the logger module.
- Removed the `filter_sensitive_data` function, so log messages and exceptions are no longer filtered for secrets or credentials.

Production log sink improvements:

- Added a helper `_truncate_traceback` to truncate exception tracebacks to the last 10 lines in the production log sink, making logs more concise and focused on the root cause.
- Updated the `production_log_sink` function to use the new traceback truncation helper and removed sensitive data filtering from exception tracebacks and messages.
- Simplified `production_log_sink` by removing filtering and flattening logic, now directly adding extra fields to the log output.

Summary by CodeRabbit