Skip to content

fix: production safety audit — 4 critical bugs across backend, server, and electron#1290

Open
eren-karakus0 wants to merge 2 commits intoeigent-ai:mainfrom
eren-karakus0:fix/production-safety-audit-critical-bugs
Open

fix: production safety audit — 4 critical bugs across backend, server, and electron#1290
eren-karakus0 wants to merge 2 commits intoeigent-ai:mainfrom
eren-karakus0:fix/production-safety-audit-critical-bugs

Conversation

@eren-karakus0
Copy link

Summary

A cross-stack production safety audit uncovering 4 confirmed bugs across the Python backend, server auth, and Electron main process. Each bug is independently verified and includes test coverage where applicable.

Bug 1: Electron Child Windows Missing Context Isolation (CRITICAL)

File: electron/main/index.tsopen-win IPC handler

The main window correctly sets contextIsolation: true, but the open-win handler creates child BrowserWindow instances with nodeIntegration: true and contextIsolation: false. With context isolation disabled, any JavaScript running in a child renderer window shares the same context as Electron's built-in modules — meaning any XSS payload or injected script gains full require('child_process').exec(...) access with no sandbox barrier.

Before:

webPreferences: {
  preload,
  nodeIntegration: true,      // full Node.js access
  contextIsolation: false,    // no sandbox
}

After:

webPreferences: {
  preload,
  nodeIntegration: false,     // no direct Node.js access
  contextIsolation: true,     // isolated contexts
  sandbox: true,              // OS-level sandboxing
}

Bug 2: JWT Token Expiry Returns Wrong Error Code (HIGH)

File: server/app/component/auth.pyAuth.decode_token()

PyJWT's jwt.decode() already validates the exp claim and raises ExpiredSignatureError (a subclass of InvalidTokenError) for expired tokens. The manual payload["exp"] < datetime.now() check never executes because expired tokens don't reach that line. The except InvalidTokenError block catches both expired and invalid tokens, raising token_invalid for both cases.

This makes it impossible for clients to distinguish "token expired, please refresh" from "token is corrupted, please re-login", breaking any automatic token refresh logic.

Before:

try:
    payload = jwt.decode(token, Auth.SECRET_KEY, algorithms=["HS256"])
    id = payload["id"]
    if payload["exp"] < int(datetime.now().timestamp()):  # dead code
        raise TokenException(code.token_expired, ...)
except InvalidTokenError:  # catches ExpiredSignatureError too
    raise TokenException(code.token_invalid, ...)  # wrong code for expired

After:

try:
    payload = jwt.decode(token, Auth.SECRET_KEY, algorithms=["HS256"])
except ExpiredSignatureError:
    raise TokenException(code.token_expired, ...)   # correct code
except InvalidTokenError:
    raise TokenException(code.token_invalid, ...)   # only truly invalid tokens

Bug 3: assert Statements Used for Runtime Control Flow (HIGH)

File: backend/app/agent/listen_chat_agent.pystep() and astep()

Both step() and astep() use assert message is not None and assert res is not None for critical runtime checks. Python's assert statements are completely removed when running with -O (optimized mode), which is common in production deployments.

When stripped:

  • None message propagates into the SSE ActionDeactivateAgentData event, corrupting the frontend data stream
  • None response returns to callers, causing downstream AttributeError crashes

Before:

assert message is not None    # stripped with python -O
# ...
assert res is not None         # stripped with python -O
return res

After:

if message is None:
    message = ""
    logger.warning(f"Agent {self.agent_name}: message is None ...")

# ...
if res is None:
    raise RuntimeError(
        f"Agent {self.agent_name}: step() returned None without setting error_info"
    )
return res

Bug 4: background_tasks.clear() Doesn't Cancel Running Tasks (HIGH)

File: backend/app/controller/chat_controller.pyimprove() handler

When a user sends a follow-up message after task completion, improve() calls task_lock.background_tasks.clear(). This only removes references from the Python set — it does not cancel the underlying asyncio.Task objects. Ghost tasks continue running in the event loop, holding resources, and potentially writing stale SSE events into the new conversation's stream.

Before:

if hasattr(task_lock, "background_tasks"):
    task_lock.background_tasks.clear()    # tasks keep running

After:

if hasattr(task_lock, "background_tasks"):
    for bg_task in list(task_lock.background_tasks):
        if not bg_task.done():
            bg_task.cancel()
    task_lock.background_tasks.clear()

Test Plan

Added 6 new unit tests:

Assert Guard Tests (test_listen_chat_agent.py):

  • test_step_handles_none_message_without_assert — verifies step() doesn't crash with AssertionError
  • test_step_raises_runtime_error_when_res_is_none — verifies RuntimeError instead of AssertionError
  • test_astep_handles_none_message_without_assert — async version of message guard
  • test_astep_raises_runtime_error_when_res_is_none — async version of res guard

Background Task Tests (test_chat_controller.py):

  • test_improve_cancels_background_tasks_when_done — verifies running tasks are cancelled, done tasks are not
  • test_improve_handles_missing_background_tasks_attr — verifies graceful handling without the attribute

Files Changed

File Change
electron/main/index.ts Set nodeIntegration: false, contextIsolation: true, sandbox: true in child windows
server/app/component/auth.py Catch ExpiredSignatureError separately; remove dead manual exp check
backend/app/agent/listen_chat_agent.py Replace 4 assert statements with if/raise guards
backend/app/controller/chat_controller.py Cancel background tasks before clearing the set
backend/tests/app/agent/test_listen_chat_agent.py +4 tests for assert replacement
backend/tests/app/controller/test_chat_controller.py +2 tests for task cancellation

…, and electron

1. fix(electron): enforce context isolation in child windows
   The `open-win` IPC handler created child BrowserWindows with
   `nodeIntegration: true` and `contextIsolation: false`, while the
   main window correctly uses `contextIsolation: true`. Any XSS in a
   child window would gain full Node.js access. Fixed by setting
   `nodeIntegration: false`, `contextIsolation: true`, and
   `sandbox: true` to match the main window's security posture.

2. fix(server): separate ExpiredSignatureError from InvalidTokenError
   in JWT decoding
   `jwt.decode()` raises `ExpiredSignatureError` (a subclass of
   `InvalidTokenError`) for expired tokens before the manual `exp`
   check can execute. The `except InvalidTokenError` block catches
   both cases and raises `token_invalid`, making it impossible for
   clients to distinguish expired tokens from corrupted ones. This
   breaks automatic token refresh flows. Fixed by catching
   `ExpiredSignatureError` explicitly before `InvalidTokenError`.

3. fix(backend): replace assert statements with proper runtime guards
   in ListenChatAgent
   `assert message is not None` and `assert res is not None` in both
   `step()` and `astep()` are stripped when Python runs with `-O`
   (optimized mode). This causes silent `None` propagation into the
   SSE event stream and downstream consumers. Replaced with explicit
   `if`/`raise RuntimeError` guards that survive optimization and
   provide clear error messages.

4. fix(backend): cancel background tasks before clearing the set
   `background_tasks.clear()` in the `improve()` handler only removes
   references from the set — it does not cancel running asyncio.Task
   objects. Ghost tasks continue executing, potentially writing stale
   SSE events into a new conversation's stream. Fixed by iterating
   and calling `.cancel()` on each non-done task before clearing.

Tests:
- 4 new tests for assert guard replacements (step/astep, sync/async)
- 2 new tests for background task cancellation behavior
Copy link
Collaborator

@bytecii bytecii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

)

assert message is not None
if message is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we change to warning and let the step continue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — changed to raise RuntimeError instead, keeping fail-fast behavior consistent with the res=None guard below. Fixed in latest push.



@pytest.mark.unit
class TestAssertReplacedWithProperGuards:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use def test_xxx instead of creating class for now

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — replaced class with plain def test_xxx functions.

Address review feedback:
- message=None now raises RuntimeError instead of defaulting to empty
  string, keeping fail-fast behavior consistent with res=None handling
- Replace test class with plain def test_xxx functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments