Skip to content

MDR API SQLAlchemy engine uses echo=True — logs every SQL statement to CloudWatch #956

@bjagg

Description

@bjagg

Surfaced during the #954 audit (DATABASE_URL log-leak pattern survey). Adjacent to #938 but a different concern — not a credential leak, but unnecessary log volume + a soft data-leakage surface.

What

components/lif/mdr_utils/database_setup.py:21:

engine = create_async_engine(DATABASE_URL, echo=True)

echo=True makes SQLAlchemy emit every SQL statement (and their bound parameters) at INFO level on the standard logger. On dev / demo this routes to the shared CloudWatch log group.

Why it matters

  1. Volume. Every MDR API request produces multiple SQL log lines. Aggregated across the dev cluster's services this is a real CloudWatch cost line item and makes operator debugging harder ("where are the actual app logs in this noise").
  2. Bound parameters often include user-supplied data. A SELECT * FROM "DataModels" WHERE "Name" = 'something a user typed' shows up in logs verbatim. Most of MDR's queries are over metadata, so the actual leak risk is low — but it's a porous boundary that grows if user-content tables get added.
  3. Production-vs-dev parity. echo=True is a "development-only" SQLAlchemy setting; running it in deployed envs is a misconfiguration in spirit even if it isn't actively breaking anything.

Suggested fix

Either:

engine = create_async_engine(DATABASE_URL, echo=False)

…or make it env-driven so local devs can flip it on:

echo = os.getenv("SQLALCHEMY_ECHO", "false").lower() == "true"
engine = create_async_engine(DATABASE_URL, echo=echo)

The env-driven shape is friendlier for local debugging. Either way the deployed value should be false.

Related

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions