Skip to content

pgwire SELECT on TYPE DOCUMENT STRICT silently returns null/empty columns via extended protocol (data loss) #44

@hollanf

Description

@hollanf

SELECT col1, col2 FROM <strict-doc-collection> WHERE … executed via the pgwire extended-query protocol (the default for most drivers) returns rows where every column value is NULL or the row object is [{}], even when the underlying rows exist and the same query via the simple-query protocol returns the correct data.

This is a silent failure — the client receives rows (not an error), the rows look well-formed (correct count), but every field is null. Dedup logic, update paths, cache lookups, and any read-modify-write pattern that uses a SELECT result on a STRICT doc collection quietly misbehaves.

Symptom seen by client

Two production workaround comments from a downstream app, verbatim:

nodedb-memory-store.ts:507-511

NodeDB 0.0.4 quirk: tagged-template SELECT on TYPE DOCUMENT STRICT
collections returns empty row objects ([{}]). Use simple-query
protocol (sql.unsafe) to get the `result`-wrapped JSON shape that
unwrap() can parse correctly. Without this, entity dedup was
returning null on every lookup → new entity row per mention.

nodedb-memory-store.ts:754-756

NodeDB 0.0.4: tagged-template SELECT on TYPE DOCUMENT STRICT collections
silently returns rows with all columns null. Use sql.unsafe (simple-query)
+ esc() quoting. Same pattern as findEntityByExactName fix.

The "entity dedup was returning null on every lookup → new entity row per mention" line is the business-impact tell: any SELECT … WHERE canonical_name = $1 LIMIT 1 over a STRICT doc collection returned an empty-body row, the app treated it as "not found", and inserted a duplicate. A caller can't tell the difference between "row doesn't exist" and "row exists but all fields decoded to null".

Schema shape that triggers it (from the downstream app's schema setup):

CREATE COLLECTION entities TYPE DOCUMENT STRICT (
  id STRING PRIMARY KEY,
  tenant_id STRING NOT NULL,
  user_id STRING NOT NULL,
  canonical_name STRING NOT NULL,
  aliases STRING,
  embedding VECTOR(1024),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Query that manifests (extended protocol — postgres.js tagged template):

const rows = await sql\`
  SELECT id, canonical_name FROM entities
  WHERE tenant_id = \${tenantId}
    AND user_id = \${userId}
    AND canonical_name = \${name}
  LIMIT 1
\`;
// rows = [{}] or [{ id: null, canonical_name: null }]

Same query via simple-query protocol (sql.unsafe(...)) returns the actual row contents wrapped in the { result: "..." } envelope — correct data.

Scope

  • Only TYPE DOCUMENT STRICT collections are reported affected; regular CREATE COLLECTION (schemaless) works via either protocol, per the downstream app's split — it uses tagged template for schemaless tables without issue.
  • Affects any pgwire driver that defaults to extended protocol (postgres.js, psycopg, JDBC, pgx), unless the client explicitly opts out of PREPARE (prepare: false in postgres.js).
  • Affects the basic SELECT path — not DSL. This is distinct from issue pgwire extended-query protocol rejects DSL statements (SEARCH / GRAPH / UPSERT / MATCH) — forces simple-query + string interpolation #43 (DSL parse rejection) where the extended path errors out; here it succeeds but returns wrong data, which is worse.

Likely root cause (for triage)

The extended path goes through prepared/execute.rs:46-49:

let mut results = self
    .execute_planned_sql_with_params(&identity, &stmt.sql, tenant_id, &addr, &params)
    .await?;

which routes to execute_planned_sql_inner in routing/mod.rs:87. The simple-query path instead goes through sql_exec.rs:286-307 and ends up in execute_planned_sql plus the DDL dispatcher at line 286. The row-encoding layer (envelope-wrapped JSON result vs. typed columns) differs between the two paths, and somewhere in the typed-column encoder for STRICT-doc scans the fields are not populated — or the result schema inferred at Parse time (prepared/parser.rs:63-76) doesn't match the actual columns decoded at Execute time.

Why this matters

  1. Silent. Silent. Silent. No error, no warning — the driver reports N rows and the application reads N null-valued records.
  2. Corrupts application state. The downstream app was creating a new entity row for every mention because findEntityByExactName returned null. Over days, entity count blows up with duplicates and the LLM's memory graph degrades.
  3. Forces prepare: false workaround globally (see also pgwire extended-query protocol rejects DSL statements (SEARCH / GRAPH / UPSERT / MATCH) — forces simple-query + string interpolation #43), which is also why this bug is not trivially side-stepped.

Repro

CREATE COLLECTION t TYPE DOCUMENT STRICT (
  id STRING PRIMARY KEY,
  name STRING
);
INSERT INTO t (id, name) VALUES ('a', 'alice');
// postgres.js with DEFAULT (extended protocol) — buggy
const rows = await sql\`SELECT id, name FROM t WHERE name = \${'alice'} LIMIT 1\`;
console.log(rows); // expected: [{ id: 'a', name: 'alice' }], actual: [{}] or [{ id: null, name: null }]

// Same query via simple-query — correct
const rows2 = await sql.unsafe(\"SELECT id, name FROM t WHERE name = 'alice' LIMIT 1\");
console.log(rows2); // [{ id: 'a', name: 'alice' }] (possibly wrapped in { result: \"...\" })

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions