Unbounded-input DoS hardening — ef_search, TLS handshake, ILP line, SQL expr depth #41

@hollanf


Four distinct code paths accept attacker-controlled input without an upper bound. Each is independently exploitable to either OOM the process or wedge/crash a server thread from a single connection. Grouped as an epic because all share the same root cause (missing resource limit) and a single hardening sweep can cover them.


1. ef_search parameter has no upper bound — single-query OOM

File: nodedb/src/data/executor/handlers/vector_search.rs:354-361, effective_ef:

fn effective_ef(ef_search: usize, top_k: usize) -> usize {
    if ef_search > 0 {
        ef_search.max(top_k)          // ← only floor, no ceiling
    } else {
        top_k.saturating_mul(4).max(64)
    }
}

and the HNSW consumer at nodedb-vector/src/hnsw/search.rs:18-48:

pub fn search(&self, query: &[f32], k: usize, ef: usize) -> Vec<SearchResult> {
    ...
    let ef = ef.max(k);               // ← only floor again
    ...
    let results = search_layer(self, query, current_ep, ef, 0, None);

ef_search propagates from user SQL (SET ef_search = N), from the protocol struct (nodedb-types/src/protocol.rs:391 pub ef_search: Option<u64>), and from the SQL planner (nodedb-sql/src/planner/select.rs:568, 654 sets ef_search: limit * 2) straight into search_layer, which allocates a BinaryHeap of up to ef candidates plus a HashSet<u32> that grows until the heap is drained.

A single authenticated client issuing SET ef_search = 1_000_000_000 causes immediate multi-GB allocation. Also exploitable via a huge LIMIT because planner/select.rs sets ef_search = limit * 2.

Repo-wide grep for MAX_EF / ef.min returns zero matches — no ceiling exists anywhere.
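
A minimal sketch of what a ceiling could look like, assuming a configured maximum is available at the call site. `MAX_EF_SEARCH`, `EfError`, and `checked_effective_ef` are hypothetical names, not existing code in the repo, and the value 4096 is only a placeholder. The sketch keeps the floor logic from effective_ef, rejects an oversized ef_search or top_k outright, and clamps the derived default:

```rust
/// Hypothetical ceiling; a real value would come from server configuration.
pub const MAX_EF_SEARCH: usize = 4096;

/// Hypothetical typed error for requests that exceed the ceiling.
#[derive(Debug, PartialEq)]
pub enum EfError {
    EfSearchTooLarge { requested: u64, max: usize },
    TopKTooLarge { requested: usize, max: usize },
}

/// Same floor logic as `effective_ef`, plus a ceiling.
/// `top_k` is bounded too, because `ef` is always raised to at least `top_k`.
pub fn checked_effective_ef(
    ef_search: Option<u64>,
    top_k: usize,
    max_ef: usize,
) -> Result<usize, EfError> {
    if top_k > max_ef {
        return Err(EfError::TopKTooLarge { requested: top_k, max: max_ef });
    }
    let ef = match ef_search {
        // Reject explicit oversized values instead of silently clamping them.
        Some(n) if n > max_ef as u64 => {
            return Err(EfError::EfSearchTooLarge { requested: n, max: max_ef });
        }
        // Explicit value within bounds: keep the existing floor at top_k.
        Some(n) if n > 0 => (n as usize).max(top_k),
        // Unset or zero: existing default heuristic.
        _ => top_k.saturating_mul(4).max(64),
    };
    // The default heuristic can still exceed the ceiling, so clamp it.
    Ok(ef.min(max_ef))
}
```

Bounding top_k as well matters here because both effective_ef and the HNSW search raise ef to at least k, so a ceiling on ef_search alone would leave the huge-LIMIT path open.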


2. TLS handshake has no deadline — slow-loris wedges connection semaphore across native / RESP / ILP listeners

Files:

Representative pattern (native, listener.rs:120-138):

if let Some(ref acceptor) = tls_acceptor {
    let acceptor = acceptor.clone();
    connections.spawn(async move {
        match acceptor.accept(stream).await {   // ← no tokio::time::timeout
            Ok(tls_stream) => { /* session.run() */ }
            Err(e) => { warn!(...); }
        }
        drop(permit);
    });
}

The accept loop acquires a semaphore permit, spawns a task, then awaits the TLS handshake with no deadline. tokio_rustls::TlsAcceptor::accept only makes progress when the client sends data, so a client that opens a TCP connection, sends one byte of the ClientHello, and then stalls holds the permit indefinitely. The session-level idle timeout in native/session.rs:88 only starts running after a successful handshake.

N slow clients pin N permits; once the semaphore is drained, every legitimate TLS client is RST'd at accept (listener.rs:102-113 — try_acquire_owned + continue with dropped socket).

All three listeners share the same pattern.


3. ILP plaintext listener reads unbounded line length → OOM from one connection

File: nodedb/src/control/server/ilp_listener.rs:154-200

async fn handle_ilp_connection(stream: ConnStream, peer: SocketAddr, state: &SharedState) -> crate::Result<()> {
    ...
    let reader = BufReader::new(stream);
    let mut lines = reader.lines();
    ...
    loop {
        tokio::select! {
            result = lines.next_line() => {
                match result {
                    Ok(Some(line)) => {
                        ...
                        batch.push_str(&line);

tokio::io::AsyncBufReadExt::lines grows the returned String until it hits \n. No maximum length.

ILP is plaintext (port 9009 by default, used by telegraf / vector / InfluxDB clients), and per-tenant quota checks happen only after the read. An attacker connects and streams bytes forever without ever sending \n; the String reallocates until the process runs out of memory. The semaphore permit stays held the entire time, and no idle-based cancellation applies at the line level.

Slow-drip variant (one byte per second) is also effective because there's no per-read deadline.
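
To illustrate the length cap, here is a synchronous, std-only analogue of a bounded line read. It is a sketch, not the listener's code: `read_line_bounded` and `LineError` are hypothetical names, and the real handler is async, where the same shape should translate to tokio's `take` plus `read_until`. The reader is wrapped in `take(max_len + 1)` so that `read_until` can never buffer more than the cap plus the newline:

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical error type for the bounded reader.
#[derive(Debug)]
pub enum LineError {
    TooLong { limit: usize },
    Io(io::Error),
}

/// Reads one `\n`-terminated line into `buf` (newline stripped).
/// Never buffers more than `max_len + 1` bytes.
/// Returns `Ok(None)` on a clean EOF, `Ok(Some(len))` for a line.
pub fn read_line_bounded<R: BufRead>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    max_len: usize,
) -> Result<Option<usize>, LineError> {
    buf.clear();
    // Allow max_len payload bytes plus one byte for the terminating newline.
    let cap = (max_len as u64).saturating_add(1);
    let mut limited = reader.by_ref().take(cap);
    let n = limited.read_until(b'\n', buf).map_err(LineError::Io)?;
    if n == 0 {
        return Ok(None); // EOF, nothing read
    }
    if buf.last() == Some(&b'\n') {
        buf.pop();
        return Ok(Some(buf.len()));
    }
    if n > max_len {
        // Hit the cap before seeing a newline: the line is over the limit.
        return Err(LineError::TooLong { limit: max_len });
    }
    // EOF in the middle of a final, unterminated line that is within the limit.
    Ok(Some(buf.len()))
}
```

After a `TooLong` error the rest of the oversized line is still unread, so the simplest response is to close the connection. The cap bounds memory only; the slow-drip variant would still need a per-read or per-line deadline.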


4. SQL expression parser + resolver have no recursion depth limit → stack overflow DoS

Files:

Grep for MAX_DEPTH / recursion_limit / depth across nodedb-query/src/expr_parse.rs and nodedb-sql/src/resolver/expr.rs returns zero matches. No depth guard anywhere in the pipeline.

A WHERE ((((...((x))...)))) with tens of thousands of parentheses (or a deeply nested generated-column expression) recurses through parse_expr → parse_or → parse_and → parse_comparison → parse_additive → parse_multiplicative → parse_unary → parse_primary (≈ 8 stack frames per opening parenthesis) and overflows the server thread's stack. On Linux with the default 8 MB stack that takes ~10–20 k parens; on macOS non-main threads (512 KB) it takes ~1–2 k.

A single SQL statement from a single authenticated client crashes the thread (and in some handler paths, the node).

Reproduction:

SELECT ( ( ( ( ... x ... ) ) ) ) FROM t;        -- 10 000 nested parens
-- or:
CREATE TABLE t (x INT GENERATED ALWAYS AS (( … x … )) STORED);
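
A depth guard for this kind of input can be sketched with a toy recursive-descent parser that only understands identifiers and parentheses. Everything here is hypothetical (`Parser`, `ParseError`, `MAX_EXPR_DEPTH`, and the limit of 128); it is not the structure of expr_parse.rs. It only shows the counter being incremented on descent, checked before recursing, and decremented on the way back up:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical nesting limit; a real value would be configurable.
pub const MAX_EXPR_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooDeep { limit: usize },
    UnexpectedEnd,
    Unexpected(char),
}

/// Toy parser: accepts an identifier wrapped in any number of parentheses.
pub struct Parser<'a> {
    chars: Peekable<Chars<'a>>,
    depth: usize,
    max_depth: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str, max_depth: usize) -> Self {
        Parser { chars: src.chars().peekable(), depth: 0, max_depth }
    }

    fn skip_ws(&mut self) {
        while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
            self.chars.next();
        }
    }

    /// Parses `( primary )` or an identifier and returns the nesting depth seen.
    pub fn parse_primary(&mut self) -> Result<usize, ParseError> {
        self.skip_ws();
        match self.chars.next() {
            Some('(') => {
                // Check the guard before recursing, so the stack never grows
                // past max_depth parser frames regardless of input size.
                self.depth += 1;
                if self.depth > self.max_depth {
                    return Err(ParseError::TooDeep { limit: self.max_depth });
                }
                let inner = self.parse_primary()?;
                self.skip_ws();
                match self.chars.next() {
                    Some(')') => {
                        self.depth -= 1;
                        Ok(inner + 1)
                    }
                    Some(c) => Err(ParseError::Unexpected(c)),
                    None => Err(ParseError::UnexpectedEnd),
                }
            }
            Some(c) if c.is_alphanumeric() || c == '_' => {
                // Consume the rest of the identifier.
                while matches!(self.chars.peek(), Some(c) if c.is_alphanumeric() || *c == '_') {
                    self.chars.next();
                }
                Ok(0)
            }
            Some(c) => Err(ParseError::Unexpected(c)),
            None => Err(ParseError::UnexpectedEnd),
        }
    }
}
```

With a guard like this, the 10 000-paren reproduction would return a typed error after 128 levels instead of exhausting the stack. In a real parser with several mutually recursive functions, the same counter would need to be shared across all of them.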

Checklist

  • 1. Clamp ef_search to a configured max in effective_ef and HnswIndex::search; reject excessive values at the protocol boundary.
  • 2. Wrap acceptor.accept(stream) in tokio::time::timeout(tls_handshake_timeout, …) for all three listeners (native, RESP, ILP). Also consider a pre-handshake read deadline for the plaintext branches.
  • 3. Replace BufReader::lines in ilp_listener.rs with a length-bounded reader (e.g. LinesCodec::new_with_max_length, or manual read_until(b'\n') with a cap).
  • 4. Thread a depth counter through parse_expr, convert_expr, and eval_scope (or convert hot cases to iterative-with-explicit-stack). Return a typed error on exceed.

Notes

  • Found during a CPU/memory + DoS audit sweep. Each item is independently verifiable; checkboxes let PRs close them one-by-one.
