Description
coveo org:search:dump consistently crashes when attempting to export a very large source (~4M items). The failure always occurs after ~600k–700k results and produces:
RangeError: Invalid array length
This is caused by the CLI aggregating all field names from every result into a single array (aggregatedFieldsWithDupes), which eventually exceeds JavaScript’s maximum array length. The issue is structural and unrelated to memory exhaustion or Node heap size.
Steps To Reproduce
Steps to reproduce the behavior:
- Run a source dump on a large source, for example:
```
coveo org:search:dump --source "YourSourceName" --destination ./dump
```
- Allow the dump to progress past ~600k results.
- Observe the CLI terminating with:
RangeError: Invalid array length
- Check stack trace showing failure in:
  - extractFieldsFromAggregatedResults
  - dumpAggregatedResults
  - aggregateResults
  - fetchResults
Expected behavior
org:search:dump should:
- Successfully export large sources (millions of items).
- Stream results directly to disk without accumulating unbounded arrays.
- Track unique field names incrementally using a Set or similar structure.
- Avoid exceeding JavaScript’s array-length limits.
The dump should complete regardless of source size or number of fields.
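The Set-based tracking described above could be sketched as follows. The `FieldTracker` class and its method names are hypothetical illustrations, not part of the actual CLI source:

```typescript
type SearchResult = Record<string, unknown>;

// Hypothetical incremental field tracker: deduplicates as pages arrive,
// so memory grows with the number of *distinct* fields, not with the
// number of results processed.
class FieldTracker {
  private uniqueFields = new Set<string>();

  // Called once per page of results instead of accumulating all results first.
  addPage(results: SearchResult[]): void {
    for (const result of results) {
      for (const key of Object.keys(result)) {
        this.uniqueFields.add(key);
      }
    }
  }

  get fields(): string[] {
    return [...this.uniqueFields];
  }
}
```

No single array ever holds one entry per result, so the 2³²−1 length ceiling is never approached.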
Screenshots
Stack trace excerpt illustrating the error:
```
RangeError: Invalid array length
    at Array.push (<anonymous>)
    at Dump.extractFieldsFromAggregatedResults (.../dump.js:162:40)
    at Dump.dumpAggregatedResults (.../dump.js:157:14)
    at Dump.aggregateResults (.../dump.js:149:18)
```
error.log
Desktop:
- OS: Windows 11
- Browser: N/A (CLI operation)
- CLI Version: Latest version as of 2025-12-09
- Local Node version: e.g., 18.x
- Local NPM version: e.g., 9.x
Where the problem occurs
The issue originates in dump.ts:
```ts
private extractFieldsFromAggregatedResults() {
  this.aggregatedFieldsWithDupes.push(
    ...this.aggregatedResults.flatMap(Object.keys)
  );
}
```
Call chain:
extractFieldsFromAggregatedResults
→ dumpAggregatedResults
→ aggregateResults
→ fetchResults
Because aggregatedFieldsWithDupes grows:
- for every result,
- across the entire dump,
- containing duplicates,
- and includes potentially thousands of fields per item (dynamic fields, dictionary fields, system fields),
…the array eventually crosses JavaScript’s array-length ceiling (~2³²−1). The spread operator push(...hugeArray) triggers:
RangeError: Invalid array length
Increasing Node’s heap size does not affect this outcome.
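The ceiling is easy to confirm at small cost, since V8 rejects an over-long array without allocating it. A minimal check (assuming a V8-based runtime such as Node, where the maximum array length is 2³²−1):

```typescript
// Returns "ok" if an array of the given length can be constructed,
// or the error name if the runtime rejects it outright.
function arrayLengthCeiling(length: number): string {
  try {
    new Array(length); // V8 validates the length before allocating storage
    return "ok";
  } catch (e) {
    return (e as Error).name; // "RangeError" once length exceeds 2**32 - 1
  }
}

console.log(arrayLengthCeiling(2 ** 32 - 1)); // "ok" — the maximum valid length
console.log(arrayLengthCeiling(2 ** 32));     // "RangeError"
```

This is the same limit `Array.push` hits once `aggregatedFieldsWithDupes` can no longer grow, which is why no heap-size flag changes the outcome.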
Why this design fails at scale
- JavaScript array lengths are capped to a 32-bit range (at most 2³²−1 elements).
- aggregatedFieldsWithDupes grows unbounded as the dump progresses.
- Large sources with many fields multiply the size of this array quickly.
- The CLI attempts to aggregate all field names for all items before writing output, which is not feasible for multi-million-item dumps.
Thus, the failure is inherent to the current design rather than an environmental or memory constraint.
Impact
org:search:dump cannot export large enterprise sources.
- The crash occurs reliably around 600k–700k items processed.
- Prevents use of the CLI for:
  - updating permanentid mappings (ID_MAPPING) across associated machine learning models,
  - audits,
  - analytics extraction.
--fieldsToExclude helps only in limited cases; many sources contain high-cardinality dynamic fields where broad exclusion is not feasible.
Proposed fix
Switch from aggregate-then-write to a streaming write model
Rather than accumulating all field names and all results in memory, modify the algorithm to:
- Write each page of results directly to disk on retrieval.
- Track field names in a Set instead of a giant array that is only deduplicated at write time.
- Avoid ever using push(...largeArray).
- Keep memory usage constant regardless of source size.
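The streaming model could look roughly like the sketch below. Everything here is an assumption for illustration: `fetchPage`, the output path, and the NDJSON formatting are hypothetical stand-ins, not the CLI's actual API or output format:

```typescript
import { createWriteStream } from "node:fs";
import { once } from "node:events";

type SearchResult = Record<string, unknown>;

// Streams each page of results straight to disk while tracking field names
// in a Set. Memory use is bounded by one page plus the distinct field count,
// regardless of how many items the source contains.
async function streamDump(
  fetchPage: (page: number) => Promise<SearchResult[]>,
  outPath: string
): Promise<Set<string>> {
  const uniqueFields = new Set<string>();
  const out = createWriteStream(outPath);

  for (let page = 0; ; page++) {
    const results = await fetchPage(page);
    if (results.length === 0) break; // no more pages
    for (const result of results) {
      for (const key of Object.keys(result)) uniqueFields.add(key);
      // Write each result immediately; nothing accumulates across pages.
      if (!out.write(JSON.stringify(result) + "\n")) {
        await once(out, "drain"); // respect backpressure
      }
    }
  }

  out.end();
  await once(out, "finish");
  return uniqueFields;
}
```

Honoring the `write` return value and waiting for `drain` keeps the writer from buffering unboundedly, which is what makes memory usage flat for multi-million-item sources.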
Benefits
- Eliminates array-length overflow.
- Enables dumping extremely large sources.
- Reduces memory footprint dramatically.
- Matches proven durable patterns used in log processing, ETL tools, and database dump pipelines.