Description
coveo org:search:dump consistently crashes when attempting to export a very large source (~4M items). The failure always occurs after ~600k–700k results and produces:
RangeError: Invalid array length
This is caused by the CLI aggregating all field names from every result into a single array (aggregatedFieldsWithDupes), which eventually exceeds JavaScript’s maximum array length. The issue is structural and unrelated to memory exhaustion or Node heap size.
Steps To Reproduce
Steps to reproduce the behavior:
- Run a source dump on a large source, for example:
```
coveo org:search:dump --source "YourSourceName" --destination ./dump
```
- Allow the dump to progress past ~600k results.
- Observe the CLI terminating with:
RangeError: Invalid array length
- Check stack trace showing failure in:
  - extractFieldsFromAggregatedResults
  - dumpAggregatedResults
  - aggregateResults
  - fetchResults
Expected behavior
org:search:dump should:
- Successfully export large sources (millions of items).
- Stream results directly to disk without accumulating unbounded arrays.
- Track unique field names incrementally using a Set or similar structure.
- Avoid exceeding JavaScript’s array-length limits.
The dump should complete regardless of source size or number of fields.
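The Set-based tracking described above could be sketched as follows. The `FieldTracker` class and its method names are hypothetical illustrations, not part of the actual CLI source:

```typescript
type SearchResult = Record<string, unknown>;

// Hypothetical incremental field tracker: deduplicates as pages arrive,
// so memory grows with the number of *distinct* fields, not with the
// number of results processed.
class FieldTracker {
  private uniqueFields = new Set<string>();

  // Called once per page of results instead of accumulating all results first.
  addPage(results: SearchResult[]): void {
    for (const result of results) {
      for (const key of Object.keys(result)) {
        this.uniqueFields.add(key);
      }
    }
  }

  get fields(): string[] {
    return [...this.uniqueFields];
  }
}
```

No single array ever holds one entry per result, so the 2³²−1 length ceiling is never approached.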
Screenshots
Stack trace excerpt illustrating the error:
```
RangeError: Invalid array length
    at Array.push (<anonymous>)
    at Dump.extractFieldsFromAggregatedResults (.../dump.js:162:40)
    at Dump.dumpAggregatedResults (.../dump.js:157:14)
    at Dump.aggregateResults (.../dump.js:149:18)
```
error.log
Desktop:
- OS: Windows 11
- Browser: N/A (CLI operation)
- CLI Version: Latest version as of 2025-12-09
- Local Node version: e.g., 18.x
- Local NPM version: e.g., 9.x
Where the problem occurs
The issue originates in dump.ts:
```ts
private extractFieldsFromAggregatedResults() {
  this.aggregatedFieldsWithDupes.push(
    ...this.aggregatedResults.flatMap(Object.keys)
  );
}
```
Call chain:
extractFieldsFromAggregatedResults
→ dumpAggregatedResults
→ aggregateResults
→ fetchResults
Because aggregatedFieldsWithDupes grows:
- for every result,
- across the entire dump,
- containing duplicates,
- and includes potentially thousands of fields per item (dynamic fields, dictionary fields, system fields),
…the array eventually crosses JavaScript’s array-length ceiling (~2³²−1). The spread operator push(...hugeArray) triggers:
RangeError: Invalid array length
Increasing Node’s heap size does not affect this outcome.
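The ceiling is easy to confirm at small cost, since V8 rejects an over-long array without allocating it. A minimal check (assuming a V8-based runtime such as Node, where the maximum array length is 2³²−1):

```typescript
// Returns "ok" if an array of the given length can be constructed,
// or the error name if the runtime rejects it outright.
function arrayLengthCeiling(length: number): string {
  try {
    new Array(length); // V8 validates the length before allocating storage
    return "ok";
  } catch (e) {
    return (e as Error).name; // "RangeError" once length exceeds 2**32 - 1
  }
}

console.log(arrayLengthCeiling(2 ** 32 - 1)); // "ok" — the maximum valid length
console.log(arrayLengthCeiling(2 ** 32));     // "RangeError"
```

This is the same limit `Array.push` hits once `aggregatedFieldsWithDupes` can no longer grow, which is why no heap-size flag changes the outcome.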
Why this design fails at scale
- JavaScript array lengths are capped to a 32-bit range (at most 2³²−1 elements).
- aggregatedFieldsWithDupes grows unbounded as the dump progresses.
- Large sources with many fields multiply the size of this array quickly.
- The CLI attempts to aggregate all field names for all items before writing output, which is not feasible for multi-million-item dumps.
Thus, the failure is inherent to the current design rather than an environmental or memory constraint.
Impact
org:search:dump cannot export large enterprise sources.
- The crash occurs reliably around 600k–700k items processed.
- Prevents use of the CLI for:
  - updating permanentid mappings (ID_MAPPING) across associated machine learning models,
  - audits,
  - analytics extraction.
--fieldsToExclude helps only in limited cases; many sources contain high-cardinality dynamic fields where broad exclusion is not feasible.
Proposed fix
Switch from aggregate-then-write to a streaming write model
Rather than accumulating all field names and all results in memory, modify the algorithm to:
- Write each page of results directly to disk on retrieval.
- Track field names in a Set instead of a giant array that is only deduplicated at write time.
- Avoid ever using push(...largeArray).
- Keep memory usage constant regardless of source size.
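The streaming model could look roughly like the sketch below. Everything here is an assumption for illustration: `fetchPage`, the output path, and the NDJSON formatting are hypothetical stand-ins, not the CLI's actual API or output format:

```typescript
import { createWriteStream } from "node:fs";
import { once } from "node:events";

type SearchResult = Record<string, unknown>;

// Streams each page of results straight to disk while tracking field names
// in a Set. Memory use is bounded by one page plus the distinct field count,
// regardless of how many items the source contains.
async function streamDump(
  fetchPage: (page: number) => Promise<SearchResult[]>,
  outPath: string
): Promise<Set<string>> {
  const uniqueFields = new Set<string>();
  const out = createWriteStream(outPath);

  for (let page = 0; ; page++) {
    const results = await fetchPage(page);
    if (results.length === 0) break; // no more pages
    for (const result of results) {
      for (const key of Object.keys(result)) uniqueFields.add(key);
      // Write each result immediately; nothing accumulates across pages.
      if (!out.write(JSON.stringify(result) + "\n")) {
        await once(out, "drain"); // respect backpressure
      }
    }
  }

  out.end();
  await once(out, "finish");
  return uniqueFields;
}
```

Honoring the `write` return value and waiting for `drain` keeps the writer from buffering unboundedly, which is what makes memory usage flat for multi-million-item sources.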
Benefits
- Eliminates array-length overflow.
- Enables dumping extremely large sources.
- Reduces memory footprint dramatically.
- Matches proven durable patterns used in log processing, ETL tools, and database dump pipelines.