XLSX export truncation warnings overwrite the last data column. #2088

@dikshaa2909

Description

When the XLSX export (add_xlsx_worksheet in scanpipe/pipes/output.py) returns truncated results and an error occurs, the error warning string overwrites the last legitimate data column instead of being written to the separate xlsx_errors column.

This destroys the exported user data in the last column (or the only column, if there is just 1 export field) of every affected row, which is likely in large project exports, while leaving the intended error column completely empty.

Reproduce

  1. Export a project to XLSX through add_xlsx_worksheet, with any number of fields.
  2. Make sure at least one data row contains a string field longer than Excel's 32,767-character cell limit.
  3. Open the output .xlsx file. The final data field of that row is replaced by the truncation warning, and the actual data is permanently lost. The trailing xlsx_errors column is blank.
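The off-by-one described above can be modelled without any XLSX library. The sketch below is not the ScanCode.io implementation. `build_row`, its `buggy` flag, and the warning text are hypothetical names invented for illustration. It assumes only what this report states: data fields occupy zero-based columns `0..len(fields) - 1` and the errors belong in the next column.

```python
MAX_LEN = 32767  # Excel's per-cell string length limit


def build_row(values, buggy):
    """Model one exported row: data cells plus one trailing error cell.

    Hypothetical helper for illustration only, not the real scanpipe code.
    """
    row = [None] * (len(values) + 1)  # extra slot stands in for xlsx_errors
    errors = []
    for index, value in enumerate(values):
        if len(value) > MAX_LEN:
            value = value[:MAX_LEN]
            errors.append(f"field {index} truncated")
        row[index] = value
    if errors:
        # Data uses columns 0..len(values)-1, so errors belong at len(values).
        # The buggy variant targets len(values) - 1, the last data column.
        error_index = len(values) - 1 if buggy else len(values)
        row[error_index] = "\n".join(errors)
    return row


long_value = "x" * 40000

# Buggy index: the warning replaces the last data cell, error slot stays empty.
assert build_row(["a", long_value], buggy=True) == ["a", "field 1 truncated", None]

# Correct index: truncated data is kept and the warning lands in the extra slot.
assert build_row(["a", long_value], buggy=False) == [
    "a",
    "x" * MAX_LEN,
    "field 1 truncated",
]
```

With a single field, the buggy index is `0`, so the only data cell is the one that gets overwritten.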

The existing unit tests miss this because they test an export with exactly 1 data column. In that case len(["foo"]) - 1 = 0, so the error overwrites column 0 instead of being written to the new column 1, destroying the data.
The test also only string-matches the raw XML instead of verifying column positions.
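A position-aware check could look roughly like the sketch below. It is an illustration, not the project's test: `SHEET_XML` is a hand-written stand-in for a worksheet part, using inline strings to stay self-contained (a real export may store text in a shared-strings part instead), and `cells_by_reference` is a hypothetical helper. The point is that assertions are keyed on cell references such as `A2` and `B2`, so a warning written to the wrong column fails the test even though the text is present somewhere in the XML.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written sample, assumed for illustration; not actual export output.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="inlineStr"><is><t>foo</t></is></c>
      <c r="B1" t="inlineStr"><is><t>xlsx_errors</t></is></c>
    </row>
    <row r="2">
      <c r="A2" t="inlineStr"><is><t>truncated value</t></is></c>
      <c r="B2" t="inlineStr"><is><t>truncation warning</t></is></c>
    </row>
  </sheetData>
</worksheet>"""


def cells_by_reference(sheet_xml):
    """Map cell references (e.g. "B2") to their inline string values."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iterfind(".//m:c", NS):
        text = cell.find("m:is/m:t", NS)
        cells[cell.get("r")] = text.text if text is not None else None
    return cells


cells = cells_by_reference(SHEET_XML)

# The data column keeps its (truncated) value...
assert cells["A2"] == "truncated value"
# ...and the warning sits under the xlsx_errors header, not in the data column.
assert cells["B1"] == "xlsx_errors"
assert cells["B2"] == "truncation warning"
```

A substring match on the raw XML would pass for both the buggy and the fixed layout, which is why the reference-keyed lookup is the more useful assertion here.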

Metadata

Labels

bug (Something isn't working)
