When the XLSX export (add_xlsx_worksheet in scanpipe/pipes/output.py) truncates a value and records an error, the error string overwrites the last legitimate data column instead of being written to the separate xlsx_errors column.
This destroys exported user data in the last column (or the only column, if there is just one export field) for every affected row, while leaving the intended error column completely empty. Any export that hits the truncation path is affected, which large projects are the most likely to do.
Reproduce
- Export a project to XLSX through add_xlsx_worksheet with any number of fields.
- Make one data row contain a string field that exceeds Excel's 32,767-character cell limit.
- Observe the output .xlsx file. The final data field is replaced by the truncation warning and the original value is lost, while the trailing xlsx_errors column is blank.
The existing unit tests miss this because they test an export with exactly one data column. With fields = ["foo"], len(["foo"]) - 1 = 0, so the error overwrites column 0 instead of being written to the new column 1, destroying the data.
The test also only string-matches the raw XML instead of verifying column positions, so it passes as long as the warning appears anywhere in the sheet.
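The index arithmetic can be illustrated with a minimal, self-contained sketch. This is not the code from scanpipe/pipes/output.py: write_rows, the dict-based grid, the field names, and the warning text are hypothetical stand-ins for the real export loop, and using len(fields) as the error column index is an assumed fix rather than a confirmed patch.

```python
EXCEL_CELL_LIMIT = 32767  # Excel's maximum number of characters per cell


def write_rows(fields, rows, errors_col_index):
    """Hypothetical stand-in for the export loop; returns {(row, col): value}."""
    grid = {}
    # Header row: the data fields followed by the extra xlsx_errors column.
    for col, name in enumerate(list(fields) + ["xlsx_errors"]):
        grid[(0, col)] = name

    for row_index, record in enumerate(rows, start=1):
        row_errors = []
        for col_index, field in enumerate(fields):
            value = str(record.get(field, ""))
            if len(value) > EXCEL_CELL_LIMIT:
                row_errors.append(f"{field} truncated from {len(value)} characters")
                value = value[:EXCEL_CELL_LIMIT]
            grid[(row_index, col_index)] = value
        if row_errors:
            # Written after the data cells, so it replaces whatever is already there.
            grid[(row_index, errors_col_index)] = "\n".join(row_errors)
    return grid


fields = ["name", "path", "notice"]
rows = [{"name": "a", "path": "b", "notice": "x" * (EXCEL_CELL_LIMIT + 1)}]

# Reported behaviour: the index points at the last data column (2).
buggy = write_rows(fields, rows, errors_col_index=len(fields) - 1)
# Assumed fix: the index points at the xlsx_errors column (3).
fixed = write_rows(fields, rows, errors_col_index=len(fields))

# Position-aware checks, as opposed to string-matching the raw XML.
assert buggy[(1, 2)].startswith("notice truncated")   # data cell overwritten
assert (1, 3) not in buggy                            # error column left empty
assert fixed[(1, 2)] == "x" * EXCEL_CELL_LIMIT        # truncated data preserved
assert fixed[(1, 3)].startswith("notice truncated")   # warning in its own column
```

A regression test along these lines would assert on cell positions with more than one field, so that an error written into a data column fails the test even though the warning text is present in the file.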