
Streaming large files #26

@JakeHamix

Description


Hi there,

I have recently been working on a custom exporting feature in our project. The idea is that a user is able to export custom (and often rather large) datasets from the GUI as an Excel file. I am able to stream the data directly from the database through knex, modify whatever I need with through2, and finally create the Excel file with xlsx. We are still talking streams here, all the way.
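For context, the pipeline currently looks roughly like this (a simplified sketch; the table name, the transform, and how the rows end up in the worksheet stream are placeholders rather than our real code):

const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });
const through2 = require('through2');

// Stream rows straight out of the database (placeholder table).
const rowStream = knex.select('*').from('exports').stream();

// Modify each row on the fly before it reaches the worksheet writer.
const sheetRows = rowStream.pipe(through2.obj(function (row, _enc, callback) {
  row.exported_at = new Date().toISOString();
  callback(null, row);
}));

// sheetRows is then fed into the xlsx worksheet stream (library-specific, omitted here).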

However, I then hit a wall when I dug through the library and found out that, under the hood, an .xlsx file is just a zip archive created using the archiver library. Seemingly, archiver is able to process incoming streams:

archive.append(this.sheetStream, { name: 'xl/worksheets/sheet1.xml' });

However, it seems that until the stream is ended, the entry event is not emitted and the whole file is buffered in memory, seemingly multiple times over. This has two consequences:

  • When exporting larger datasets (50+ MB), RAM usage reaches a multiple of the file size, think low hundreds of MB for a single export.
  • I am unable to stream the file creation directly to the user; they have to wait until all the data is exported and the zip archive is created before the resulting archive can be transferred. This is a bit misleading, but the reason is the nature of the Excel file: there is one large entry (99.9% of the volume) and a dozen small ones. (The sketch after this list shows the flow I am after.)
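
For illustration, this is roughly the flow I am after, sketched with an Express-style response object (app, res and sheetStream are placeholders here; in practice almost nothing reaches res until the worksheet stream ends):

const archiver = require('archiver');

app.get('/export', (req, res) => {
  res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
  res.setHeader('Content-Disposition', 'attachment; filename="export.xlsx"');

  const archive = archiver('zip');
  archive.pipe(res); // the archive itself is a readable stream

  // sheetStream: the worksheet XML produced by the pipeline above
  archive.append(sheetStream, { name: 'xl/worksheets/sheet1.xml' });
  // ...append the remaining small parts of the .xlsx package...
  archive.finalize();
});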

My question is simple: is there a possible solution for this specific case? Obviously, if I could read the zip archive while it is being created (before the stream is ended), that would solve my issue. But I presume that the nature of the compression process does not allow for this.
