Implement pagination for tsv export#14
Conversation
ainefairbrother
left a comment
There was a problem hiding this comment.
Thanks for this @likhitha-surapaneni.
Thinking about this a bit more, and testing the feature, it seems that this is perhaps a heavy way to do it.
Previously, this file download endpoint made one regular ES search request. Because ES limits a single request to 10k, download sizes stop at 10k.
In this PR, the backend pages through the full result set in 10k chunks using scroll, which does solve the original 10k limit and allows much larger downloads. The problem is that it still builds the entire TSV in memory, and only then returns it. So if the download is very large, the server has to hold the whole file in memory at once, which can increase memory use and make the download slower - this will be more problematic as our number of files in the db increases.
Also, the ES_EXPORT_SIZE_CAP change from 10k to 1m does not change behaviour. Each ES request is still capped at 10k, but the code grabs 10k pages for as long as it needs to, resulting in an infinite limit. I do think it's correct for us to permit an infinite limit, as we would like users to be able to download a complete list of our files if they need to.
I think streaming the TSV response would be a better approach? This would let us send each batch as it is grabbed instead of holding the entire export in memory. Let me know what you think.
This approach makes sense to me Aine, thank you. Implemented streaming the TSV response which will not overload memory, let me know what you think |
ainefairbrother
left a comment
There was a problem hiding this comment.
Hi @likhitha-surapaneni, thank you for these changes, this is now very quick and works well for me. It also means that ES_EXPORT_SIZE_CAP is the total export size (total no. of rows). We should keep an eye on this number as our data grows, but it works well as a guardrail for now.
IGSR-618
This implements pagination required for TSV download required by igsr/gca_1000genomes_website#72