
Implement pagination for tsv export #14

Merged

ainefairbrother merged 2 commits into igsr:main from likhitha-surapaneni:feature/pagination-export-tsv on Mar 16, 2026

Conversation

@likhitha-surapaneni
Collaborator

IGSR-618
This implements the pagination needed for the TSV download requested in igsr/gca_1000genomes_website#72.

@ainefairbrother ainefairbrother self-requested a review March 6, 2026 10:32
Collaborator

@ainefairbrother left a comment

Thanks for this @likhitha-surapaneni.

Thinking about this a bit more, and testing the feature, this is perhaps a heavier approach than we need.

Previously, this file download endpoint made one regular ES search request. Because ES caps a single search request at 10k results, downloads stopped at 10k rows.

In this PR, the backend pages through the full result set in 10k chunks using scroll, which does solve the original 10k limit and allows much larger downloads. The problem is that it still builds the entire TSV in memory and only then returns it. So if the download is very large, the server has to hold the whole file in memory at once, which increases memory use and makes the download slower; this will become more problematic as the number of files in the db grows.
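The shape of the approach being described, as a minimal sketch: a plain list of pages stands in for the real ES scroll calls, and the column names (`url`, `md5`, `size`) are hypothetical, not the PR's actual schema.

```python
def export_tsv_in_memory(pages):
    """Page through the full result set, but buffer every row before
    returning anything - memory use grows with the total number of hits."""
    lines = ["url\tmd5\tsize"]  # hypothetical TSV header
    for page in pages:  # each page represents one 10k scroll batch
        for hit in page:
            lines.append("\t".join(str(hit[k]) for k in ("url", "md5", "size")))
    # The entire export is held in memory here before a single byte is sent.
    return "\n".join(lines) + "\n"
```

Nothing reaches the client until the final `join`, which is why large exports are both slow to start and memory-hungry.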

Also, the ES_EXPORT_SIZE_CAP change from 10k to 1m does not change behaviour. Each ES request is still capped at 10k, but the code fetches 10k pages for as long as it needs to, so the export is effectively unlimited. I do think it's correct for us to permit an unlimited export, as we would like users to be able to download a complete list of our files if they need to.

I think streaming the TSV response would be a better approach: it would let us send each batch as it is fetched instead of holding the entire export in memory. Let me know what you think.
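A minimal sketch of what the streaming suggestion could look like, again with a placeholder page source and hypothetical column names. The generator would then be handed to whatever streaming response type the web framework provides (e.g. Django's `StreamingHttpResponse` or a Flask `Response`):

```python
def stream_tsv(pages):
    """Yield the TSV incrementally: each scroll batch is formatted and sent
    as soon as it arrives, so at most one batch is held in memory."""
    yield "url\tmd5\tsize\n"  # hypothetical header row
    for page in pages:  # each page represents one 10k scroll batch
        yield "".join(
            "\t".join(str(hit[k]) for k in ("url", "md5", "size")) + "\n"
            for hit in page
        )
```

Because each chunk is released as soon as it is built, peak memory is bounded by one batch rather than the whole export, and the client starts receiving data immediately.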

@likhitha-surapaneni
Collaborator Author


This approach makes sense to me Aine, thank you. I've implemented streaming for the TSV response, which avoids holding the whole export in memory. Let me know what you think.

Collaborator

@ainefairbrother left a comment

Hi @likhitha-surapaneni, thank you for these changes. The export is now very quick and works well for me. It also means that ES_EXPORT_SIZE_CAP is now the total export size (total number of rows). We should keep an eye on this number as our data grows, but it works well as a guardrail for now.
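For illustration, a cap on total rows applied to the streaming loop might look like this; the function and column names are hypothetical, not the PR's actual code:

```python
def stream_tsv_capped(pages, cap):
    """Stream the TSV but stop once `cap` rows have been written, so the
    cap bounds the total export size rather than a single ES request."""
    yield "url\tmd5\tsize\n"  # hypothetical header row
    written = 0
    for page in pages:  # each page represents one 10k scroll batch
        for hit in page:
            if written >= cap:
                return  # guardrail reached: end the export cleanly
            yield "\t".join(str(hit[k]) for k in ("url", "md5", "size")) + "\n"
            written += 1
```

The cap is checked per row here for clarity; checking per batch would also work and touches the counter less often.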

@ainefairbrother ainefairbrother merged commit e71c741 into igsr:main Mar 16, 2026