Make faidx work with very long (>4 Gbyte!) lines #2008
Open
daviesrob wants to merge 1 commit into
jmarshall reviewed May 12, 2026
```diff
 while ((l = hgetln(buf, 0x10000, fp)) > 0) {
-    uint32_t line_len, line_blen, n;
+    uint64_t line_len, line_blen, n;
```
It doesn't affect the behaviour, but this is a good opportunity to make `n` a plain `int`.
daviesrob (author):
Agreed, it's now an int.
Although faidx should support very long references, it broke on any reference longer than 4 Gbases written on a single line, because it stored the line length in a `uint32_t` field. To make such inputs work, `faidx1_t::line_blen` is widened to `uint64_t` so the correct length can be stored.

Widening `faidx1_t::line_len` as well would make every entry noticeably bigger for a fairly rare use case. To avoid that, the field now stores the number of bytes to skip at the end of each line instead of the full line length. This value is usually only 1 or 2 (for `\n` or `\r\n` line endings), so a `uint32_t` is plenty big enough for it. The original structure also had a four-byte hole (between `line_blen` and `len`). Together, these changes let `faidx1_t` store the longer line lengths while staying exactly the same size as before.
Branch updated: da343ee to 60ac4ea
Fixes samtools/samtools#2331