System Info
- System Configuration:
- OS: Windows 10
- CPU: Intel i7-10700
- RAM: 64GB
- Python Version: 3.11.9
- hf_transfer Version: 0.1.9
Reproduction
I am developing a self-hosted Hugging Face mirror and have encountered a mismatch between the requested and actual file sizes when downloading files with HF_HUB_ENABLE_HF_TRANSFER enabled. After investigation, I believe this is caused by a bug in the stop calculation in the chunked download logic.
The issue arises in the following code snippet from src/lib.rs:
let stop = std::cmp::min(start + chunk_size - 1, length);
In the Range header of an HTTP request, a range has the format <range-start>-<range-end>, where <range-start> and <range-end> are integers for the start and end position (zero-indexed & inclusive), respectively.
Here, length is an exclusive bound (one past the last byte), while stop is an inclusive index. This leads to an off-by-one error whenever start + chunk_size - 1 is greater than or equal to length, which is the case for the final chunk unless that chunk happens to end exactly on the last byte. In my case, the file has a length of 8946552810, so the last valid byte index is 8946552809. The hf_transfer client requests the range xxxx-8946552810, i.e. stop is clamped to 8946552810, which is one byte past the end of the file.
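To make the arithmetic concrete, here is a minimal sketch that reproduces the calculation in isolation. The function names `buggy_stop` and `bytes_requested` are hypothetical helpers written for this report, not part of hf_transfer, and the 10 MiB chunk size used in the usage note is an assumption about the client configuration.

```rust
/// Inclusive stop index, computed the same way as the line quoted from src/lib.rs.
/// `length` is the total file size in bytes (an exclusive bound).
fn buggy_stop(start: u64, chunk_size: u64, length: u64) -> u64 {
    std::cmp::min(start + chunk_size - 1, length)
}

/// Number of bytes covered by the inclusive range `start..=stop`.
fn bytes_requested(start: u64, stop: u64) -> u64 {
    stop - start + 1
}
```

With `length = 8946552810` and an assumed `chunk_size = 10 * 1024 * 1024`, the final chunk starts at `(length / chunk_size) * chunk_size`. For that chunk, `buggy_stop` returns `length` itself, so `bytes_requested` is one byte more than the `length - start` bytes that actually remain in the file.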
HTTP servers tend to be tolerant of out-of-bounds range requests. Based on my observations, the Hugging Face server does not return a 416 HTTP error (Range Not Satisfiable) for such a request; it simply returns the content that falls within the valid file size range. Nevertheless, I still believe the range should be calculated correctly here.
Expected behavior
The correct calculation should be:
let stop = std::cmp::min(start + chunk_size - 1, length - 1);
or
let exclusive_stop = std::cmp::min(start + chunk_size, length);
let stop = exclusive_stop - 1;
This ensures that stop is always within the bounds of the file.
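As a sanity check of the second variant, the following sketch splits a file into inclusive byte ranges and formats each one as a Range header value. `chunk_ranges` and `range_header` are hypothetical names used only for illustration; how hf_transfer actually iterates over chunks may differ.

```rust
/// Split a file of `length` bytes into inclusive (start, stop) byte ranges
/// of at most `chunk_size` bytes each.
fn chunk_ranges(length: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < length {
        // Clamp in exclusive terms first, then convert to an inclusive index.
        let exclusive_stop = std::cmp::min(start + chunk_size, length);
        ranges.push((start, exclusive_stop - 1));
        start = exclusive_stop;
    }
    ranges
}

/// Format an inclusive byte range as an HTTP Range header value.
fn range_header(start: u64, stop: u64) -> String {
    format!("bytes={start}-{stop}")
}
```

For a 10-byte file and a chunk size of 4, `chunk_ranges(10, 4)` yields `(0, 3)`, `(4, 7)`, `(8, 9)`: the last stop is `length - 1`, and the sizes of all ranges add up to exactly `length`.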