
The calculation of the range stop may be incorrect in the function download_async #65

@jstzwj

Description


System Info

  • System Configuration:
  • OS: Windows 10
  • CPU: Intel i7-10700
  • RAM: 64GB
  • Python Version: 3.11.9
  • hf_transfer Version: 0.1.9

Reproduction

I am developing a self-hosted Hugging Face mirror and have found a mismatch between the requested and actual file sizes when downloading files with HF_HUB_ENABLE_HF_TRANSFER enabled. After investigating, I believe this is caused by a bug in the stop calculation of the chunked download logic.

The issue arises in the following code snippet from src/lib.rs:

let stop = std::cmp::min(start + chunk_size - 1, length);

In an HTTP Range request of the form bytes=<range-start>-<range-end>, <range-start> and <range-end> are integers giving the start and end positions of the range. Both are zero-indexed and inclusive.

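Here, length is an exclusive bound (the file size in bytes, so the last valid byte index is length - 1), while stop is an inclusive index. This causes an off-by-one error whenever start + chunk_size - 1 is greater than or equal to length, which happens on the final chunk unless length is an exact multiple of chunk_size. In my case, the file has a length of 8946552810 bytes, so its last valid byte index is 8946552809. The client requests the range xxxx-8946552810: start + chunk_size - 1 reaches 8946552810, so stop becomes 8946552810, one byte past the end of the file.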
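To make the off-by-one concrete, the sketch below evaluates the quoted formula and the corrected one for the final chunk of the 8946552810-byte file. It assumes chunk starts are 0, chunk_size, 2 * chunk_size, and so on, and uses a 10 MiB chunk_size purely for illustration (the issue does not state the real value). The helper names buggy_stop and fixed_stop are hypothetical.

```rust
/// Inclusive `stop` as computed by the snippet quoted from src/lib.rs.
fn buggy_stop(start: u64, chunk_size: u64, length: u64) -> u64 {
    std::cmp::min(start + chunk_size - 1, length)
}

/// Inclusive `stop` clamped to the last valid byte index (length - 1).
fn fixed_stop(start: u64, chunk_size: u64, length: u64) -> u64 {
    std::cmp::min(start + chunk_size - 1, length - 1)
}

fn main() {
    let length: u64 = 8_946_552_810; // file size from the report
    let chunk_size: u64 = 10 * 1024 * 1024; // hypothetical chunk size (10 MiB)

    // Offset of the final chunk when stepping from 0 by chunk_size.
    let last_start = (length - 1) / chunk_size * chunk_size;

    let bad = buggy_stop(last_start, chunk_size, length);
    let good = fixed_stop(last_start, chunk_size, length);

    println!("buggy: bytes={}-{}", last_start, bad); // bytes=8944353280-8946552810
    println!("fixed: bytes={}-{}", last_start, good); // bytes=8944353280-8946552809

    // How far the buggy stop overshoots the last valid byte index.
    println!("overshoot: {}", bad - (length - 1)); // overshoot: 1
}
```

The same overshoot appears for any length that is not a multiple of chunk_size, because the final min(...) then always picks length.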

HTTP servers are generally tolerant of ranges that run past the end of a file: RFC 7233 (now RFC 9110) treats a <range-end> at or beyond the file length as length - 1, and a server answers 416 (Range Not Satisfiable) only when <range-start> itself lies past the end. Consistent with this, I observed that the Hugging Face server does not return a 416 error for these out-of-range requests and simply returns the bytes within the valid file range. Nevertheless, I believe the client should still calculate the range correctly.
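For a self-hosted mirror, the server-side behavior described above can be sketched as follows. This is a simplified illustration of the RFC rule for a single bytes=first-last range, not the Hugging Face server's code; resolve_range is a hypothetical helper, and it assumes the caller has already rejected specs with last < first (which the RFC treats as invalid syntax rather than as unsatisfiable).

```rust
/// Resolve an inclusive `first-last` byte range against a file of `length` bytes.
/// Returns None when the range is unsatisfiable (the server would answer 416).
fn resolve_range(first: u64, last: u64, length: u64) -> Option<(u64, u64)> {
    debug_assert!(first <= last, "invalid byte-range-spec");
    if first >= length {
        return None; // the start lies past the end of the file
    }
    // A last-byte-pos at or beyond the end is clamped to the final byte.
    Some((first, last.min(length - 1)))
}

fn main() {
    let length: u64 = 25;
    for (first, last) in [(20u64, 25u64), (25, 30)] {
        match resolve_range(first, last, length) {
            // Content-Range uses the same inclusive positions plus the total length.
            Some((f, l)) => println!("206 Content-Range: bytes {}-{}/{}", f, l, length),
            None => println!("416 Range Not Satisfiable"),
        }
    }
}
```

Under this rule the first request prints a 206 with bytes 20-24/25, and the second prints 416. A mirror that does not clamp in this way would expose the client's off-by-one as a size mismatch.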

Expected behavior

The correct calculation should be:

let stop = std::cmp::min(start + chunk_size - 1, length - 1);

or

let exclusive_stop = std::cmp::min(start + chunk_size, length);
let stop = exclusive_stop - 1;

This ensures that stop is always within the bounds of the file.
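A minimal sketch of the chunking with the second variant applied is shown here. It assumes download_async steps through the file from offset 0 in increments of chunk_size; chunk_ranges is a hypothetical helper, not a function from hf_transfer. The point is that the resulting inclusive ranges tile [0, length) with no gap, no overlap, and no byte past the end.

```rust
/// Split a file of `length` bytes into inclusive (start, stop) ranges of at
/// most `chunk_size` bytes each, using the corrected stop calculation.
fn chunk_ranges(length: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (0..length)
        .step_by(chunk_size as usize)
        .map(|start| {
            // Exclusive end clamped to the file size, then made inclusive.
            let exclusive_stop = std::cmp::min(start + chunk_size, length);
            (start, exclusive_stop - 1)
        })
        .collect()
}

fn main() {
    let ranges = chunk_ranges(25, 10);
    for (start, stop) in &ranges {
        println!("Range: bytes={}-{}", start, stop);
    }
    // Inclusive ranges cover stop - start + 1 bytes each; they must sum to the file size.
    let total: u64 = ranges.iter().map(|(s, e)| e - s + 1).sum();
    println!("total bytes requested: {}", total); // total bytes requested: 25
}
```

Because the loop never runs when length is 0, the exclusive_stop - 1 subtraction cannot underflow in this sketch.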

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
