
Clarify start/end query parameter and Range header edge cases #113

Open
nsheff wants to merge 1 commit into master from fix/sequence-params

Member

@nsheff nsheff commented May 28, 2026

Summary

Addresses #107 by clarifying edge cases for start/end query parameters and Range headers.

Changes:

  • Allow start or end query parameters to be used independently
  • Change out-of-bounds start from 400 Bad Request to 416 Range Not Satisfiable (matches EBI implementation)
  • Add explicit 416 for out-of-bounds end
  • Align Range header behavior with RFC 7233:
    • Clip last-byte-pos if it exceeds sequence length
    • Return 416 if first-byte-pos exceeds sequence length

Testing

Tested against EBI implementation. See test script and issue comment for details.

Note: EBI has a bug where Range header with first-byte-pos > length returns 200 with full sequence instead of 416. This is not tested by the compliance suite.

Fixes #107

- Allow start or end query parameters to be used independently
- Change out-of-bounds start from 400 to 416 (matches EBI)
- Add explicit 416 for out-of-bounds end
- Align Range header behavior with RFC 7233 (clip last-byte-pos,
  reject first-byte-pos > length with 416)

Fixes #107
Collaborator

@andrewyatz andrewyatz left a comment

Agreed that codifying this in the spec is right. I had not read RFC 7233 closely enough to notice that a last-byte-pos exceeding the length is clipped. Implementations will need to remember whether they were given a Range header or query args, but that should be simple to code around.

I was worried that the reference Python implementation I wrote would fail this, but on first read it looks like it handles it.

Comment thread docs/sequences/README.md
|-----------|-------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. The server MUST respond with a `Bad Request` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
| `end` | 32-bit unsigned integer | Optional | The end position of the range on the sequence, 0-based, exclusive. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if end and the Range header are both specified. |
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
Member

Suggested change
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is greater than or equal to the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
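The query-parameter rules in this row can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: the function name `resolve_query_range` is hypothetical, the sequence is assumed to be non-circular, a missing `start` is assumed to default to 0, and a successful request is assumed to return 200. It uses the "greater than or equal to" wording from the suggested change and the explicit 416 for an out-of-bounds `end` described in the PR summary.

```python
from typing import Optional, Tuple


def resolve_query_range(
    length: int,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (http_status, (start, end)); end is exclusive.

    Hypothetical helper: assumes a non-circular sequence.
    """
    # Assumption: a missing start means 0; a missing end means "to the end".
    s = 0 if start is None else start
    e = length if end is None else end

    # Out-of-bounds start (">=" wording from the suggested change).
    if start is not None and s >= length:
        return 416, None
    # Out-of-bounds end is an explicit 416 under this PR.
    if e > length:
        return 416, None
    # start > end only makes sense for circular sequences, which are ignored here.
    if s > e:
        return 416, None
    return 200, (s, e)
```

For example, `resolve_query_range(10, start=4)` yields `(200, (4, 10))`, while `resolve_query_range(10, start=10)` yields `(416, None)`.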

Comment thread docs/sequences/README.md
| Parameter | Data Type | Required | Description |
|-----------|-------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. The server MUST respond with a `Bad Request` error if one or more ranges are out of bounds of the sequence. |
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` exceeds the sequence length, the server MUST respond with `Range Not Satisfiable`. |
Member

Suggested change
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` exceeds the sequence length, the server MUST respond with `Range Not Satisfiable`. |
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` is greater than or equal to the sequence length, the server MUST respond with `Range Not Satisfiable`. |

Member

@jmarshall jmarshall May 29, 2026

Also I'd suggest avoiding explaining what clipping means by phrasing this as something like

if the last-byte-pos exceeds the sequence length, this is interpreted as a request for a sub-sequence extending to the end of the sequence.

Similarly to how a missing end query parameter is described.
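The Range header behaviour discussed in this thread could look something like the sketch below. The function name `resolve_range_header` is hypothetical. Only the single `bytes=first-last` form is handled (open-ended and suffix ranges are left out), and returning 400 for a malformed header or for `first > last` is an assumption of this sketch, not something stated in the proposed text. A satisfied range is reported as 206, following RFC 7233.

```python
import re
from typing import Optional, Tuple

# Only a single "bytes=first-last" range; other forms are out of scope here.
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d+)$")


def resolve_range_header(
    length: int, header: str
) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (http_status, (first, last)); both positions are inclusive."""
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return 400, None  # assumption: malformed or unsupported form
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        return 400, None  # assumption: not covered by the text above
    # first-byte-pos at or past the end of the sequence cannot be satisfied.
    if first >= length:
        return 416, None
    # A last-byte-pos past the end is read as "to the end of the sequence".
    last = min(last, length - 1)
    return 206, (first, last)
```

With a 10-base sequence, `bytes=5-99` resolves to `(206, (5, 9))` and `bytes=10-12` resolves to `(416, None)`.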

Member

jmarshall commented May 29, 2026

IMHO it would be preferable for the spec to allow either the clipping behaviour or the Not Satisfiable error behaviour when the end query parameter is present and greater than the sequence length. There are three reasons: determining sequence length in advance requires the client to make an extra metadata request round trip; the server-needs-to-remember-request-flavour issue Andy mentioned; and consistency with Range header behaviour.
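To make the two permitted behaviours concrete, here is a small sketch. The helper `resolve_end` and its `clip` flag are hypothetical and appear in neither the spec nor any implementation; they only show that a server could pick either policy for an `end` past the sequence length.

```python
from typing import Optional


def resolve_end(length: int, end: int, clip: bool) -> Optional[int]:
    """Return the effective exclusive end, or None to signal a 416 response."""
    if end <= length:
        return end
    # end is past the sequence: extend to the end of the sequence, or reject.
    return length if clip else None
```

A lenient server would call `resolve_end(length, end, clip=True)` and never need to know whether the request used query parameters or a Range header; a strict server would pass `clip=False`.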

Member

jmarshall commented May 30, 2026

Change out-of-bounds start from 400 Bad Request to 416 Range Not Satisfiable (matches EBI implementation)

There's a case to be made that the originally specified 400 was more appropriate and the EBI implementation's behaviour is incorrect.

The relevant RFCs describe 416 as being specifically for use when a Range header's set of ranges has been rejected. To use 416 in the absence of a Range header is contrary to that text — which may have been the reason for originally choosing 400.

Development

Successfully merging this pull request may close these issues.

Refget Sequences start/end query parameter and range request corner cases

3 participants