Clarify start/end query parameter and Range header edge cases #113
nsheff wants to merge 1 commit into
- Allow start or end query parameters to be used independently
- Change out-of-bounds start from 400 to 416 (matches EBI)
- Add explicit 416 for out-of-bounds end
- Align Range header behavior with RFC 7233 (clip last-byte-pos, reject first-byte-pos > length with 416)

Fixes #107
andrewyatz left a comment
Agreed that codifying this in the spec is right. I had not read RFC 7233 closely enough to notice that a last-byte-pos exceeding the length is clipped. Implementations will need to remember whether they were given a Range header or query arguments, but that should be simple to code around.
I was worried that the reference Python implementation I wrote would fail this, but on first read it looks like it handles it.
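To illustrate the point about remembering where a range came from, here is a minimal sketch and not the reference implementation. The names `RangeSource` and `classify_range_source` are hypothetical. It only records the source and rejects the combination that the spec text says MUST produce `Bad Request`.

```python
from enum import Enum
from typing import Optional


class RangeSource(Enum):
    """Where the requested sub-sequence range came from (hypothetical helper)."""

    NONE = "none"
    QUERY = "query"
    HEADER = "header"


def classify_range_source(
    start: Optional[int], end: Optional[int], range_header: Optional[str]
) -> RangeSource:
    """Record the range source so later checks can clip (header) or reject (query)."""
    has_query = start is not None or end is not None
    if has_query and range_header is not None:
        # The spec text requires Bad Request when both forms are supplied.
        raise ValueError("400 Bad Request: Range header combined with start/end")
    if range_header is not None:
        return RangeSource.HEADER
    return RangeSource.QUERY if has_query else RangeSource.NONE
```

A handler could branch on the returned value before applying the out-of-bounds rules, since those differ between the two forms under this proposal.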
|-----------|-------------------------|----------|-------------|
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. The server MUST respond with a `Bad Request` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
| `end` | 32-bit unsigned integer | Optional | The end position of the range on the sequence, 0-based, exclusive. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if end and the Range header are both specified. |
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is larger than the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
| `start` | 32-bit unsigned integer | Optional | The start position of the range on the sequence, 0-based, inclusive. Either `start` or `end` may be specified independently; if only `start` is specified, the server returns the sub-sequence from `start` to the end of the sequence. The server MUST respond with a `Range Not Satisfiable` error if start is specified and is greater than or equal to the total sequence length. The server MUST respond with a `Range Not Satisfiable` error if start and end are specified and start is greater than end and the sequence is not a circular chromosome. Otherwise if the server does not support circular chromosomes it MUST respond with `Not Implemented` if the start is greater than the end. The server MUST respond with `Bad Request` if start and the Range header are both specified. |
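The query-parameter rules in the suggested wording can be sketched roughly as follows. This is an illustration under assumptions, not the reference implementation: `RangeError`, `resolve_query_range`, `is_circular` and `supports_circular` are hypothetical names, the `end > length` check assumes that is what "out-of-bounds end" means for an exclusive end, and the ordering of the 416 and 501 checks follows one reading of the row text.

```python
from typing import Optional, Tuple


class RangeError(Exception):
    """Carries the HTTP status a server would respond with (hypothetical helper)."""

    def __init__(self, status: int, reason: str):
        super().__init__(f"{status} {reason}")
        self.status = status


def resolve_query_range(
    start: Optional[int],
    end: Optional[int],
    length: int,
    is_circular: bool = False,
    supports_circular: bool = False,
) -> Tuple[int, int]:
    """Return a 0-based, half-open (start, end) pair for the query parameters."""
    # A missing start means "from the beginning"; a missing end means "to the end".
    lo = 0 if start is None else start
    hi = length if end is None else end

    # Suggested wording: start greater than or equal to the length is 416.
    if start is not None and lo >= length:
        raise RangeError(416, "Range Not Satisfiable")
    # Assumption: end is exclusive, so only end > length counts as out of bounds.
    if end is not None and hi > length:
        raise RangeError(416, "Range Not Satisfiable")

    if lo > hi:
        if not is_circular:
            raise RangeError(416, "Range Not Satisfiable")
        if not supports_circular:
            raise RangeError(501, "Not Implemented")
        # Circular request: the caller would return seq[lo:] + seq[:hi].
    return lo, hi
```

For a sequence of length 10, `start=5` alone resolves to `(5, 10)` and `end=4` alone resolves to `(0, 4)`, while `start=10` raises the 416 case.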
| Parameter | Data Type | Required | Description |
|-----------|-------------------------|----------|-------------|
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. The server MUST respond with a `Bad Request` error if one or more ranges are out of bounds of the sequence. |
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` exceeds the sequence length, the server MUST respond with `Range Not Satisfiable`. |
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` exceeds the sequence length, the server MUST respond with `Range Not Satisfiable`. |
| `Range` | string | Optional | Range header as specified in [RFC 7233](https://tools.ietf.org/html/rfc7233#section-3.1), however only a single byte range per GET request is supported by the specification. The byte range of the sequence to return, 0-based inclusive of start and end bytes specified. The server MUST respond with a `Bad Request` error if both a Range header and start or end query parameters are specified. Per RFC 7233, if the `last-byte-pos` exceeds the sequence length, the server MUST clip it to the sequence length; if the `first-byte-pos` is greater than or equal to the sequence length, the server MUST respond with `Range Not Satisfiable`. |
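As a rough illustration of the Range header wording above, a single-range resolver might look like the sketch below. `resolve_range_header` and `RangeNotSatisfiable` are hypothetical names. The sketch only handles the `bytes=first-last` and `bytes=first-` forms and leaves the status for anything else to the caller, since the quoted rows do not settle that.

```python
import re
from typing import Tuple

# Matches a single byte range such as "bytes=0-99" or the open-ended "bytes=5-".
_SINGLE_RANGE = re.compile(r"bytes=(\d+)-(\d*)")


class RangeNotSatisfiable(Exception):
    """Signals that a server would answer 416 (hypothetical helper)."""


def resolve_range_header(header: str, length: int) -> Tuple[int, int]:
    """Return inclusive (first, last) byte positions for a single-range header."""
    match = _SINGLE_RANGE.fullmatch(header.strip())
    if match is None:
        # Multiple ranges, suffix ranges and other units are outside this sketch.
        raise ValueError(f"unsupported Range header: {header!r}")

    first = int(match.group(1))
    explicit_last = int(match.group(2)) if match.group(2) else None

    if explicit_last is not None and explicit_last < first:
        # RFC 7233 treats last-byte-pos < first-byte-pos as an invalid range spec.
        raise ValueError(f"invalid byte range: {header!r}")
    if first >= length:
        # Suggested wording: first-byte-pos at or beyond the length is 416.
        raise RangeNotSatisfiable(f"first-byte-pos {first} >= length {length}")

    # A missing or oversized last-byte-pos means "to the end of the sequence".
    last = length - 1 if explicit_last is None else min(explicit_last, length - 1)
    return first, last
```

With a 10-byte sequence, `bytes=5-99` and `bytes=5-` both resolve to `(5, 9)`, which is the clipping behaviour being discussed.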
Also I'd suggest avoiding the need to explain what clipping means by phrasing this as something like:

> if the `last-byte-pos` exceeds the sequence length, this is interpreted as a request for a sub-sequence extending to the end of the sequence.

Similarly to how a missing `end` query parameter is described.
IMHO because determining sequence length in advance requires the client to make an extra metadata request round trip,
There's a case to be made that the originally specified 400 was more appropriate and the EBI implementation's behaviour is incorrect. The relevant RFCs describe 416 as being specifically for use when a Range header's set of ranges has been rejected. To use 416 in the absence of a Range header is contrary to that text, which may have been the reason for originally choosing 400.
Summary
Addresses #107 by clarifying edge cases for `start`/`end` query parameters and Range headers.

Changes:
- Allow `start` or `end` query parameters to be used independently
- Change out-of-bounds `start` from 400 Bad Request to 416 Range Not Satisfiable (matches EBI implementation)
- Add explicit 416 for out-of-bounds `end`
- Range header: clip `last-byte-pos` if it exceeds sequence length
- Range header: return 416 if `first-byte-pos` exceeds sequence length

Testing
Tested against EBI implementation. See test script and issue comment for details.
Note: EBI has a bug where a Range header with `first-byte-pos > length` returns 200 with the full sequence instead of 416. This is not tested by the compliance suite.

Fixes #107