-
Notifications
You must be signed in to change notification settings - Fork 9.2k
v3.3: describe behaviour when deserializing a multipart with a Content-Type header #5391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v3.3-dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1729,6 +1729,8 @@ This table is based on the value to which the Encoding Object is being applied a | |
| Note that in the case of [Encoding By Name](#encoding-by-name), this value is the array item for properties of type `"array"`, and the entire value for all other types. | ||
| Therefore the `array` row in this table applies only to array values inside of a top-level array when encoding by name. | ||
|
|
||
| When deserializing a multipart message, if a `Content-Type` header is present in the specific part, its value shall be used instead of these values. Behavior is undefined when both this header and `contentType` are defined but with different values. | ||
|
|
||
| | `type` | `contentEncoding` | Default `contentType` | | ||
| | ---- | ---- | ---- | | ||
| | [_absent_](#working-with-binary-data) | _n/a_ | `application/octet-stream` | | ||
|
|
@@ -1864,7 +1866,7 @@ Note that there are significant restrictions on what headers can be used with `m | |
|
|
||
| ##### Handling Multiple `contentType` Values | ||
|
|
||
| When multiple values are provided for `contentType`, parsing remains straightforward as the part's actual `Content-Type` is included in the document. | ||
| When multiple values are provided for `contentType`, parsing remains straightforward as the part's actual `Content-Type` is included in the document; it SHOULD match one of the media types in `contentType`, if it is present. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that with the re-ordered sentences above, this is not necessary as it is simply a consequence of the other statement. It's not entirely clear to me that we can put a SHOULD on the API behavior, though. I think we can only put SHOULDs on our side of things. We should maybe look and see if we have other SHOULDs that impact API behavior rather than OAS tool behavior — If we already do, then I'm perfectly happy for us to add another.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean? What is "our side"?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but anyway, yes, I agree with removing this addition, as it is unneeded; of course we want the Content-Type header to match encoding/contentType, but if they don't there's nothing useful the deserializer can do about it now (as above, generating an error about this is not helpful assuming we were able to deserialize anyway). (Also, I feel good that the original statement here is consistent with my assumption that the Content-Type header is the value we should prefer over anything else) :)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "our side" -> OAS tooling, the stuff that implements our requirements As you can tell, I'm thinking through this as I'm typing, so don't take it as any sort of settled law here.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think there's definitely room for a philosophical discussion here as to what we consider our scope and what we expect the OAD to actually prescribe. I've been coming at things mostly from a "deserialize incoming requests and outgoing responses, and validate as much as we can" perspective, as that aligned with my previous JSON Schema work, but there's a lot more than that (and once I exhaust myself with deserializing, I'll be pivoting to thinking more about serializing and client generation).. but other tools have done a lot more various things with the OAD and perhaps made some wildly different interpretations on some points.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. one more thought: I do think we should call out the mismatch, but maybe an actual error (which in my tooling would cause a request to be rejected with an HTTP 400 error) might be too severe? It's an interesting distinction between "we were able to deserialize this message completely using the data provided in the message, AND its deserialized form validated against the schema" and "something about the message's structure didn't match what the OAD said it should do".
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about 409 ? |
||
|
|
||
| For encoding and serialization, implementations MUST provide a mechanism for applications to indicate which media type is intended. | ||
| Implementations MAY choose to offer media type sniffing ([[SNIFF]]) as an alternative, but this MUST NOT be the default behavior due to the security risks inherent in the process. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karenetheridge if I am reading this right, the only potential change here is in the case where
contentTypeis absent and therefore assumed to have a default value, but the per-partContent-Typetype contradicts that default value?I think we can make this fit the compatibility guidelines more cleanly by switching the order of the sentences and changing "shall" to "SHOULD" (or "RECOMMENDED" which means the same thing but fit better in the sentence I'm proposing):
Something like this? But less awkward with the last part?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I am not presuming that the default value would be considered here at all -- I think the Content-Type header in the message part should be used in preference to that: a value being underspecified means that we do the best with whatever information we do have, and in this case we have a header telling us what type the message sender intends it to be.
Your edit sounds fine, except I would remove the last sentence: the schema may be just fine with the Content-Type (consider the case where encoding/contentType says "text/plain; charset=UTF-8" but the Content-Type header says "text/plain; charset=Shift_JIS"... or application/yaml instead of application/json).
The logic I'm using is:
_charset_message part.However, while looking at my test cases a little more closely, I'm wondering if it might be more sensible to simply prefer Content-Type over encoding/contentType without erroring at all when the values conflict. contentType is useful as a recipe for serializing a message, and a hint for how to deserialize when insufficient information is present in the message itself, but when Content-Type is right there in the message next to the data, surely we should just go ahead and use it? That's what an application would do in the absence of an OAD telling them what the message should look like. If the data decodes to the desired result, why should this be an error at all?
[...and after experimenting with tests, I am convinced this is correct.. it would not be useful to generate an error when we're still able to successfully deserialize the message. The schema is still there as a verification step to catch issues with the deserialized content, and there's a good chance that it came out correctly anyway even if the media-type didn't match what the OAD said the serialization process ought to be using. If a media-type was used that we don't know how to decode, then we can error on that instead.]
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edited to fix some errors; read in web rather than in email)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karenetheridge
It might be more sensible (in some cases I'm pretty sure it is more sensible, and it definitely fits more with the architecture of the web where you can have advance hints, but the runtime behavior is always what's real), but it's not the way the OAS is written (or more broadly, the way it is marketed as a way to define an API interface contract).
Our general paradigm is that if the runtime behavior doesn't match the OAD, then that's an error [EDIT: except I keep forgetting the "open world" approach, so maybe it's not? See below]. Whether it's an error in the API or the OAD depends on the circumstances, but the general idea is that you should be able to tell when your API and its design contract fail to match. In theory, whatever the OAD says that contradicts the
Content-Typeis an assertion that the application depends on, so if the OpenAPI tooling to ignores that mismatch, it is likely to cause problems downstream — precisely the sort of problem that a contract is supposed to prevent.(note: I just like em-dashes and know how to use them. No AI was used on this comment 😛 )
This is a philosophical point that would be worth revisiting in a 4.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do about this? Should we add in 3.3 some stricter language that says that
contentType(or one of the values, if comma-separated) MUST match the part's Content-Type, only to potentially remove it in 4.0? Or just leave it vague for now?In my testing I also flagged another edge case:
contentTypesaystext/plain, but the Content-Type header saystext/plain; charset=Shift_JIS. IMO this is not a mismatch; the header is simply being specific about the character encoding, andcontentTypeis saying "any text/plain type will do".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're definitely correct about the
text/plainnot being a mismatch.Regarding the rest of it... I have to apologize for shifting my views a bit again. I'm even frustrating myself with this, and it's a good bit of evidence for why we should really nail down the philosophy of the OAS more clearly (which various folks here have attempted to do over the years but we've never quite managed it).
This scenario is nearly analogous to matching (or not) the keys under the
contentfield of a Response Object (as that one specifically allows media ranges). We don't consider failure to match there an error, because many things in OpenAPI take an "open world" view (which is what I was forgetting before): an OAD describes things that can happen, and if they happen, what they should look like. But other things can also happen, and are not necessarily an error. So if you get backapplication/xmlbut don't have acontentkey for that in the Response Object, it's not automatically an error. It's just a thing you have no information about.Or at least, this is how it should work in an "open world" view. But we're not very clear about that being the view, I think.
It's not quite perfectly analogous because at the Response Object level the
contentkeys can be related to theAcceptrequest header, and there's no way to specify individual part media types inAccept: multipart/whateverAFAIK. And the server is allowed to disregardAcceptanyway.So if we look at
contentTypeas analogous to the Response Object (or Request Body Object)'scontentkeys, I guess a failed match is not an error? It's just a thing we don't know much about?I feel like I've gotten myself confused at this point 😵💫
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, that hasn't been my interpretation -- in my implementation, the way to get past a "wrong Content-Type" error is to explicitly provide a media-type entry for "*/*" (which matches anything that didn't match anything else).
I also error if there is no defined decoding for the media-type that matched under
content, unless the schema is empty ({}ortrue) or missing entirely -- in these cases the string value of the content is returned without alteration.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we're not very clear about it- obviously I forgot in this thread! But I found the big conversation about this from a while ago which is interesting.
Wouldn't that need to be
*/*? I didn't thin/on its own was a valid media type or media range?The thing is, we're not all that clear on how this is supposed to work. That could be a good or bad thing. It's a good thing in that tools can be set up for a variety of use cases. "Fail-fast" is good for some use cases, while "if you can more-or-less make it work, keep going" is good for others.
I don't know what we should do here- let's talk about it in the Thursday meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was
*/*- markdown ate the *.