Clarify HostSpecifier URI parser compatibility #8459
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
58f6a14 to bf0b546
This is a case in which we might have a hard time finding the expertise to figure out what to do here. We have tests that expect both to be able to pass domains with underscores (seen in legacy Amazon S3 buckets, I'm reading) and to be able to pass non-ASCII characters (typically converted to Punycode automatically by browsers). We've also seen (non-
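For illustration only (this is not code from the PR), here is a minimal sketch of how one common parser, the JDK's `java.net.URI`, treats an underscore in a hostname. It does not throw; it quietly falls back to a registry-based authority, so `getHost()` returns `null`. The class name `UnderscoreHostDemo` and the bucket-style hostnames are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: how java.net.URI treats a hostname containing '_'.
final class UnderscoreHostDemo {
  private UnderscoreHostDemo() {}

  // Returns the host java.net.URI extracts from "http://" + host + "/", or null when the
  // authority could not be parsed as host[:port] and was kept as a registry-based authority.
  static String parsedHost(String host) {
    try {
      return new URI("http://" + host + "/").getHost();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("not parseable as a URI: " + host, e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical bucket-style hostnames.
    System.out.println(parsedHost("my-bucket.s3.amazonaws.com")); // my-bucket.s3.amazonaws.com
    System.out.println(parsedHost("my_bucket.s3.amazonaws.com")); // null
  }
}
```

Because nothing is thrown, code that calls `getHost()` without a null check can fail far away from where the hostname was produced.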
Thanks, that makes sense. My intent was not to change the broader behavior of
I agree that rejecting all non-ASCII domain names may be too broad if callers expect browser-like IDN handling. A narrower option could be to canonicalize IDNs through
The underscore case seems harder because
Which direction would you prefer for this PR: a narrower IDN canonicalization patch, documentation clarification, or keeping the stricter URI-host validation?
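The API named in the comment above is cut off in this copy. As an assumption, the sketch below uses the JDK's `java.net.IDN` to show what a narrower IDN-canonicalization step could look like; the `IdnCanonicalizer` class and its method names are hypothetical. It also shows why underscores are the harder case: with default flags `IDN.toASCII` passes `_` through unchanged, while `IDN.USE_STD3_ASCII_RULES` rejects it.

```java
import java.net.IDN;

// Hypothetical helper: converts a possibly non-ASCII domain name to its ASCII (Punycode)
// form before it is placed in a URL. One possible approach, not what the PR does.
final class IdnCanonicalizer {
  private IdnCanonicalizer() {}

  // Default flags: non-ASCII labels become "xn--" labels; non-LDH ASCII such as '_'
  // is left unchanged.
  static String toAsciiLenient(String domain) {
    return IDN.toASCII(domain);
  }

  // USE_STD3_ASCII_RULES: any label containing non-LDH ASCII (for example '_') causes
  // an IllegalArgumentException.
  static String toAsciiStrict(String domain) {
    return IDN.toASCII(domain, IDN.USE_STD3_ASCII_RULES);
  }
}
```

For example, `toAsciiLenient("bücher.example")` returns `"xn--bcher-kva.example"`. Note that `java.net.IDN` implements IDNA 2003 (RFC 3490), whereas browsers follow UTS #46, so the two can disagree on some inputs.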
Thanks. I suspect that we mean "URI" in the non-code-font sense, not the Java class
(Incidentally, we could probably say "URL" instead here; I think that the "URI" terminology may have fallen somewhat out of favor in general, and I think "URI" refers to a more general concept than we need for the specific case of
It's possible that it would also be valuable to canonicalize digits to ASCII and/or to canonicalize IDNs. It's a bit tough to predict, though, whether that is more likely to help or to hurt: maybe someone is storing the result of
So I guess I'd lean toward just the documentation change for now.
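A minimal sketch of the "canonicalize digits to ASCII" idea, assuming a hypothetical helper `AsciiDigits.canonicalizeDigits` built on `Character.isDigit` and `Character.digit`. It rewrites every Unicode decimal digit, including ones outside the BMP, and leaves all other code points alone. Because it changes the string, it carries the risk raised above for callers that store or compare the original value.

```java
// Hypothetical helper: rewrites each Unicode decimal digit as its ASCII equivalent.
final class AsciiDigits {
  private AsciiDigits() {}

  static String canonicalizeDigits(String s) {
    StringBuilder out = new StringBuilder(s.length());
    // Iterate by code point so supplementary-plane digits are handled too.
    s.codePoints().forEach(cp -> {
      if (Character.isDigit(cp)) {
        // Decimal digits always have a radix-10 value in 0..9.
        out.append((char) ('0' + Character.digit(cp, 10)));
      } else {
        out.appendCodePoint(cp);
      }
    });
    return out.toString();
  }
}
```

With Arabic-Indic digits, `canonicalizeDigits("\u0661\u0669\u0662.\u0661\u0666\u0668.\u0660.\u0661")` yields `"192.168.0.1"`.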
bf0b546 to 330080a
330080a to 583a00c
Thanks for the direction. I updated the PR to be documentation-only:
I also updated the PR title/body to match the narrower scope.
Clarifies the `HostSpecifier` class documentation after maintainer feedback. `HostSpecifier` is documented as suitable for use in a URI, but it follows the syntactic rules of `InetAddresses` and `InternetDomainName`. Those classes intentionally accept some inputs that particular URI/URL parsers may reject unless callers normalize first, such as non-ASCII digits, non-ASCII domain names, or domain labels containing underscores.
This version keeps existing runtime behavior unchanged and documents that compatibility boundary instead of rejecting or canonicalizing additional inputs.
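Since the documentation change leaves normalization to callers, a caller that needs a host `java.net.URI` will accept could check it after normalizing. This is a minimal sketch with a hypothetical `UriHostGuard.isServerBasedHost`, built on `URI.parseServerAuthority()`, which throws `URISyntaxException` when the authority cannot be parsed as a server-based host. It only reflects `java.net.URI`'s rules; other URL parsers may accept or reject different inputs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical caller-side guard: true only if java.net.URI parses the string as exactly
// a server-based host (no port, user info, path, query, or fragment mixed in).
final class UriHostGuard {
  private UriHostGuard() {}

  static boolean isServerBasedHost(String host) {
    Objects.requireNonNull(host, "host");
    try {
      URI uri = new URI("http://" + host + "/").parseServerAuthority();
      // A mismatch means extra URI syntax was present in the input.
      return host.equals(uri.getHost());
    } catch (URISyntaxException e) {
      return false;
    }
  }
}
```

Comparing `getHost()` with the input also rejects strings such as `example.com/path` or `example.com:80`, whose parsed host differs from what was passed in.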
Tests:
`git diff --check`