Skip to content

Clarify HostSpecifier URI parser compatibility#8459

Closed
bg0d-droid wants to merge 1 commit into
google:masterfrom
bg0d-droid:fix-hostspecifier-uri-host-validation
Closed

Clarify HostSpecifier URI parser compatibility#8459
bg0d-droid wants to merge 1 commit into
google:masterfrom
bg0d-droid:fix-hostspecifier-uri-host-validation

Conversation

@bg0d-droid

@bg0d-droid bg0d-droid commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Clarifies the HostSpecifier class documentation after maintainer feedback.

HostSpecifier is documented as suitable for use in a URI, but it follows the syntactic rules of InetAddresses and InternetDomainName. Those classes intentionally accept some inputs that particular URI/URL parsers may reject unless callers normalize first, such as non-ASCII digits, non-ASCII domain names, or domain labels containing underscores.

This version keeps existing runtime behavior unchanged and documents that compatibility boundary instead of rejecting or canonicalizing additional inputs.

Tests:

  • git diff --check

@google-cla

google-cla Bot commented Jun 2, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@bg0d-droid bg0d-droid force-pushed the fix-hostspecifier-uri-host-validation branch 2 times, most recently from 58f6a14 to bf0b546 Compare June 2, 2026 16:24
@cpovirk

cpovirk commented Jun 8, 2026

Copy link
Copy Markdown
Member

This is a case in which we might have a hard time finding the expertise to figure out what to do here. We have tests that expect both to be able to pass domains with underscores (seen in legacy Amazon S3 buckets, I'm reading) and to be able to pass non-ASCII characters (typically converted to Punycode automatically by browsers). We've also seen (non-HostSpecifier) code that tests with non-ASCII digits.

@bg0d-droid

Copy link
Copy Markdown
Contributor Author

Thanks, that makes sense.

My intent was not to change the broader behavior of InternetDomainName or InetAddresses. I was focusing on HostSpecifier because its class-level contract says the value is suitable for use in a URI, while these inputs can produce a parser differential with java.net.URI#getHost().

I agree that rejecting all non-ASCII domain names may be too broad if callers expect browser-like IDN handling. A narrower option could be to canonicalize IDNs through IDN.toASCII so HostSpecifier still returns a URI-safe host form instead of rejecting them outright.

The underscore case seems harder because java.net.URI does not treat foo_bar.com as a host at all, so preserving underscore compatibility conflicts with the current HostSpecifier URI-suitability wording. If preserving legacy underscore support is preferred, then maybe the safer change is documentation/API guidance rather than changing HostSpecifier.isValid behavior.

Which direction would you prefer for this PR: a narrower IDN canonicalization patch, documentation clarification, or keeping the stricter URI-host validation?

@cpovirk

cpovirk commented Jun 10, 2026

Copy link
Copy Markdown
Member

Thanks. I suspect that we mean "URI" in the non-code-font sense—not the Java class java.net.URI, just the general concept. I could definitely see clarifying that much in the docs.

(Incidentally, we could probably say "URL" instead here; I think that the "URI" terminology may have fallen somewhat out of favor in general, and I think "URI" refers to a more general concept than we need for the specific case of HostSpecifier (though maybe you could argue that mailto:, as a URI but not a URL(?), is also a use case for a HostSpecifier).)

It's possible that it would also be valuable to canonicalize digits to ASCII and/or to canonicalize IDNs. It's a bit tough to predict, though, whether that is more likely to help or to hurt: Maybe someone is storing the result of HostSpecifier.toString() in a database somewhere, in which case any change could hurt and be difficult to undo. At least the people who see java.net.URI reject them can perform the normalization themselves.

So I guess I'd lean toward just the documentation change for now.

@bg0d-droid bg0d-droid force-pushed the fix-hostspecifier-uri-host-validation branch from bf0b546 to 330080a Compare June 11, 2026 07:45
@bg0d-droid bg0d-droid changed the title Reject non-URI-safe hosts in HostSpecifier Clarify HostSpecifier URI parser compatibility Jun 11, 2026
@bg0d-droid bg0d-droid force-pushed the fix-hostspecifier-uri-host-validation branch from 330080a to 583a00c Compare June 11, 2026 07:47
@bg0d-droid

Copy link
Copy Markdown
Contributor Author

Thanks for the direction. I updated the PR to be documentation-only:

  • Restored the existing HostSpecifier runtime behavior.
  • Removed the new rejection tests.
  • Clarified that "URI" is meant in the general sense and that some URI/URL parsers may require normalization for inputs that InetAddresses or InternetDomainName intentionally accept.

I also updated the PR title/body to match the narrower scope. git diff --check, CLA, and check-changes are passing.

@cpovirk cpovirk self-assigned this Jun 11, 2026
@cpovirk cpovirk added type=api-docs Change/add API documentation P2 and removed P3 no SLO labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 package=net type=api-docs Change/add API documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants