idn-ruby uses libidn2, which unfortunately does not recognize domain names that include emojis, such as 🌈🌈🌈.st, even though such domains are registrable and work in practically every modern browser. Additionally, the JS implementation of twitter-text already recognizes links such as https://🌈🌈🌈.st as valid link entities. Replacing idn-ruby with another Ruby gem that implements the Punycode conversion, and adding some new validation to the is_valid_domain function, allows https://🌈🌈🌈.st to be returned as a valid link entity without accepting any of the other invalid links tested in the conformance test suite.
Problem
Currently, the Ruby version of this library does not recognize links to domains that include emojis, even though browsers support those domains. Text that includes "https://🌈🌈🌈.st" is not accepted as containing a valid URL. The problem comes from idn-ruby's dependency on libidn2, which does not accept emoji characters in domain names, even though such domains are registrable and work fine in browsers (after being converted to Punycode).
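To illustrate the conversion browsers perform, here is a minimal sketch of an RFC 3492 Punycode encoder in plain Ruby. It is not the gem used in this PR: `PunycodeSketch` is a hypothetical name, and the sketch only encodes (no decoding), skips overflow checks, and omits the UTS #46 mapping and normalization step that real IDNA processing applies before encoding.

```ruby
# Minimal RFC 3492 Punycode encoder (encode direction only).
# Illustrative sketch: no UTS #46 mapping/normalization, no overflow checks.
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 0x80

  module_function

  # Bias adaptation function (RFC 3492, section 6.1).
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Digit values 0..25 map to "a".."z", 26..35 map to "0".."9".
  def encode_digit(d)
    (d < 26 ? d + 97 : d + 22).chr
  end

  # Encodes a single label (no dots); returns the part after "xn--".
  def encode_label(label)
    input = label.codepoints
    # Basic (ASCII) code points are copied first, followed by a delimiter.
    output = input.select { |c| c < INITIAL_N }.map(&:chr).join
    basic_count = output.length
    output << "-" if basic_count > 0
    handled = basic_count
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while handled < input.length
      m = input.select { |c| c >= n }.min # next code point to insert
      delta += (m - n) * (handled + 1)
      n = m
      input.each do |c|
        delta += 1 if c < n
        next unless c == n
        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = (k - bias).clamp(TMIN, TMAX)
          break if q < t
          output << encode_digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << encode_digit(q)
        bias = adapt(delta, handled + 1, handled == basic_count)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # Converts each non-ASCII label of a domain to its "xn--" form.
  def to_ascii(domain)
    domain.split(".").map do |label|
      label.ascii_only? ? label : "xn--#{encode_label(label)}"
    end.join(".")
  end
end

puts PunycodeSketch.to_ascii("🌈🌈🌈.st") # prints "xn--og8haa.st"
```

Once the label is in this ASCII form, it only uses letters, digits and hyphens, which is what DNS and browsers actually resolve.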
Solution
I replaced idn-ruby with another Ruby gem that implements the Punycode conversion in pure Ruby, without native dependencies, and then re-added some of the validation that libidn2 used to perform so that the conformance test suite passes again.
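The exact checks added to is_valid_domain are not listed here. As a hypothetical sketch of the kind of structural rules a pure-Ruby converter no longer gets for free from libidn2, the following applies standard RFC 1035/1123 constraints (label length and letter-digit-hyphen syntax) to the already-converted ASCII domain. `valid_ascii_domain?` and the constants are illustrative names, not part of twitter-text.

```ruby
# Hypothetical structural checks on an already-converted ASCII domain,
# based on RFC 1035/1123 rules; not the PR's actual is_valid_domain code.
MAX_DOMAIN_LENGTH = 253
MAX_LABEL_LENGTH = 63
# Letters, digits and hyphens; no hyphen at the start or end of a label.
LDH_LABEL = /\A[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

def valid_ascii_domain?(ascii_domain)
  return false if ascii_domain.empty? || ascii_domain.length > MAX_DOMAIN_LENGTH

  # A limit of -1 keeps empty labels ("a..b", trailing dots) so they are rejected.
  ascii_domain.split(".", -1).all? do |label|
    label.length.between?(1, MAX_LABEL_LENGTH) && LDH_LABEL.match?(label)
  end
end

p valid_ascii_domain?("xn--og8haa.st") # true
p valid_ascii_domain?("-bad-.st")      # false
p valid_ascii_domain?("a..st")         # false
```

Running such checks after conversion keeps the emoji-specific handling inside the Punycode step, while malformed hosts are still rejected on purely syntactic grounds.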
Result
Text that includes "https://🌈🌈🌈.st" will now be correctly identified as containing a link. Note that other checks in this library still prevent the bare "🌈🌈🌈.st" (without a scheme) from being parsed as a link. While I would like to make that work as well, I felt that would be too big a change for this PR.