Normalize tg://user?id=<username> hrefs so _parseMessageText can resolve them #832
Open
zeynalnia wants to merge 4 commits into
The HTML parser previously treated tg://user?id=N hrefs as generic MessageEntityTextUrl, breaking the round-trip with unparse (which already emits this URL form for MessageEntityMentionName). Numeric ids — including values larger than 2^53 — are now parsed into MessageEntityMentionName.userId via big-integer. Non-numeric ids cannot be represented as a Long and fall back to TextUrl so the link is preserved. Extra query parameters after the id are tolerated. Closes gram-js#831
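A minimal sketch of why this commit needed an arbitrary-precision integer for the id: a JavaScript number silently rounds ids above 2^53, while a bigint keeps every digit. It uses the native BigInt in place of the big-integer package the commit mentions, and extractRawUserId is a hypothetical helper that tolerates trailing query parameters by stopping at the first &.

```typescript
// Hypothetical helper: return the raw value of the id parameter in a
// tg://user?id=... href, ignoring any query parameters that follow it.
function extractRawUserId(href: string): string | undefined {
  const match = /^tg:\/\/user\?id=([^&]*)/.exec(href);
  return match ? match[1] : undefined;
}

// 2^53 + 1: the first positive integer a JavaScript number cannot represent.
const exampleHref = "tg://user?id=9007199254740993&extra=1";
const rawId = extractRawUserId(exampleHref) ?? "";

// Parsing through Number rounds to the nearest representable double.
const asNumber = Number(rawId);
// Native BigInt (standing in for big-integer here) keeps the exact value.
const asBigInt = BigInt(rawId);

console.log(rawId);               // 9007199254740993
console.log(String(asNumber));    // 9007199254740992
console.log(asBigInt.toString()); // 9007199254740993
```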
Numeric ids (including negative ones for channels and legacy groups) are now accepted as MessageEntityMentionName. The numeric check uses big-integer's parser with a canonical-form round-trip (parsed.toString() === rawId), which rejects non-canonical forms such as "+123", "0123", "1e5", or whitespace-padded values without needing a separate regex. Non-numeric ids that match Telegram's username rules (5-32 chars of [A-Za-z0-9_], at least one letter or number, with an optional leading @) are mapped to MessageEntityTextUrl with url = "@<id>". Anything else keeps the original tg://user?id=... URL so no information is lost.
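The canonical-form check this commit describes can be sketched with native BigInt standing in for big-integer. BigInt's string parser is lenient (it accepts a leading +, leading zeros and surrounding whitespace, and maps the empty string to 0n), so it is the comparison of toString() against the raw input that rejects those forms. isCanonicalInteger and classifyUserId are hypothetical names, and the username pattern is transcribed from the rules in the commit message.

```typescript
// Hypothetical: true only if raw is the canonical decimal spelling of an
// integer, i.e. parsing it and printing it back yields the same string.
function isCanonicalInteger(raw: string): boolean {
  try {
    return BigInt(raw).toString() === raw;
  } catch {
    // BigInt throws a SyntaxError for inputs such as "1e5" or "abc".
    return false;
  }
}

// Username rule from the commit message: optional leading @, then 5-32 chars
// of [A-Za-z0-9_], at least one of which is a letter or digit.
const USERNAME = /^@?(?=[A-Za-z0-9_]*[A-Za-z0-9])[A-Za-z0-9_]{5,32}$/;

type Classified =
  | { kind: "mention"; userId: bigint }
  | { kind: "textUrl"; url: string };

// Hypothetical mapping from the raw id value to the entity kind described above.
function classifyUserId(raw: string): Classified {
  if (isCanonicalInteger(raw)) {
    return { kind: "mention", userId: BigInt(raw) };
  }
  if (USERNAME.test(raw)) {
    return { kind: "textUrl", url: "@" + raw.replace(/^@/, "") };
  }
  // Anything else keeps the original link.
  return { kind: "textUrl", url: "tg://user?id=" + raw };
}
```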
Reverts the bigInt try/catch + canonical-form pattern in favor of a plain /^-?\d+$/ test, which is shorter and equally strict for the inputs we care about. The bigInt() call is kept only to construct the userId value once the regex has accepted the string.
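A minimal sketch of the simplified gate, again with native BigInt standing in for big-integer's bigInt(): the regex alone decides whether the id is numeric, and BigInt is only used to build the value afterwards. toUserId is a hypothetical name. One observable difference from a canonical-form round-trip is that /^-?\d+$/ accepts leading zeros, so "0123" passes and BigInt reads it as 123.

```typescript
// Hypothetical: accept an optionally negative run of ASCII digits and only
// then build the integer value (BigInt stands in for big-integer's bigInt()).
const NUMERIC_ID = /^-?\d+$/;

function toUserId(raw: string): bigint | undefined {
  return NUMERIC_ID.test(raw) ? BigInt(raw) : undefined;
}

// Rejected by the regex before any parsing happens.
console.log(toUserId("+123")); // undefined
console.log(toUserId("1e5"));  // undefined
console.log(toUserId(" 42"));  // undefined
// Accepted, including values past 2^53 and negative ids.
console.log(String(toUserId("9007199254740993"))); // 9007199254740993
console.log(String(toUserId("-100123")));          // -100123
// Leading zeros pass the regex and are normalized by BigInt.
console.log(String(toUserId("0123")));             // 123
```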
_parseMessageText already turns MessageEntityTextUrl entities whose url is "tg://user?id=<digits>" or "@<username>" into mentions. The only gap was the username form of the href, "tg://user?id=<username>" (with or without an @), which its regex does not match. The HTML parser now rewrites that one case to "@<username>" so the existing pipeline picks it up. Numeric, negative, and malformed ids are passed through unchanged. The numeric and negative branches added in earlier commits are removed since they were either redundant (numeric) or unsupported by Telegram (negative).
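A minimal sketch of the final normalization step, under stated assumptions: normalizeUserHref is a hypothetical name, the id is taken to be the first query parameter (anything from the next & onward is ignored), and the username pattern follows the rules listed in the Summary (5-32 chars of [A-Za-z0-9_], at least one letter or digit, optional leading @). All-digit ids are checked first, because they would otherwise also satisfy the username pattern; they keep their tg://user?id=... form, which _parseMessageText already understands.

```typescript
// Username rule: optional @, then 5-32 chars of [A-Za-z0-9_], at least one
// of which is a letter or digit (so "_____" is rejected).
const USERNAME_ID = /^@?(?=[A-Za-z0-9_]*[A-Za-z0-9])[A-Za-z0-9_]{5,32}$/;

// Hypothetical helper: rewrite tg://user?id=<username> to @<username> and
// return every other href (numeric, negative, empty, malformed) unchanged.
function normalizeUserHref(href: string): string {
  const match = /^tg:\/\/user\?id=([^&]*)/.exec(href);
  if (!match) {
    return href; // not a tg://user?id= link at all
  }
  const id = match[1];
  if (/^\d+$/.test(id)) {
    return href; // numeric ids already resolve in _parseMessageText
  }
  if (!USERNAME_ID.test(id)) {
    return href; // negative, empty or malformed: keep the original URL
  }
  return "@" + id.replace(/^@/, "");
}

console.log(normalizeUserHref("tg://user?id=alice"));      // @alice
console.log(normalizeUserHref("tg://user?id=@alice&x=1")); // @alice
console.log(normalizeUserHref("tg://user?id=12345"));      // tg://user?id=12345
console.log(normalizeUserHref("tg://user?id=-100"));       // tg://user?id=-100
```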
Summary
Closes #831.
_parseMessageText (in gramjs/client/messageParse.ts) already converts MessageEntityTextUrl entities into mentions when the url field is either:
- tg://user?id=<digits> (positive numeric user id), or
- @<username> / +<phone> (username or phone-number reference).

So numeric ids are already handled end-to-end without changes to the HTML parser. What was missing: when HTMLParser.parse encountered <a href="tg://user?id=<username>">…</a>, the resulting TextUrl.url did not match the _parseMessageText regex, and the mention was silently dropped.

This change normalizes only that case. When the id portion of tg://user?id= is a valid Telegram username (5-32 chars of [A-Za-z0-9_], at least one letter or number, optionally prefixed with @), the HTML parser rewrites the entity URL to @<username> so _parseMessageText can resolve it. Everything else (numeric ids, negative ids, malformed values, empty values) is left untouched and passed through with the original tg://user?id=… URL.

A short comment in html.ts explains why the numeric branch is intentionally absent.

Test plan
- npx tsc --noEmit passes
- __tests__/extensions/HTML.spec.ts tests still pass
- Cases covered: bare username (alice), @-prefixed username (@alice), extra query params after the username, validation of the length / character / @ rules, all-digit ids passed through, negative ids passed through, empty id passed through, and a <strong> nested inside the username href
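To see why only the username form needed normalizing, here is a simplified model of the url check the Summary attributes to _parseMessageText. It is an assumption written from that description (tg://user?id=<digits>, @<username> or +<phone>), not the actual regex in gramjs/client/messageParse.ts, and resolvesAsMention is a hypothetical name.

```typescript
// Simplified model (not gramjs's actual regex) of the TextUrl urls that
// _parseMessageText is described as turning into mentions.
const MENTION_URL = /^(?:tg:\/\/user\?id=(\d+)|@\w+|\+\d+)/;

// Hypothetical helper: would this TextUrl url be picked up as a mention?
function resolvesAsMention(url: string): boolean {
  return MENTION_URL.test(url);
}

console.log(resolvesAsMention("tg://user?id=42"));    // true: numeric id
console.log(resolvesAsMention("@alice"));             // true: username
console.log(resolvesAsMention("+15551234567"));       // true: phone number
console.log(resolvesAsMention("tg://user?id=alice")); // false: username href is missed
console.log(resolvesAsMention("tg://user?id=-100"));  // false: negative ids stay TextUrl
```

Under this model, rewriting tg://user?id=alice to @alice is enough for the existing check to match, which is the gap the PR closes.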