Normalize tg://user?id=<username> hrefs so _parseMessageText can resolve them#832

Open
zeynalnia wants to merge 4 commits into gram-js:master from zeynalnia:fix/html-mention-parsing

zeynalnia commented Apr 19, 2026

Summary

Closes #831.

_parseMessageText (in gramjs/client/messageParse.ts) already converts MessageEntityTextUrl entities into mentions when the url field is either:

  • tg://user?id=<digits> (positive numeric user id), or
  • @<username> / +<phone> (username or phone-number reference).

So numeric ids are already handled end-to-end without changes to the HTML parser. What was missing: when HTMLParser.parse encountered <a href="tg://user?id=<username>">…</a>, the resulting TextUrl.url did not match the _parseMessageText regex and the mention was silently dropped.
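To make the gap concrete, here is a minimal sketch of the kind of URL check described above. It is not the code in messageParse.ts: the function name `classifyTextUrl`, the `MentionRef` type, and the exact regex are assumptions made for illustration only.

```typescript
// Hypothetical illustration; NOT the actual gramjs implementation.
type MentionRef =
    | { kind: "userId"; id: string }
    | { kind: "reference"; value: string };

function classifyTextUrl(url: string): MentionRef | null {
    // Form 1: tg://user?id=<digits>. The id is kept as a string here
    // so that very large ids do not lose precision.
    const idMatch = /^tg:\/\/user\?id=(\d+)$/.exec(url);
    if (idMatch) {
        return { kind: "userId", id: idMatch[1] };
    }
    // Form 2: @<username> or +<phone>.
    if (url.startsWith("@") || url.startsWith("+")) {
        return { kind: "reference", value: url };
    }
    // Anything else is not recognized as a mention.
    return null;
}
```

Under these assumptions, `classifyTextUrl("tg://user?id=alice")` returns `null`, which corresponds to the dropped-mention case this PR addresses.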

This change normalizes only that case. When the id portion of tg://user?id= is a valid Telegram username (5-32 chars of [A-Za-z0-9_], at least one letter or number, optionally prefixed with @), the HTML parser rewrites the entity URL to @<username> so _parseMessageText can resolve it. Everything else — numeric ids, negative ids, malformed values, empty values — is left untouched and passed through with the original tg://user?id=… URL.

A short comment in html.ts explains why the numeric branch is intentionally absent.
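The normalization rule can be sketched as a standalone function. This is an illustration under stated assumptions, not the code added to html.ts: the name `normalizeMentionHref`, the regexes, and the choice to treat an all-digit value as numeric even when it carries an @ prefix are all hypothetical.

```typescript
// Hypothetical sketch of the normalization rule; not the code in html.ts.
const USERNAME_RE = /^[A-Za-z0-9_]{5,32}$/;

function normalizeMentionHref(href: string): string {
    // Capture the id value up to the next query parameter or fragment,
    // so extra parameters after the username are tolerated.
    const match = /^tg:\/\/user\?id=([^&#]*)/.exec(href);
    if (!match) {
        return href; // not a tg://user?id= link at all
    }
    const raw = match[1];
    // Drop an optional leading @.
    const name = raw.startsWith("@") ? raw.slice(1) : raw;

    // Numeric, negative, and empty ids are passed through untouched.
    // (Assumption: "@12345" is also treated as numeric here.)
    if (/^-?\d*$/.test(name)) {
        return href;
    }
    // Must be 5-32 characters of [A-Za-z0-9_] ...
    if (!USERNAME_RE.test(name)) {
        return href;
    }
    // ... with at least one letter or number.
    if (!/[A-Za-z0-9]/.test(name)) {
        return href;
    }
    return "@" + name;
}
```

For example, `normalizeMentionHref("tg://user?id=alice&foo=bar")` yields `"@alice"`, while `normalizeMentionHref("tg://user?id=12345")` returns the href unchanged.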

Test plan

  • npx tsc --noEmit passes
  • All existing __tests__/extensions/HTML.spec.ts tests still pass
  • New tests cover: bare username (alice), @-prefixed username (@alice), extra query params after the username, validation of length / character / @ rules, all-digits ids passed through, negative ids passed through, empty id passed through, and a <strong> nested inside the username href

The HTML parser previously treated tg://user?id=N hrefs as generic
MessageEntityTextUrl, breaking the round-trip with unparse (which
already emits this URL form for MessageEntityMentionName).

Numeric ids — including values larger than 2^53 — are now parsed
into MessageEntityMentionName.userId via big-integer. Non-numeric
ids cannot be represented as a Long and fall back to TextUrl so the
link is preserved. Extra query parameters after the id are tolerated.

Closes gram-js#831

Numeric ids (including negative ones for channels and legacy groups)
are now accepted as MessageEntityMentionName. The numeric check is
done via big-integer's parser with a canonical-form round-trip
(parsed.toString() === rawId), which rejects laundered forms such as
"+123", "0123", "1e5", or whitespace-padded values without needing a
separate regex.

Non-numeric ids that match Telegram's username rules (5-32 chars of
[A-Za-z0-9_], at least one letter or number, with an optional leading
@) are mapped to MessageEntityTextUrl with url = "@<id>". Anything
else preserves the original tg://user?id=... URL so no information is
lost.

Reverts the bigInt try/catch + canonical-form pattern in favor of a
plain /^-?\d+$/ test, which is shorter and equally strict for the
inputs we care about. The bigInt() call is kept only to construct
the userId value once the regex has accepted the string.
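
A minimal sketch of the plain regex check this commit message describes. The helper name `isPlainIntegerId` is hypothetical. One difference worth noting: the regex accepts leading zeros such as "0123", which the canonical-form round-trip mentioned above would reject.

```typescript
// Hypothetical helper illustrating the /^-?\d+$/ check.
const PLAIN_INTEGER = /^-?\d+$/;

function isPlainIntegerId(rawId: string): boolean {
    // Accepts an optional leading minus sign followed by one or more
    // digits; rejects "+123", "1e5", whitespace-padded and empty values.
    return PLAIN_INTEGER.test(rawId);
}
```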

zeynalnia changed the title from "Parse tg://user?id= mentions as MessageEntityMentionName" to "Normalize tg://user?id=<username> hrefs so _parseMessageText can resolve them" on Apr 20, 2026

_parseMessageText already turns MessageEntityTextUrl entities whose
url is "tg://user?id=<digits>" or "@<username>" into mentions. The
only gap was the username form of the href, "tg://user?id=<username>"
(with or without an @), which its regex does not match.

The HTML parser now rewrites that one case to "@<username>" so the
existing pipeline picks it up. Numeric, negative, and malformed ids
are passed through unchanged. The numeric and negative branches added
in earlier commits are removed since they were either redundant
(numeric) or unsupported by Telegram (negative).

Development

Successfully merging this pull request may close these issues.

HTML parser: tg://user?id=<username> mentions are not normalized for _parseMessageText
