Turn tg into a single static binary that an AI agent (or any script) can drive to
automate everyday Telegram work on a personal account: message yourself, list and
triage chats, mark chats as read, reply, upload/download files, search history, and more.
The goal is a broad capability surface — and strong agent ergonomics — delivered as a
fast, dependency-free Go binary built on gotd/td, exposed
as composable subcommands. New capability should land as tg subcommands.
tg today authenticates only as a bot (Config.BotToken, client.Auth().Bot(...),
and app.Before hard-rejects a config without a bot token). Bots cannot:
- access Saved Messages / "message yourself",
- list dialogs (messages.getDialogs is unavailable to bots),
- read arbitrary chat history or mark dialogs as read the way a user can,
- see the chat/folder list, contacts, or most account state.
Every automation use case in the request requires a user session. So user login is
Phase 0 and gates everything else. gotd/td already supports this: auth.NewFlow with a
code-prompt + 2FA handler, and qrlogin for QR-based login.
These cut across every feature and matter as much as the features themselves:
1. Structured output. Global --output json|text (default text for humans, but agents pass --output json). Every command emits a stable, documented JSON object on stdout. No data on stderr; logs/progress go to stderr only.
2. Non-interactive by default. Never block on a TTY prompt in agent mode. Auth codes, confirmations, and 2FA come from flags / env / a pre-established session.
3. Stable exit codes. 0 for success, and distinct non-zero codes for auth failure, peer not found, flood-wait exceeded, and permission denied, so agents can branch on them.
4. No decorative output in machine mode. The upload progress bar and typing-action updates must be suppressed under --output json.
5. Consistent peer resolution. One shared --peer grammar everywhere: me/self, @username, numeric ID, phone, t.me/... link. Cache resolved access-hashes in the session dir (StringSession-style sessions have no entity cache, so we must keep our own).
6. Idempotency & safety. Destructive actions (delete history, leave, ban, delete account-level state) require --yes in agent mode, and every command is clearly marked as read-only or mutating.
7. Schema versioning. Include "schema": 1 (or similar) in JSON so agents can adapt.
8. Account selection. Every command accepts a global --account <label> (env TG_ACCOUNT), defaulting to the configured default account. Reserve and thread this flag from day one even while only one account exists, so adding multi-account later is additive and never changes existing command signatures.
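The shared --peer grammar can be illustrated with a small classifier. This is a minimal sketch using only the standard library, not the project's resolver: PeerKind, PeerRef, and ParsePeer are hypothetical names, and the rule that a phone number must start with "+" (so bare digits are treated as a numeric ID) is an assumption made here to keep the grammar unambiguous. Parsing is purely syntactic; turning the result into an access hash still needs the API or the peer cache.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// PeerKind names the syntactic forms of the shared --peer grammar.
type PeerKind string

const (
	PeerSelf     PeerKind = "self"
	PeerUsername PeerKind = "username"
	PeerID       PeerKind = "id"
	PeerPhone    PeerKind = "phone"
	PeerLink     PeerKind = "link"
)

// PeerRef is a hypothetical parsed --peer argument (syntax only, no lookup).
type PeerRef struct {
	Kind  PeerKind
	Value string
}

// allDigits reports whether s is non-empty and consists of ASCII digits.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// ParsePeer classifies s. Assumption: phones carry a leading "+",
// so bare (possibly negative) integers are numeric IDs.
func ParsePeer(s string) (PeerRef, error) {
	s = strings.TrimSpace(s)
	switch {
	case s == "":
		return PeerRef{}, errors.New("empty peer")
	case s == "me" || s == "self":
		return PeerRef{Kind: PeerSelf}, nil
	case strings.HasPrefix(s, "@"):
		if len(s) == 1 {
			return PeerRef{}, errors.New("empty username")
		}
		return PeerRef{Kind: PeerUsername, Value: s[1:]}, nil
	case strings.HasPrefix(s, "+"):
		if !allDigits(s[1:]) {
			return PeerRef{}, fmt.Errorf("invalid phone %q", s)
		}
		return PeerRef{Kind: PeerPhone, Value: s}, nil
	}
	for _, prefix := range []string{"https://t.me/", "http://t.me/", "t.me/"} {
		if strings.HasPrefix(s, prefix) && len(s) > len(prefix) {
			return PeerRef{Kind: PeerLink, Value: s[len(prefix):]}, nil
		}
	}
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return PeerRef{Kind: PeerID, Value: s}, nil
	}
	return PeerRef{}, fmt.Errorf("unrecognized peer %q", s)
}

func main() {
	for _, arg := range []string{"me", "@example", "-1001234567890", "+15550100", "t.me/telegram"} {
		ref, err := ParsePeer(arg)
		fmt.Println(arg, ref.Kind, ref.Value, err)
	}
}
```

Keeping the classifier free of network calls means the same function could back both command parsing and offline shell completion.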
- Adopt spf13/cobra for the CLI (replacing the current urfave/cli/v2). Do this first: every later command is a cobra subcommand, so migrating after the surface grows is costly. Requirements:
  - Rich documentation: every command sets Short, a long-form Long, and runnable Example blocks; group related commands (cobra.Group) so tg --help reads as a map of the surface. Generate man pages and Markdown docs from the command tree (spf13/cobra/doc) and publish them, so help stays in sync with the code.
  - Thorough autocomplete: ship tg completion bash|zsh|fish|powershell (cobra's built-in completion command). Register dynamic completion (ValidArgsFunction for positional arguments, RegisterFlagCompletionFunc for flags) for the things agents and humans actually type: --peer (cached dialogs/usernames from the peer cache), --account labels, --output values, and enum flags. Mark file-taking flags (upload/download) with MarkFlagFilename. Completions must work without a network round-trip where possible (read from the local session/peer cache).
  - Use cobra.Command.RunE (error-returning) throughout; wire global flags as persistent flags on the root; pair with spf13/pflag for GNU-style --flag/-f parsing.
- QR login (primary): tg login defaults to QR, following gotd/td's examples/qrauth.go + examples/userbot. Wiring:
  - dispatcher := tg.NewUpdateDispatcher() and loggedIn := qrlogin.OnLoginToken(&dispatcher); the channel fires on tg.UpdateLoginToken when the code is scanned.
  - client.QR().Auth(ctx, loggedIn, show), where show renders the QR to stderr via github.com/mdp/qrterminal/v3 and also prints token.URL() (the tg://login?token=... link) so it can be opened/rendered elsewhere, which keeps the flow agent-friendly.
  - 2FA fallback: on SESSION_PASSWORD_NEEDED, call client.Auth().Password(ctx, pwd) (retry on auth.ErrPasswordInvalid).
  - Requires updates enabled. Today app.go sets NoUpdates: true and registers no UpdateHandler, so loggedIn would never be notified. The login path must register the dispatcher as UpdateHandler and not disable updates; make NoUpdates and the dispatcher conditional on the command.
  - UX: scan once (Settings → Devices → Link Desktop Device); the session persists in session.FileStorage, so all later agent invocations are headless.
- Phone login (fallback): tg login --phone using auth.NewFlow (code + 2FA), for environments where scanning a QR is inconvenient.
- Persist the user session alongside the existing bot session. Extend Config/init to support a user session path; relax app.Before so a bot token is no longer mandatory.
- Auth mode selection: allow a single config to hold both; commands pick user vs bot session (most new commands are user-only).
- --output json plumbing: a small result-writer used by all commands; move logs and progress to stderr.
- Peer resolver + access-hash cache in the session directory.
- Proxy support (global): a single --proxy flag (config + TG_PROXY env) accepting a URL, threaded into telegram.Options.Resolver:
  - socks5://, socks4://, http(s):// → dcs.Plain with a golang.org/x/net/proxy dialer (already in the module graph via x/net; no new dep). Honor user:password and remote-DNS.
  - tg://proxy?server=&port=&secret= / MTProxy → native dcs.MTProxy(addr, secret, …), reusing the link + secret (hex/base64url) parsing from gotd/td's examples/mtproxy-connect.
  - Wire it at client construction so every command benefits; per-account overrides come in Phase 7.
- tg whoami: smallest end-to-end user-session command to validate auth.
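The --proxy handling described above starts with deciding which kind of proxy a URL names. The following is a minimal sketch of that first step using only the standard library; ProxySpec, ProxyKind, ParseProxy, and decodeSecret are hypothetical names, and the sketch deliberately stops before building a dcs resolver, so nothing here claims to be the gotd/td API. Trying hex before base64url for the MTProxy secret is an assumption of this sketch.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// ProxyKind separates proxies reached through a custom dialer from MTProxy.
type ProxyKind string

const (
	ProxyDialer  ProxyKind = "dialer"  // socks5/socks4/http(s)
	ProxyMTProxy ProxyKind = "mtproxy" // tg://proxy links
)

// ProxySpec is a hypothetical parsed form of the --proxy value.
type ProxySpec struct {
	Kind   ProxyKind
	Scheme string
	Addr   string // host:port as given in the URL
	User   string
	Pass   string
	Secret []byte // MTProxy only
}

// decodeSecret accepts hex first, then unpadded or padded base64url.
func decodeSecret(s string) ([]byte, error) {
	if s == "" {
		return nil, errors.New("missing secret")
	}
	if b, err := hex.DecodeString(s); err == nil {
		return b, nil
	}
	b, err := base64.RawURLEncoding.DecodeString(strings.TrimRight(s, "="))
	if err != nil {
		return nil, errors.New("secret is neither hex nor base64url")
	}
	return b, nil
}

// ParseProxy classifies a proxy URL without opening any connection.
func ParseProxy(raw string) (ProxySpec, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return ProxySpec{}, fmt.Errorf("parse proxy URL: %w", err)
	}
	switch u.Scheme {
	case "socks5", "socks4", "http", "https":
		if u.Host == "" {
			return ProxySpec{}, errors.New("proxy URL has no host")
		}
		spec := ProxySpec{Kind: ProxyDialer, Scheme: u.Scheme, Addr: u.Host}
		if u.User != nil {
			spec.User = u.User.Username()
			spec.Pass, _ = u.User.Password()
		}
		return spec, nil
	case "tg":
		q := u.Query()
		server, port := q.Get("server"), q.Get("port")
		if u.Host != "proxy" || server == "" || port == "" {
			return ProxySpec{}, errors.New("expected tg://proxy?server=&port=&secret=")
		}
		secret, err := decodeSecret(q.Get("secret"))
		if err != nil {
			return ProxySpec{}, err
		}
		return ProxySpec{Kind: ProxyMTProxy, Scheme: "tg",
			Addr: net.JoinHostPort(server, port), Secret: secret}, nil
	default:
		return ProxySpec{}, fmt.Errorf("unsupported proxy scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"socks5://user:pw@127.0.0.1:1080",
		"tg://proxy?server=1.2.3.4&port=443&secret=ee00ff",
	} {
		spec, err := ParseProxy(raw)
		fmt.Println(spec.Kind, spec.Addr, len(spec.Secret), err)
	}
}
```

A ProxySpec of kind dialer would then feed the x/net/proxy dialer path, and one of kind mtproxy the MTProxy path, at client construction.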
These directly cover "upload files to myself, list chats, mark chats as read, replies".
- tg chats list: list dialogs (paged), with unread counts, pinned/muted/archived flags, last message preview.
- tg messages list <peer> / tg history <peer>: read recent messages.
- tg send <peer> <text>: already exists; extend to default me and JSON output.
- tg reply <peer> <message-id> <text>: reply to a specific message.
- tg read <peer>: mark a chat as read.
- tg upload to me / Saved Messages: already exists; ensure me peer + JSON + silent progress.
- tg download <peer> <message-id> [--out path]: download media.
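To make the JSON contract for a command like tg chats list concrete, here is a minimal sketch of a result struct with a schema version and paging fields. The type names (ChatSummary, ChatsListResult), the field names, and the RenderJSON helper are all hypothetical; the plan only fixes the ideas of a schema marker, page/page_size, unread counts, and pinned/muted/archived flags.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// SchemaVersion is the hypothetical contract version stamped on every result.
const SchemaVersion = 1

// ChatSummary is an illustrative row of `tg chats list`; it is a plain Go
// struct so that no raw MTProto type leaks into the output contract.
type ChatSummary struct {
	ID          int64  `json:"id"`
	Title       string `json:"title"`
	Unread      int    `json:"unread"`
	Pinned      bool   `json:"pinned"`
	Muted       bool   `json:"muted"`
	Archived    bool   `json:"archived"`
	LastMessage string `json:"last_message,omitempty"`
}

// ChatsListResult is the illustrative top-level object written to stdout.
type ChatsListResult struct {
	Schema   int           `json:"schema"`
	Page     int           `json:"page"`
	PageSize int           `json:"page_size"`
	Chats    []ChatSummary `json:"chats"`
}

// RenderJSON marshals any result struct to a single-line JSON string.
func RenderJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	res := ChatsListResult{
		Schema:   SchemaVersion,
		Page:     1,
		PageSize: 50,
		Chats:    []ChatSummary{{ID: 1, Title: "Saved Messages", Pinned: true}},
	}
	out, err := RenderJSON(res)
	if err != nil {
		fmt.Fprintln(os.Stderr, "render:", err) // diagnostics stay on stderr
		os.Exit(1)
	}
	fmt.Println(out) // data goes to stdout only
}
```

Boolean flags are emitted even when false so an agent never has to distinguish "absent" from "false"; only the optional preview uses omitempty.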
- edit, delete (single + bulk), delete-history.
- forward (single + multiple).
- pin / unpin / unpin-all / pinned.
- Reactions: react / unreact / reactions.
- Drafts: draft set / drafts / draft clear.
- Scheduled messages: schedule send / list / delete.
- Search: search <peer> <query> and search --global <query>.
- Polls: poll create.
- Message context / links: context, link.
- Albums, voice, stickers, GIFs. (album + upload --type voice|sticker|gif + stickers list; GIF search deferred: inline-bot flow, low value.)
- chat get / chat full.
- mute / unmute / archive / unarchive.
- resolve <username>, search-public <query>, subscribe.
- Contacts: list / search / add / delete / block / unblock / blocked / import / export.
- Create group/channel, invite, leave, participants, admins, banned.
- Admin/moderation: promote/demote, ban/unban, permissions, slow mode, admin rights, recent actions.
- Edit chat title/photo/about, invite links (export/import/join by link).
- Forum topics.
- Profile: get me, update profile, profile photo set/delete, privacy get/set, user info/photos/status. (get-me = whoami, user-info = chat full; privacy get/set deferred: heavy privacy-key/rule modeling, low agent value.)
- Folders / dialog filters: list / get / create / add-chat / remove-chat / delete / reorder.
- tg watch <peer>: stream new messages as JSON lines; the agent's input loop.
- tg wait: block until a new (or settled) incoming message, with timeout.
- Backed by gotd/td update handlers; NoUpdates: true (set in app.go today) must become conditional.
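The watch and wait behaviours can be sketched independently of the Telegram client by treating incoming messages as a channel of events. This is an illustration under stated assumptions: Event, Watch, and Wait are hypothetical names, the event fields are invented for the example, and in the real tool the channel would be fed by an update handler rather than by canned data.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// Event is a hypothetical JSON-lines record for one incoming message.
type Event struct {
	Schema    int    `json:"schema"`
	Peer      string `json:"peer"`
	MessageID int    `json:"message_id"`
	Text      string `json:"text"`
}

// Watch writes one JSON object per line until ctx ends or events is closed.
func Watch(ctx context.Context, events <-chan Event, w io.Writer) error {
	enc := json.NewEncoder(w) // Encode appends a newline after each value
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev, ok := <-events:
			if !ok {
				return nil
			}
			if err := enc.Encode(ev); err != nil {
				return err
			}
		}
	}
}

// Wait returns the first event, or an error once timeout elapses.
func Wait(ctx context.Context, events <-chan Event, timeout time.Duration) (Event, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	select {
	case ev, ok := <-events:
		if !ok {
			return Event{}, errors.New("event stream closed")
		}
		return ev, nil
	case <-ctx.Done():
		return Event{}, ctx.Err()
	}
}

// demo feeds two canned events through Watch and returns the emitted lines.
func demo() string {
	events := make(chan Event, 2)
	events <- Event{Schema: 1, Peer: "me", MessageID: 1, Text: "hello"}
	events <- Event{Schema: 1, Peer: "me", MessageID: 2, Text: "world"}
	close(events)
	var sb strings.Builder
	if err := Watch(context.Background(), events, &sb); err != nil {
		return "error: " + err.Error()
	}
	return sb.String()
}

// waitSilent waits briefly on a stream that never produces an event.
func waitSilent() error {
	_, err := Wait(context.Background(), make(chan Event), 10*time.Millisecond)
	return err
}

func main() {
	fmt.Print(demo())
	fmt.Println("wait on a silent stream:", waitSilent())
}
```

A timeout in Wait surfaces as an ordinary error, which lets the command map it to its own exit code instead of hanging the agent.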
The --account selector (design principle 8) ships from Phase 0, but real multi-account —
several sessions usable, and live simultaneously — lands here: a label -> client map,
an --account <label> selector, and a tg accounts listing.
- Config for N accounts: named accounts in the config (or env TG_SESSION_<LABEL>), each with its own session.FileStorage file, peer-cache, and floodwait.Waiter. Keep a backward-compatible single "default" account.
- tg accounts: list configured accounts + auth status.
- tg login --account <label>: mint/refresh a session per label (QR or phone).
- Concurrent runtime: hold a map[label]*telegram.Client, each running under its own waiter; a command resolves --account to one client. Clients are independent and safe to run in parallel: one flood-wait or reconnect must not stall the others.
- Fan-out where it makes sense: --account all (or repeated --account) for read/broadcast commands (e.g. chats list, send, watch), with results keyed by label in JSON. Strictly opt-in; single-account stays the default.
- Realtime across accounts: Phase 6 watch/wait can observe multiple accounts concurrently (one dispatcher + update loop per client), merged into one JSON stream.
- Per-account proxy: per-label override of the global proxy (Phase 0), so each account can route through its own SOCKS5/HTTP/MTProxy.
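The fan-out idea (run one command against several accounts in parallel and key the results by label) can be sketched without any Telegram types. FanOut and AccountResult are hypothetical names, and the per-account work is stubbed with a plain function; the point is only that one account's failure is recorded under its label and does not block or fail the others.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// AccountResult is a hypothetical per-label outcome: a result or an error.
type AccountResult struct {
	Result any    `json:"result,omitempty"`
	Error  string `json:"error,omitempty"`
}

// FanOut runs fn once per label concurrently and collects outcomes by label.
func FanOut(ctx context.Context, labels []string,
	fn func(ctx context.Context, label string) (any, error)) map[string]AccountResult {

	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]AccountResult, len(labels))
	)
	for _, label := range labels {
		wg.Add(1)
		go func(label string) {
			defer wg.Done()
			res, err := fn(ctx, label)
			r := AccountResult{Result: res}
			if err != nil {
				r = AccountResult{Error: err.Error()}
			}
			mu.Lock()
			out[label] = r
			mu.Unlock()
		}(label)
	}
	wg.Wait()
	return out
}

// demo stubs two accounts, one of which fails, and renders the merged JSON.
func demo() string {
	results := FanOut(context.Background(), []string{"personal", "work"},
		func(_ context.Context, label string) (any, error) {
			if label == "work" {
				return nil, errors.New("flood wait exceeded")
			}
			return "ok", nil
		})
	b, err := json.Marshal(results) // map keys are emitted in sorted order
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(demo())
}
```

Because each goroutine touches only its own label's entry under the mutex, the sketch mirrors the "no shared mutable state between accounts" goal at a small scale.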
The surface breaks into groups: accounts, chats, messages, media, contacts,
groups, profile, folders, events. The phases above fold each group in: Phase 1
covers the daily-driver subset of chats/messages/media; Phase 2 the rest of
messages+media; Phase 3 chats+contacts; Phase 4 groups; Phase 5
profile+folders; Phase 6 events; Phase 7 generalizes the whole surface to multiple
concurrent accounts (the accounts group).
- CLI framework: spf13/cobra (+ spf13/pflag), replacing urfave/cli/v2. Chosen for first-class shell completion (bash/zsh/fish/powershell with dynamic ValidArgsFunction hooks) and doc generation (spf13/cobra/doc → man + Markdown). One *cobra.Command per subcommand with RunE; global flags are persistent flags on the root; the existing app/Before wiring moves into PersistentPreRunE.
- Library: gotd/td (telegram, tg, telegram/message, telegram/uploader, telegram/downloader, telegram/query for dialog/history pagination, auth, auth/qrlogin, telegram/peers for resolution/caching).
- QR login adds one dependency: github.com/mdp/qrterminal/v3 (terminal QR rendering), matching gotd/td's examples/qrauth.go. The QR flow is also a forcing function to introduce an update dispatcher (tg.NewUpdateDispatcher) and turn off the current NoUpdates: true for commands that need it, which Phase 6 (realtime) needs anyway.
- Flood-wait is already handled via the floodwait.Waiter middleware; keep it, and add a distinct exit code when retries are exhausted.
- Proxy is configured through telegram.Options.Resolver (dcs.Resolver), not a global dialer: dcs.Plain{Dial: <x/net/proxy dialer>.DialContext} for SOCKS/HTTP, or dcs.MTProxy(addr, secret, …) for MTProxy. See gotd/td's telegram/dcs/example_test.go and examples/mtproxy-connect. Each account builds its own resolver, so per-account proxies fall out naturally in Phase 7.
- Sessions: keep using session.FileStorage (per-account file); the user session lives beside the existing bot session. Consider an export-to-string command for portability.
- Multi-account isolation (Phase 7): one telegram.Client per label, each with its own session file, peer-cache, and floodwait.Waiter. There is no shared mutable state between accounts, so they run concurrently without one blocking another. The runtime owns a map[label]*telegram.Client; commands select by --account. Single-account remains the default and costs nothing.
- Pagination: dialog and history listing must page (page/page_size); use query.GetDialogs / query.Messages iterators.
- Output stability: define Go structs for each command's JSON result; never leak raw MTProto types into the contract.
- Login UX for agents: interactive tg login once to mint a session, then agents reuse it headless, or support a fully headless code-delivery path? (Recommend: interactive one-time login, headless reuse.)
- Keep bot support? Recommend yes: some flows (channel posting) are fine as a bot, but make user the default for the agent commands.
- Scope of v1: propose shipping Phases 0–1 as the first milestone, since they cover the exact asks (message self, list chats, mark read, reply, upload).
- Multi-account timing: ship single-account first but reserve --account from Phase 0, so full concurrent multi-account (Phase 7) is purely additive. Should fan-out (--account all) be in scope, or only explicit single-label selection?