Cmochance · Jun 16, 2026
diff --git a/README.en.md b/README.en.md
@@ -91,6 +91,7 @@ Injects a standalone "Usage" section at the bottom of Codex Desktop's "Toggle pi
 - **Injected system prompts follow the UI language**: the `apply_patch` chat-path rules + autocompact summarization prompt that this project injects for non-OpenAI providers track the `语言 / Language` setting (Chinese users → Chinese prompts, avoiding mixed-language model thinking); V4A keywords (`*** Begin Patch` / `@@ <header>` etc.) + Codex CLI error message originals stay in English (parser / matcher does not accept translations)
 - **Codex Desktop Theme (optional, off by default)**: Theme page ships 11 built-in anime themes (`carton` with a floating mascot, plus `changli` / `azurlane` / `nailin` / `zani` / `frost` / `nocturne` / `duet` / `rose` / `sonata` / `studio`), each individually colour-matched to its artwork (per-theme glass + accent). Injects design-token overrides (`--color-token-*` + the runtime `--color-*` layer) + a background image into Codex Desktop via CDP, covering chat / settings / collapsed-sidebar / popovers. Toggle is independent from Plugin Unlock; page reload re-applies automatically; disabling the toggle only clears the saved preference — any already-injected theme stays until the next Codex restart
 - **Usage panel inside Codex Desktop (optional, off by default)** (MOC-204): Settings → "Show usage in Codex" injects a collapsible "Usage" section at the bottom of Codex's "Toggle pinned summary" popup (the panel that contains Environment / Sources sections), showing up to 4 rows: ① **5-hour quota / weekly quota** — whitelisted providers only: **antigravity gemini series** reads from `cloudcode-pa.googleapis.com/v1internal:retrieveUserQuotaSummary` (dual-window 5h + weekly, remaining% = remainingFraction×100); **GLM Coding Plan** (`bigmodel.cn` / `z.ai` coding hosts) reads from `monitor/usage/quota/limit` (apiKey auth, no Bearer prefix), returning 5h + weekly TOKENS_LIMIT records, converted as remaining% = 100 − usage%; **Xiaomi MiMo Token Plan** (`platform.xiaomimimo.com`) shows a monthly-plan remaining% progress bar — the plan quota is only accessible via a MiMo web session (httpOnly cookie), so you must click "Sign in to Xiaomi account" in the provider edit page first: the app opens an embedded webview for login, captures the session cookie, and the daemon uses it to query `/api/v1/tokenPlan/usage`; **DeepSeek** (`api.deepseek.com`) shows a ¥X balance numeric entry, read from the official `/user/balance` endpoint using the same API key (Bearer); **Kimi (月之暗面 / Moonshot PAYG, `api.moonshot.cn` / `.ai`)** shows balance numeric entries (available / cash / voucher, ¥/$ by host), read from the official `/v1/users/me/balance` using the same key (Bearer) — **note: the subscription-based `kimi-code` (`api.kimi.com/coding`) is a separate provider with no balance endpoint and is excluded**; **anyrouter** (`api.anyrouter.top`) shows a $X used-amount numeric entry, read from `/v1/dashboard/billing/usage` using the same key (Bearer; remaining balance is blocked by upstream anti-scraping so only the used amount is shown). Whitelist is determined by baseUrl host. Red warning ≤10% + reset time shown. 
Quota rows appear only when the active provider matches a whitelisted host; all others show no quota rows. ② **Context** — injected JS reads `contextUsage.usedTokens` + `contextWindow` directly from Codex's React fiber, available immediately for any existing conversation without a new turn; full window = contextWindow ÷ 0.95 (adds back the 5% reserve Codex hides); 1M models display "1M" not "1000k". ③ **Tokens (real-time rate · cumulative)** — rate estimated by a MutationObserver watching Codex's streaming text (2s sliding window, CJK-aware); cumulative total from Codex rollout. ④ **Cache hit rate** — from rollout cached_input/input. **③④ and the rate are all isolated per active conversation (MOC-230)**: injected JS reads the current `conversationId` from the React fiber and the daemon keys totals to that conversation's rollout (== filename uuid, not the most-recently-modified file), following conversation switches; shows "—" (never another conversation's data) when the id / its rollout can't be resolved. The "Usage" title is collapsible (chevron + localStorage-persisted). Injection uses periodic CDP pushes; re-attaches automatically after a Codex page reload or restart. Requires launching Codex through this app; restart Codex after toggling if already running.
+- **Codex mobile remote control (optional, off by default)** (MOC-249, M1): in Settings → "Mobile remote control (Telegram)", enter a Telegram Bot Token and an allowed-users whitelist (comma-separated numeric user ids; for security only numeric ids are honored, and usernames are unsupported because they are mutable and can be reassigned). When enabled, transfer runs a Telegram bot over pure HTTPS long-polling, with no relay or public callback. Authorized users message the bot from their phone to remotely drive the Codex instance launched by this app: transfer injects the prompt into Codex's composer (ProseMirror) over CDP, submits it, and streams the reply back into the Telegram message. Commands: `/new` (new chat), `/stop` (stop the current turn), `/status`, `/help`; plain text is treated as one prompt turn. **⚠️ Remote control is equivalent to operating this machine remotely: authorize only your own account and keep the Bot Token safe.** Codex must be launched through this app (the toggle affects the debug port); restart Codex after toggling. M1 is conversational only: if Codex needs approval for a command or tool, confirm it on the desktop (relaying approvals to the phone is a later phase). The Windows Store (MSIX) build is unsupported due to the debug-port passthrough gap.
 - **System-proxy (VPN/ladder) connectivity detection** (MOC-114): the dashboard "Network Proxy" card shows live status — connected / disconnected / PAC auto-config / detecting. In relay real-account mode, the "Auto-unlock Codex Plugins" toggle gates on both conditions being met (valid account AND proxy reachable), preventing the silent-failure state where plugins spin and return 502s while the UI shows "logged in" because the proxy is down. Detection uses a short-timeout TCP connect to the proxy port only; chatgpt.com is never contacted.
 - **Built-in web fetch tool (web_fetch, MOC-144)**: Settings → "Built-in web fetch backend" — select `auto` (recommended; **defaults to `auto` since MOC-215, works out of the box** — new users get web_fetch / web_search without manually enabling it; web_fetch uses curl/wreq and needs no Chrome, web_search is still gated on Chrome readiness and never silently downloads) / `curl` / `wreq` / `headless` (**independent of** the Codex sandbox network toggle). Transfer automatically registers a `web_fetch` MCP tool with Codex, which the model can call directly to fetch web pages — `curl` uses standard HTTP, `wreq` bypasses Cloudflare TLS challenges, `headless` drives a headless Chrome to retrieve JS-rendered DOM (first-time headless use prompts to download chrome-headless-shell, ~86 MB, if Chrome is not installed). Beyond the three fetch backends, `web_fetch` also follows **HTML `meta refresh` / JS `location` redirects** (re-fetches the target URL, loop-protected to 3 hops) — curl/wreq/headless only follow HTTP 3xx and do not handle these client-side redirects; "placeholder" redirect pages (e.g. pages that bounce around Twitter/Substack blocks) are now automatically followed to the real destination (MOC-139). **`auto` tier (MOC-161)**: automatically escalates from curl → wreq → headless based on page-difficulty signals; remembers the last successful tier per origin so subsequent requests start there; downgrades to curl when no system proxy is reachable (wreq / headless rely on a proxy); first use of the headless tier still confirms the Chrome download. Switching tiers takes effect immediately (no restart needed); **toggling the feature on or off requires restarting Codex Desktop** for the network tools (web_fetch / web_search / read_url_local) to appear / disappear in Codex (since MOC-235 the MCP server stays registered to host `read_tool_artifact`; turning the network backend off just stops exposing those network tools rather than unloading the whole server). 
Fetched HTML is auto-converted to markdown before returning to the model (cleaner, fewer tokens; non-HTML responses pass through unchanged), and headless waits for networkIdle before capturing the rendered DOM (MOC-145). Headless fetches run with anti-detection stealth (strips `navigator.webdriver`, fakes `window.chrome`/plugins/WebGL, removes the `HeadlessChrome` UA token), passing passive-fingerprint / simple JS-challenge Cloudflare; interactive Turnstile/DataDome managed challenges still won't pass (MOC-152). On a CF JS-challenge page, headless now **waits in place for it to auto-clear** before reading (instead of returning the challenge page as content), and **persists the browser profile per origin** to reuse CF clearance cookies — a second fetch of the same site skips the repeat challenge and is faster (MOC-156). Before markdown conversion the page goes through **main-content extraction** (readability algorithm strips nav/header/footer/sidebar/ads, keeping only the article so large-page content is no longer crowded out by truncation; non-article pages fall back to the full page); **binary resources** (image / video / audio / PDF) and files over 16 MB are not downloaded and return a clear notice instead (no more garbage bytes / OOM) (MOC-152). `web_fetch` **returns the full extracted page text by default** (the current turn's tool output goes into the LLM context in full; the adapter layer automatically compresses older tool outputs to prevent context overflow; MOC-190) — no more pagination, no `offset` paging, no relevance-based `query` chunk selection, so precise content (code / schema / version numbers / figures) is never lost. If you fetched a URL earlier in the conversation and its content has since been folded/compressed in the context history, use **`read_url_local(url)`** to pull the full text from the in-process cache without re-fetching (cache TTL: 15 min). **More generally, when any tool's large output (shell / Feishu and other MCP / etc.) 
gets folded into a `[Tool output stored outside model context]` summary in history, the summary includes an `Artifact ID`, and the model can call `read_tool_artifact(artifact_id)` to retrieve that output's text** — read from the shared `tool_artifacts.db` (SQLite WAL, cross-process) that the proxy persists when compressing, so the model never re-runs a tool just to see history again; the retrieved content is visible only in the current turn and gets folded again next turn (no long-term context bloat); outputs over 90k chars are returned in pages (each below the proxy keep-full cap, with a trailer telling the model to page via `offset`) (MOC-235). These tools (`web_fetch` / `web_search` / `read_url_local` / `read_tool_artifact`) declare `readOnlyHint` (read-only), so Codex's auto-review guardian **skips approval** for them (`requires_mcp_tool_approval` short-circuits on the read-only hint) — network calls no longer incur a per-call risk-approval round-trip, removing that latency (MOC-172).
 - **Built-in web search tool (web_search, MOC-12)**: when the built-in web fetch backend is on (non-off) and the machine has Chrome ready, transfer registers a `web_search` tool with Codex — the model passes a query string and gets back a structured list of results (title + real URL + snippet), forming a **two-step search**: `web_search` to find sources, then `web_fetch` to read content, eliminating the need to guess URLs. **Why this matters**: Codex sends an OpenAI server-side `web_search` tool each turn, but third-party chat providers (MiniMax / DeepSeek / GLM / Kimi, etc.) don't support it — the adapter drops it, leaving the model to scrape search engines or guess URLs (real-world success rate ~17%). This tool queries **DuckDuckGo + Bing in parallel and merges the results, deduped by normalized URL** (no API key required, data-centre / VPN-exit IP friendly; the two indexes complement each other so single-call coverage is noticeably broader than a single source, MOC-215; previously Bing was only a fallback when DDG failed, MOC-186), and **always uses headless** internally — DDG / Bing block plain HTTP with anti-bot challenges regardless of TLS fingerprint, so a real browser is required; the parallel fetch keeps wall-time ≈ the slower single engine rather than the sum, and either engine being blocked / empty still leaves the other usable. `web_search` always uses headless internally, but its **exposure / invocation only requires Chrome to be ready** (system Chrome / Edge / Chromium, or an already-downloaded built-in chrome-headless-shell) — decoupled from the web_fetch tier: users with system Chrome can use search under any non-off tier (incl. curl / wreq) without triggering a download; if neither is present it stays hidden and a call returns a hint to pick the headless tier to complete the first-time download (MOC-190). Ad results are filtered out; blocked / no-results states return explicit error messages (never silently empty). 
**Pagination (MOC-215)**: `web_search` returns only the first page (~10-20 results, not fetching multiple pages at once to avoid excessive headless latency); when the model needs more / different sources it uses the separate **`web_search_more`** tool (same query, `page=2/3…`) to fetch the next batch (via Bing's `first=` deep pages), with a tail hint in the result steering the model to paginate rather than re-run the same query — numeric string arguments are parsed leniently (models often send `page` as the string `"2"`) so pagination never silently falls back to page 1. DDG HTML parsing borrows from `duckduckgo_search` (Python).
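
The two `web_search` behaviours described above (merging DuckDuckGo and Bing results deduped by normalized URL, and lenient parsing of the `page` argument) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `SearchResult`, `normalize_url`, `merge_results`, and `parse_page` are hypothetical names, and both the normalization rules (lowercase host, drop `www.`, fragment, and trailing slash) and the rank-by-rank interleaving order are assumptions, since the README does not specify them.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    snippet: str


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme/host, drop "www.", fragment, trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def merge_results(*engines: list[SearchResult]) -> list[SearchResult]:
    """Interleave per-engine lists rank by rank, keeping the first hit per normalized URL."""
    merged: list[SearchResult] = []
    seen: set[str] = set()
    longest = max((len(results) for results in engines), default=0)
    for rank in range(longest):
        for results in engines:
            if rank >= len(results):
                continue  # this engine returned fewer results (or was blocked / empty)
            key = normalize_url(results[rank].url)
            if key not in seen:
                seen.add(key)
                merged.append(results[rank])
    return merged


def parse_page(value: object, default: int = 1) -> int:
    """Accept 2, "2", " 2 " or 2.0; use `default` only when the input is unusable."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it explicitly
        return default
    if isinstance(value, int):
        page = value
    elif isinstance(value, float) and value.is_integer():
        page = int(value)
    elif isinstance(value, str):
        try:
            page = int(value.strip())
        except ValueError:
            return default
    else:
        return default
    return page if page >= 1 else default


ddg = [
    SearchResult("A", "https://example.com/a/", "from DDG"),
    SearchResult("B", "https://example.org/b", "from DDG"),
]
bing = [
    SearchResult("A (dup)", "https://www.example.com/a#top", "from Bing"),
    SearchResult("C", "https://example.net/c", "from Bing"),
]
titles = [r.title for r in merge_results(ddg, bing)]
```

Here `titles` ends up as `["A", "B", "C"]`: the Bing copy of the first page collapses onto the DDG entry, and an empty list from either engine still leaves the other engine's results intact. `parse_page("2")` returns `2`, so a model that sends the page number as a string still gets the second page.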