Merged
6 changes: 3 additions & 3 deletions .claude-plugin/marketplace.json
@@ -231,8 +231,8 @@
},
{
"name": "che-telegram-mcp",
"version": "1.3.1",
"description": "Telegram MCP Server Plugin — Bot API + full personal-account access via TDLib, 28+ tools, Keychain secret management",
"version": "1.3.2",
"description": "Telegram MCP Server Plugin — Bot API + full personal-account access via TDLib, 28+ tools, Keychain secret management. v1.3.2: lock-refused branch emits an MCP JSON-RPC error envelope, replacing the generic -32000",
"author": {
"name": "Che Cheng"
},
@@ -410,4 +410,4 @@
"category": "development"
}
]
}
}
4 changes: 2 additions & 2 deletions plugins/che-telegram-mcp/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "che-telegram-mcp",
"version": "1.3.1",
"description": "Telegram MCP Server Plugin — Bot API + full personal-account access via TDLib, 28+ tools, Keychain secret management",
"version": "1.3.2",
"description": "Telegram MCP Server Plugin — Bot API + full personal-account access via TDLib, 28+ tools, Keychain secret management. v1.3.2: lock-refused branch emits an MCP JSON-RPC error envelope, replacing the generic -32000",
"author": { "name": "Che Cheng" },
"license": "MIT",
"keywords": ["mcp", "telegram", "messaging", "tdlib", "bot", "chat", "macos", "keychain"]
23 changes: 23 additions & 0 deletions plugins/che-telegram-mcp/CHANGELOG.md
@@ -11,6 +11,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.3.2] - 2026-05-22

### Fixed
- `che-telegram-all-mcp-wrapper.sh`: the lock-refused branch now emits a JSON-RPC 2.0 error envelope to stdout before exiting, so Claude Code's MCP client parses the human-readable message and structured data instead of seeing only `-32000 Server error`. The envelope carries:
- `error.code: -32000` (JSON-RPC server-defined errors range)
- `error.message: "Another instance of CheTelegramAllMCP is already running (lock held by PID NNNN). Use the existing Claude Code window, or kill the previous wrapper first."`
- `error.data.lockHolderPid: <pid>` (machine-readable lock holder)
- `error.data.recoveryCommand: "pkill CheTelegramAllMCP 2>/dev/null; rm -rf ~/.cache/che-telegram-all-mcp.lock ~/.cache/che-telegram-all-mcp.lock.flock"` (semicolon, not `&&`, so cleanup runs even when no process exists)
- `error.data.docsUrl: https://.../README.md#multi-session-limitation`

The original stderr message is retained for direct-shell debug.

**PR-1b id matching (added 2026-05-22 after empirical verification)**: the wrapper reads the first line of stdin (with a 2s timeout) to extract the `id` field of the JSON-RPC `initialize` request, then emits the response envelope with a **matching id**. This was required because an empirical two-session reproduction in Claude Code v2.1.148 showed that responses with `id: null` (the first v1.3.2 attempt) are not matched to the pending `initialize` request and don't surface in Claude Code's MCP error state. With the matching id (PR-1b), debug-log capture confirmed that Claude Code's MCP client correctly parses the envelope and stores the full `error.message` internally.

Stdin extraction uses `jq` when available (preferred) and falls back to a bash regex in environments without jq. It handles the id forms JSON-RPC 2.0 allows: integer, quoted string, or null (MCP itself requires request ids to be a string or an integer).
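The jq-less branch can be exercised on its own. The sketch below uses a hypothetical stand-alone helper, `extract_request_id`, that mirrors the wrapper's regex fallback and takes the read timeout as a parameter (an addition for testing); it is not the wrapper's `read_initialize_id` itself.

```bash
#!/usr/bin/env bash
# Hypothetical stand-alone version of the jq-less branch: read one line from
# stdin (with a timeout) and print the request id as a JSON literal.
extract_request_id() {
    local timeout="${1:-2}"
    local line=""
    local id="null"   # default when stdin is empty, times out, or has no id
    if IFS= read -r -t "$timeout" line && [ -n "$line" ]; then
        if [[ "$line" =~ \"id\"[[:space:]]*:[[:space:]]*([0-9]+) ]]; then
            id="${BASH_REMATCH[1]}"            # integer id: emit unquoted
        elif [[ "$line" =~ \"id\"[[:space:]]*:[[:space:]]*\"([^\"]+)\" ]]; then
            id="\"${BASH_REMATCH[1]}\""        # string id: re-add JSON quotes
        fi
    fi
    printf '%s\n' "$id"
}

extract_request_id <<< '{"jsonrpc":"2.0","id":0,"method":"initialize"}'        # 0
extract_request_id <<< '{"jsonrpc":"2.0","id":"init-1","method":"initialize"}' # "init-1"
extract_request_id < /dev/null                                                 # null
```

The printed value is already a JSON literal, so it can be spliced directly into an envelope's `"id":%s` slot, which is the same contract the wrapper's helper follows.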

New `test-wrapper-mcp-error.sh` covers six cases: happy path, lock-refused branch emits valid JSON, stale-lock self-recovery, `recoveryCommand` validation, id matching with an `initialize` request, and timeout fallback to `null`. Resolves [#31](https://github.com/PsychQuant/che-msg/issues/31).

**Known UX gap** (out of plugin scope): Claude Code's `/mcp` short-list UI may display only `-32000` (truncated form) instead of the full message. The full message IS captured in Claude Code's internal MCP error state (verified via `--debug mcp` debug logs) and is available to downstream tool consumers. The display truncation is a Claude Code UI policy concern, not a plugin issue.

### Documentation
- `README.md`: added `## Multi-session limitation` section explaining the TDLib single-instance constraint, the v1.3.2+ human-readable error message, and a recovery cookbook (`pkill CheTelegramAllMCP 2>/dev/null; rm -rf ~/.cache/che-telegram-all-mcp.lock ~/.cache/che-telegram-all-mcp.lock.flock`). Documents the pre-v1.3.2 generic `-32000` symptom for users upgrading.

## [1.3.1] - 2026-05-07

### Fixed
60 changes: 59 additions & 1 deletion plugins/che-telegram-mcp/README.md
@@ -178,6 +178,54 @@ Or just ask naturally:

`get_me`, `get_updates`, `send_message`, `forward_message`, `get_chat`, `get_chat_administrators`, `get_chat_member_count`, `get_chat_member`, `set_chat_title`, `set_chat_description`, `pin_chat_message`, `unpin_chat_message`, `unpin_all_chat_messages`, `ban_chat_member`, `unban_chat_member`, `restrict_chat_member`, `promote_chat_member`, `leave_chat`, `delete_message`, `edit_message_text`, `copy_message`, `send_photo`, `send_document`, `send_video`, `send_audio`, `send_sticker`, `send_location`, `send_poll`, `set_my_commands`, `get_my_commands`, `delete_my_commands`

## Multi-session limitation

`telegram-all` uses [TDLib](https://core.telegram.org/tdlib), which keeps the session in a SQLite database with an exclusive WAL lock — **only one process can hold it at a time**. `telegram-bot` is not affected (Bot API is HTTP-based and stateless, so any number of Claude Code sessions can run it in parallel).

If you have **two or more Claude Code sessions open simultaneously**, only the first session can spawn `telegram-all`. The second session's wrapper detects the lock and refuses to start, which prevents TDLib database corruption from concurrent writes.
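The refusal hinges on an atomic claim. A minimal sketch of the `mkdir` flavor follows, assuming a throwaway lock path and a `pid` file inside the lock directory; both are illustrative and not necessarily the wrapper's exact layout.

```bash
#!/usr/bin/env bash
# Sketch of a mkdir-based atomic claim (the wrapper's macOS fallback mode).
# LOCK_DIR is a throwaway temp path, not ~/.cache/che-telegram-all-mcp.lock,
# and the "pid" file inside it is an assumed layout for illustration.
LOCK_DIR="$(mktemp -d)/demo.lock"

try_claim() {
    # mkdir either creates the directory or fails; it never half-succeeds,
    # so exactly one of several racing callers wins.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        printf '%s\n' "$$" > "$LOCK_DIR/pid"   # record the owner for diagnostics
        echo "claimed"
    else
        # The owner may not have written its PID yet, hence the '?' fallback.
        echo "refused (lock held by PID $(cat "$LOCK_DIR/pid" 2>/dev/null || echo '?'))"
    fi
}

try_claim   # first session: claimed
try_claim   # second session: refused (lock held by PID <this shell's PID>)
```

Where `flock` is available, `flock -n` on a lock file plays the same role: the second caller fails immediately instead of blocking.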

### What you'll see in v1.3.2+

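`/mcp` may display a human-readable error such as the following (the `/mcp` short-list view can truncate it to `-32000`, but the full message is still recorded in Claude Code's internal MCP error state, visible via `--debug mcp`):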

```
mcp__plugin_che-telegram-mcp_telegram-all: Another instance of CheTelegramAllMCP is already running (lock held by PID 11252). Use the existing Claude Code window, or kill the previous wrapper first.
```

The error envelope also carries `data.lockHolderPid`, `data.recoveryCommand`, and `data.docsUrl` (a link to this section) for clients that show structured error data.

### Recovery cookbook

When you need to free the lock for the current session:

```bash
# 1. Kill any running telegram-all binary (safe under the single-instance
#    assumption). In the orphan-lock case, where recovery matters most, no
#    process exists, so pkill exits 1; `2>/dev/null` keeps it quiet. If you
#    chain steps 1 and 2 on one line, join them with `; ` rather than `&&`,
#    because `&&` would skip the lock removal after pkill fails.
pkill CheTelegramAllMCP 2>/dev/null

# 2. Remove BOTH lock variants. macOS without flock uses `.lock` directory;
# Linux with flock uses `.lock.flock` file. The wrapper picks one at
# runtime — recovery should clean both so it works on any platform.
rm -rf ~/.cache/che-telegram-all-mcp.lock ~/.cache/che-telegram-all-mcp.lock.flock

# 3. (Optional) Confirm no stale process holds TDLib DB files.
lsof ~/Library/Application\ Support/che-telegram-all-mcp/tdlib/db.sqlite 2>/dev/null

# 4. Restart Claude Code or run /mcp to reconnect.
```
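The `;` versus `&&` point is easy to check in isolation. In this sketch `false` stands in for a `pkill` that matched nothing (pkill exits 1 in that case), and both lock variants live in a temporary directory rather than `~/.cache`.

```bash
#!/usr/bin/env bash
# Why the one-line recovery command uses ';' rather than '&&'.
# `false` stands in for `pkill` when no process matches (pkill exits 1 then).
# The lock paths are throwaway temp files, not the real ~/.cache locks.
demo_dir="$(mktemp -d)"
lock_dir="$demo_dir/demo.lock"          # mkdir-mode lock (a directory)
lock_file="$demo_dir/demo.lock.flock"   # flock-mode lock (a regular file)

make_locks() { mkdir -p "$lock_dir"; : > "$lock_file"; }

# '&&' chain: the failing "pkill" short-circuits, so cleanup never runs.
make_locks
sh -c 'false && rm -rf "$1" "$2"' sh "$lock_dir" "$lock_file" || true
if [ -e "$lock_dir" ]; then and_result="lock survived"; else and_result="lock removed"; fi

# ';' chain: cleanup runs regardless of the "pkill" exit status.
make_locks
sh -c 'false; rm -rf "$1" "$2"' sh "$lock_dir" "$lock_file"
if [ -e "$lock_dir" ] || [ -e "$lock_file" ]; then semi_result="lock survived"; else semi_result="lock removed"; fi

echo "&&: $and_result"   # &&: lock survived
echo ";: $semi_result"   # ;: lock removed
```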

Step 1 is graceful — the binary handles `SIGTERM` and `wait`s for TDLib to checkpoint the WAL before exiting. Step 2 removes the wrapper's atomic-claim guard (both `.lock` directory for mkdir mode and `.lock.flock` file for flock mode). After both, the next Claude Code session that spawns `telegram-all` will succeed.

### Pre-v1.3.2 symptom

If you're on `che-telegram-mcp` plugin **v1.3.1 or earlier**, the lock-refused branch only wrote to stderr (which Claude Code's MCP transport doesn't surface), so users would just see a generic `-32000 Server error` with no recovery hint. Upgrade to **v1.3.2+** for the human-readable message described above. See [#31](https://github.com/PsychQuant/che-msg/issues/31) for the diagnosis.

### Why we don't auto-clean stale binaries

Killing a TDLib process mid-write can corrupt the database (WAL checkpoint mid-flight). The wrapper deliberately requires manual intervention so the user — who knows whether the other Claude Code session is genuinely abandoned or just backgrounded — makes the destructive call.

## Permissions

This plugin requires:
@@ -188,10 +236,20 @@ This plugin requires:

## Version

Plugin version: 1.3.0 (currently pins `che-telegram-all-mcp` v0.5.0 + `che-telegram-bot-mcp` v0.5.0 binaries; wrapper auto-upgrades on version mismatch)
Plugin version: 1.3.2 (currently pins `che-telegram-all-mcp` v0.5.0 + `che-telegram-bot-mcp` v0.5.0 binaries; wrapper auto-upgrades on version mismatch)

### Changelog

**1.3.2** (2026-05-22)

- **Lock-refused branch emits MCP JSON-RPC error envelope to stdout** (refs [che-msg#31](https://github.com/PsychQuant/che-msg/issues/31)). When a second Claude Code session tries to spawn `telegram-all` while a stale session still holds the TDLib lock, the wrapper now writes a `{"jsonrpc":"2.0","id":<matches initialize request>,"error":{...}}` envelope to stdout before exiting. The wrapper reads the first line of stdin (2s timeout) to extract the JSON-RPC `initialize` request's `id` and responds with the matching id, so Claude Code's MCP client surfaces `error.message` (e.g. `"Another instance of CheTelegramAllMCP is already running (lock held by PID 11252). Use the existing Claude Code window, or kill the previous wrapper first."`) instead of the generic `-32000 Server error`. Falls back to `id: null` only when stdin is empty, the read times out, or no id can be extracted (e.g. direct-shell debugging). `error.data` carries `lockHolderPid`, `recoveryCommand`, and `docsUrl`.
- **Multi-session limitation README section** documents the TDLib upstream constraint, the recovery cookbook, and the explicit Strategy B/C non-decisions.
- **Recovery cookbook hardened**: `pkill ... ; rm -rf ...` (semicolon, not `&&`) so cleanup runs even when no process exists to kill — the orphan-lock case is exactly when cleanup matters most. Also covers both `.lock` (mkdir mode) and `.lock.flock` (flock mode) paths.

**1.3.1** (2026-05-07)

- **Atomic-claim lock**: the wrapper now uses `flock` (Linux) or `mkdir` (macOS fallback) to prevent two simultaneous wrappers from racing for the TDLib lock. A second wrapper fails fast with a stderr message instead of silent SIGTERM cross-fire. Stale-lock cleanup removes orphaned locks whose owner PID is dead. New regression test (`test-wrapper-pid.sh` test 9). Resolves [#10](https://github.com/PsychQuant/psychquant-claude-plugins/issues/10).
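The stale-lock rule (an orphaned lock is one whose owner PID is dead) can be sketched with `kill -0`. The lock layout here, a `pid` file inside a mkdir-style lock directory, is an assumption for illustration rather than the wrapper's confirmed format.

```bash
#!/usr/bin/env bash
# Sketch of stale-lock detection: a lock whose recorded owner PID is no longer
# alive is treated as orphaned. The lock layout (a "pid" file inside a
# mkdir-style lock directory) is an assumption for illustration.
lock_is_stale() {
    local lock_dir="$1"
    local owner_pid
    owner_pid="$(cat "$lock_dir/pid" 2>/dev/null || true)"
    # No readable, numeric owner PID: nothing can be alive behind this lock.
    [[ "$owner_pid" =~ ^[0-9]+$ ]] || return 0
    # kill -0 sends no signal; it only tests whether the PID can be signaled.
    # Caveat: it also fails (EPERM) for a live process owned by another user.
    if kill -0 "$owner_pid" 2>/dev/null; then
        return 1   # owner alive: lock is genuinely held
    fi
    return 0       # owner gone: lock is stale and safe to remove
}

demo_lock="$(mktemp -d)/demo.lock"
mkdir "$demo_lock"
echo "$$" > "$demo_lock/pid"
lock_is_stale "$demo_lock" && echo "stale" || echo "held"   # held: this shell is alive

sleep 0 & dead_pid=$!
wait "$dead_pid"                      # reap it so the PID is gone
echo "$dead_pid" > "$demo_lock/pid"
lock_is_stale "$demo_lock" && echo "stale" || echo "held"   # stale (barring PID reuse)
```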

**1.3.0** (2026-04-28)

- **Auto-upgrade wrappers**: each wrapper now pins a `DESIRED_VERSION` and writes a `~/bin/.${BINARY_NAME}.version` sidecar on install. When the plugin bumps the desired version, the wrapper detects the sidecar mismatch and atomically re-downloads (`.tmp` → `mv`) on next spawn. Source builds in `~/Developer/...` are never auto-replaced. Falls back to the existing binary on network failure (no brick).
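A rough sketch of the sidecar check and the `.tmp` → `mv` swap follows. `BIN_DIR`, `BINARY_NAME`, `DESIRED_VERSION` values and `fetch_binary` are hypothetical stand-ins (the real wrapper downloads a release binary into `~/bin`), and the source-build exemption is left out.

```bash
#!/usr/bin/env bash
# Sketch of sidecar-driven auto-upgrade with an atomic .tmp -> mv swap.
# BIN_DIR, BINARY_NAME, DESIRED_VERSION and fetch_binary are hypothetical
# stand-ins: the real wrapper downloads a release binary into ~/bin instead.
BIN_DIR="$(mktemp -d)"
BINARY_NAME="demo-mcp"
DESIRED_VERSION="0.5.0"

fetch_binary() {
    # Hypothetical downloader: writes a stub "binary" to the path in $1.
    printf '#!/bin/sh\necho %s\n' "$DESIRED_VERSION" > "$1"
}

ensure_binary() {
    local bin="$BIN_DIR/$BINARY_NAME"
    local sidecar="$BIN_DIR/.${BINARY_NAME}.version"
    local installed=""
    if [ -f "$sidecar" ]; then
        installed="$(cat "$sidecar")"
    fi
    # Sidecar matches the pinned version: keep the installed binary.
    if [ -x "$bin" ] && [ "$installed" = "$DESIRED_VERSION" ]; then
        return 0
    fi
    # Download beside the target, then rename. A rename within one filesystem
    # is atomic, so a concurrent spawn never runs a half-written file.
    if fetch_binary "$bin.tmp" && chmod +x "$bin.tmp"; then
        mv -f "$bin.tmp" "$bin"
        printf '%s\n' "$DESIRED_VERSION" > "$sidecar"
    else
        rm -f "$bin.tmp"
        [ -x "$bin" ]   # network failure: succeed only if an old binary exists
    fi
}

ensure_binary && cat "$BIN_DIR/.${BINARY_NAME}.version"   # 0.5.0
DESIRED_VERSION="0.6.0"                                    # plugin bumps the pin
ensure_binary && cat "$BIN_DIR/.${BINARY_NAME}.version"   # 0.6.0
```

Keeping the `.tmp` file in the same directory as the target is what makes the final `mv` a rename rather than a copy.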
94 changes: 94 additions & 0 deletions plugins/che-telegram-mcp/bin/che-telegram-all-mcp-wrapper.sh
@@ -99,6 +99,90 @@ if [[ -z "$TELEGRAM_API_ID" || -z "$TELEGRAM_API_HASH" ]]; then
exit 1
fi

# --- MCP-shaped error envelope helpers (#31) ---
# When the atomic-claim lock below refuses startup, Claude Code's MCP
# transport otherwise sees the wrapper exit non-zero with no stdout and
# surfaces a generic "-32000 Server error" to the user. By emitting a
# JSON-RPC 2.0 error envelope to stdout BEFORE exit, Claude Code's MCP
# client can render error.message — turning the opaque -32000 into a
# human-readable instruction.
#
# PR-1b (empirical-driven, 2026-05-22): the v1.3.2 first attempt used
# `id: null` per JSON-RPC 2.0 § 5 ("If there was an error in detecting
# the id... it MUST be Null"). Empirical two-session reproduction in
# Claude Code showed the client drops null-id responses as unmatched
# transport noise and still surfaces generic -32000. Fix: read stdin
# briefly to capture the initialize request's id and respond with
# matching id so the MCP client recognizes the response.
#
# The functions are JSON-safe by construction: the only dynamic values
# (lock holder PID, request id) are either gated by numeric regex
# upstream or extracted via jq / strict bash regex.

# Read first line of stdin (expected: JSON-RPC initialize request) with a
# short timeout, extract the request id. Falls back to "null" if stdin is
# empty, times out, or doesn't contain valid JSON.
#
# Output format mirrors JSON literal: numeric id printed unquoted (e.g.
# `42`), string id wrapped in JSON quotes (e.g. `"abc"`), missing/invalid
# id returns `null`. Caller substitutes this directly into the JSON
# envelope's `"id":<x>` slot.
#
# Timeout is 2s — Claude Code MCP transport typically sends initialize
# within milliseconds of spawning the child process. Longer timeout
# would delay wrapper exit and could push Claude Code into its own
# transport timeout.
read_initialize_id() {
    local line=""
    local id="null"

    if IFS= read -r -t 2 line 2>/dev/null && [ -n "$line" ]; then
        if command -v jq >/dev/null 2>&1; then
            # jq -c outputs JSON-compact form: number unquoted, string with
            # quotes, null as literal `null`. Perfect for direct substitution.
            local extracted
            extracted=$(printf '%s' "$line" | jq -c '.id' 2>/dev/null || true)
            if [ -n "$extracted" ]; then
                id="$extracted"
            fi
        else
            # Fallback for environments without jq. Handles integer ids and
            # quoted string ids, the forms MCP uses for request ids. Anything
            # else (including a JSON null id) leaves the default `null`.
            if [[ "$line" =~ \"id\"[[:space:]]*:[[:space:]]*([0-9]+) ]]; then
                id="${BASH_REMATCH[1]}"
            elif [[ "$line" =~ \"id\"[[:space:]]*:[[:space:]]*\"([^\"]+)\" ]]; then
                id="\"${BASH_REMATCH[1]}\""
            fi
        fi
    fi

    printf '%s' "$id"
}

# Emit JSON-RPC 2.0 error envelope to stdout. owner_pid is the lock holder's
# PID (0 = unknown, e.g. flock branch). request_id is the JSON id from the
# pending initialize, output of read_initialize_id — substituted directly
# into the envelope.
emit_mcp_error_response() {
    local owner_pid="${1:-0}"
    local request_id="${2:-null}"
    local pid_phrase=""
    local pid_field="null"
    if [[ "$owner_pid" =~ ^[0-9]+$ ]] && [ "$owner_pid" != "0" ]; then
        pid_phrase=" (lock held by PID ${owner_pid})"
        pid_field="${owner_pid}"
    fi
    # recoveryCommand uses `;` instead of `&&` because the orphan-lock case
    # (most common stuck-state) has NO process to kill — pkill exits 1, which
    # would short-circuit `&&` and skip the lock cleanup. Semicolon ensures
    # both steps run regardless. Both lock paths are removed: `.lock` (mkdir
    # mode) and `.lock.flock` (flock mode), so the same command works on
    # macOS (mkdir) and Linux (flock).
    printf '{"jsonrpc":"2.0","id":%s,"error":{"code":-32000,"message":"Another instance of CheTelegramAllMCP is already running%s. Use the existing Claude Code window, or kill the previous wrapper first.","data":{"lockHolderPid":%s,"recoveryCommand":"pkill CheTelegramAllMCP 2>/dev/null; rm -rf ~/.cache/che-telegram-all-mcp.lock ~/.cache/che-telegram-all-mcp.lock.flock","docsUrl":"https://github.com/PsychQuant/psychquant-claude-plugins/blob/main/plugins/che-telegram-mcp/README.md#multi-session-limitation"}}}\n' \
        "$request_id" "$pid_phrase" "$pid_field"
}

# --- Atomic-claim lock (#10) ---
# TDLib DB is single-instance — two MCP servers can't share it. The previous
# PID-tracking strategy (#8) is racy on multi-window scenarios: window B reads
@@ -115,6 +199,11 @@ if command -v flock >/dev/null 2>&1; then
LOCK_MODE="flock"
exec 200>"$LOCK_FILE"
if ! flock -n 200; then
# PR-1b: read initialize id from stdin before emitting response, so
# Claude Code's MCP client matches the error to its pending request.
# flock has no caller-visible owner PID, so emit without it.
REQ_ID=$(read_initialize_id)
emit_mcp_error_response 0 "$REQ_ID"
echo "$BINARY_NAME: Another instance is already running. Use the existing Claude Code window, or kill the previous wrapper first." >&2
exit 1
fi
@@ -132,6 +221,11 @@ else
exit 1
}
else
# PR-1b: read initialize id from stdin before emitting response,
# so Claude Code's MCP client matches the error to its pending
# request and surfaces error.message instead of generic -32000.
REQ_ID=$(read_initialize_id)
emit_mcp_error_response "${OWNER_PID:-0}" "$REQ_ID"
echo "$BINARY_NAME: Another instance is already running (lock held by PID ${OWNER_PID:-?}). Use the existing Claude Code window, or kill the previous wrapper first." >&2
exit 1
fi