
PetrovDw/digest-skill


digest-skill — Telegram Digest Claude Code Skill

A Claude Code skill that automatically collects posts from Telegram channels, deduplicates them, generates an AI summary, produces a podcast-style audio with two voices, and sends everything to your Telegram bot.

Requires Claude Code CLI. This project is a Claude Code skill, not a standalone app. It is invoked by typing /digest inside Claude Code.

What it does

  1. Collects posts from configured Telegram channels via MTProto (no bot required in channels)
  2. Deduplicates in 3 stages:
    • Forward origin check — detects reposts, keeps the original
    • Text hash — removes near-identical copies
    • Semantic dedup — OpenAI embeddings cluster similar news, keeps one per story
  3. Summarizes — Claude reads the unique posts and writes a structured HTML digest with source links (max 20 top stories)
  4. Podcast audio — generates a two-host dialogue (male + female voices) via ElevenLabs on fal.ai
  5. Sends — posts the HTML digest + audio file to your Telegram bot

Quick start

# 1. Clone into Claude Code skills directory
git clone <your-repo-url> ~/.claude/skills/digest
cd ~/.claude/skills/digest

# 2. Install dependencies
npm ci

# 3. Copy and fill in environment variables
cp .env.example .env
# Edit .env with your API keys (see Environment variables section)

# 4. Authorize Telegram (one-time, generates TG_SESSION)
npm run auth

# 5. Edit config/channels.ts — add your channel list

# 6. Run inside Claude Code
/digest

Installation

1. Get required API keys

Key | Where to get
--- | ---
TG_API_ID / TG_API_HASH | my.telegram.org/apps (create an app)
TG_BOT_TOKEN | @BotFather on Telegram
TG_CHAT_ID | Your personal chat ID (send a message to @userinfobot)
OPENAI_API_KEY | platform.openai.com
FAL_KEY | fal.ai

2. Authorize Telegram (one time)

npm run auth

A QR code will appear in the terminal. Open Telegram → Settings → Devices → Link Desktop Device and scan it. After success, copy the printed TG_SESSION=... line into your .env.
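If you re-authorize often, the manual copy-paste step can be scripted. The helper below is a hypothetical sketch (`upsertEnvLine` is not part of the skill): it replaces an existing `KEY=` line in the `.env` text, or appends one if the key is missing. It only edits a string; reading and writing the file is left to the caller.

```typescript
// Hypothetical helper (not part of digest-skill): insert or replace a
// KEY=value line inside the text of a .env file.
export function upsertEnvLine(envText: string, key: string, value: string): string {
  const line = `${key}=${value}`;
  const lines = envText.split("\n");
  // Find a line that starts with "KEY=" (leading whitespace ignored).
  const idx = lines.findIndex((l) => l.trimStart().startsWith(`${key}=`));
  if (idx >= 0) {
    // Replacing the whole line also drops any inline comment on it.
    lines[idx] = line;
    return lines.join("\n");
  }
  // Key not present: append it, making sure it starts on its own line.
  const body = envText === "" || envText.endsWith("\n") ? envText : envText + "\n";
  return body + line + "\n";
}
```

Usage would look like `writeFileSync(".env", upsertEnvLine(readFileSync(".env", "utf8"), "TG_SESSION", session))` with `node:fs`, where `session` is the string printed by `npm run auth`.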

3. Add channels to monitor

Edit config/channels.ts:

export const CHANNELS: string[] = [
  "durov",
  "bbcnews",
  // add channel usernames without @
];
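Because the list expects bare usernames, a small pre-flight check can catch common mistakes before a run. This is a hypothetical sketch, not code from the repository. It assumes Telegram's public-username shape (a letter first, then letters, digits or underscores, 5 to 32 characters in total) and strips a leading `@` or `t.me/` link.

```typescript
// Hypothetical pre-flight check for config/channels.ts (not part of the skill).
// Assumed username rule: letter first, then [A-Za-z0-9_], 5 to 32 chars total.
const USERNAME_RE = /^[A-Za-z][A-Za-z0-9_]{4,31}$/;

export function normalizeChannels(raw: string[]): { ok: string[]; bad: string[] } {
  const ok: string[] = [];
  const bad: string[] = [];
  const seen = new Set<string>();
  for (const entry of raw) {
    // Tolerate "@name" and "https://t.me/name", although the config wants bare names.
    const name = entry.trim().replace(/^https?:\/\/t\.me\//, "").replace(/^@/, "");
    if (!USERNAME_RE.test(name)) {
      bad.push(entry);
      continue;
    }
    const key = name.toLowerCase(); // Telegram usernames are case-insensitive
    if (seen.has(key)) continue; // skip duplicate entries
    seen.add(key);
    ok.push(name);
  }
  return { ok, bad };
}
```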

4. Run manually to test

/digest        # last 12 hours (default)
/digest 6      # last 6 hours
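The optional argument is a look-back window in hours. A minimal sketch of how such an argument could be parsed and turned into a cutoff timestamp follows; `parseHoursBack` and `cutoffUnixSeconds` are hypothetical names, and only the 12-hour default comes from this README.

```typescript
// Hypothetical parsing of the /digest argument (default window: 12 hours).
export const DEFAULT_HOURS = 12;

export function parseHoursBack(arg: string | undefined): number {
  if (arg === undefined || arg.trim() === "") return DEFAULT_HOURS;
  const n = Number(arg);
  // Reject non-numeric, zero, negative and fractional values.
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`hours must be a positive integer, got "${arg}"`);
  }
  return n;
}

// Posts whose Unix timestamp (in seconds) is >= this value fall inside the window.
export function cutoffUnixSeconds(hours: number, now: Date = new Date()): number {
  return Math.floor(now.getTime() / 1000) - hours * 3600;
}
```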

5. Schedule (macOS)

A launchd plist template is included for automatic runs at 9:00 and 21:00:

# Replace <skill-path> with the actual skill path, then:
cp com.digest-skill.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.digest-skill.plist

com.digest-skill.plist is a template file. Replace every <skill-path> placeholder with your local absolute path before loading it, and do not commit a machine-specific edited copy back into the repository.
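One way to honour that rule is to fill the placeholders programmatically and write the result straight into `~/Library/LaunchAgents`, leaving the repository copy untouched. The sketch below is an assumption-laden illustration (the function names are hypothetical). It requires an absolute path because launchd does not expand `~`, and it XML-escapes the path since a plist is XML.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Escape characters that are special inside XML text nodes.
function xmlEscape(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Replace every <skill-path> placeholder in the template text.
export function fillPlist(template: string, skillPath: string): string {
  if (!skillPath.startsWith("/")) {
    throw new Error("skill path must be absolute (launchd does not expand ~)");
  }
  return template.split("<skill-path>").join(xmlEscape(skillPath));
}

// Write the filled copy outside the repo so the template stays generic.
export function writeFilledPlist(templatePath: string, skillPath: string): string {
  const out = join(homedir(), "Library", "LaunchAgents", "com.digest-skill.plist");
  writeFileSync(out, fillPlist(readFileSync(templatePath, "utf8"), skillPath));
  return out;
}
```

After writing the file, load it with the same `launchctl load` command shown above.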

Environment variables

# Telegram MTProto — get from https://my.telegram.org/apps
TG_API_ID=
TG_API_HASH=
TG_SESSION=                     # generated by npm run auth

# Telegram Bot — for sending the digest
TG_BOT_TOKEN=
TG_CHAT_ID=

# OpenAI — used only for semantic deduplication (embeddings)
OPENAI_API_KEY=

# fal.ai — used for ElevenLabs TTS audio generation
FAL_KEY=

# ElevenLabs voice names for podcast hosts (any voice from elevenlabs.io/voice-library)
ELEVENLABS_VOICE_MALE=Charlie
ELEVENLABS_VOICE_FEMALE=Charlotte

# Digest language: ISO 639-1 code
DIGEST_LANGUAGE=ru              # ru, en, he, ar, de, fr, ...
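A typed loader makes a missing key fail at startup instead of halfway through a run. The sketch below is hypothetical (`loadConfig` and `DigestConfig` are not the skill's own names). It treats the Telegram, OpenAI and fal.ai keys as required, and it assumes the voice and language values fall back to the sample values shown above, which may not match the real code.

```typescript
// Hypothetical typed view of the .env values listed above.
export interface DigestConfig {
  tgApiId: number;
  tgApiHash: string;
  tgSession: string;
  tgBotToken: string;
  tgChatId: string;
  openaiApiKey: string;
  falKey: string;
  voiceMale: string;
  voiceFemale: string;
  language: string;
}

const REQUIRED = [
  "TG_API_ID", "TG_API_HASH", "TG_SESSION", "TG_BOT_TOKEN",
  "TG_CHAT_ID", "OPENAI_API_KEY", "FAL_KEY",
] as const;

export function loadConfig(env: Record<string, string | undefined>): DigestConfig {
  const get = (k: string) => (env[k] ?? "").trim();
  const missing = REQUIRED.filter((k) => get(k) === "");
  if (missing.length > 0) {
    throw new Error(`Missing env vars: ${missing.join(", ")}`);
  }
  const apiId = Number(get("TG_API_ID"));
  if (!Number.isInteger(apiId)) throw new Error("TG_API_ID must be an integer");
  return {
    tgApiId: apiId,
    tgApiHash: get("TG_API_HASH"),
    tgSession: get("TG_SESSION"),
    tgBotToken: get("TG_BOT_TOKEN"),
    tgChatId: get("TG_CHAT_ID"),
    openaiApiKey: get("OPENAI_API_KEY"),
    falKey: get("FAL_KEY"),
    // Fallbacks mirror the sample .env above (assumed, not confirmed defaults).
    voiceMale: get("ELEVENLABS_VOICE_MALE") || "Charlie",
    voiceFemale: get("ELEVENLABS_VOICE_FEMALE") || "Charlotte",
    language: get("DIGEST_LANGUAGE") || "ru",
  };
}
```

In a script this would typically be called as `loadConfig(process.env)` after the `.env` file has been loaded.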

Usage

Type /digest in Claude Code to run the full pipeline. Claude will:

  1. Collect and deduplicate posts from configured channels
  2. Select the top 20 stories and write an HTML summary + podcast script
  3. Generate audio via ElevenLabs (two-host dialogue)
  4. Send the digest to your Telegram bot
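The last step goes through the Telegram Bot API. The sketch below shows the request shape for a `sendMessage` call with `parse_mode: "HTML"`, plus the fail-fast check on an empty token mentioned under Troubleshooting. `buildSendMessage` is a hypothetical name, and the audio upload (`sendAudio`, which needs a multipart body) is not shown.

```typescript
// Hypothetical sketch of the sender's request (not the code in src/bot/sender.ts).
export interface BotRequest {
  url: string;
  body: Record<string, string>;
}

export function buildSendMessage(
  token: string | undefined,
  chatId: string | undefined,
  html: string,
): BotRequest {
  // Fail before any network call if credentials are missing.
  if (!token || token.trim() === "") throw new Error("TG_BOT_TOKEN is empty");
  if (!chatId || chatId.trim() === "") throw new Error("TG_CHAT_ID is empty");
  return {
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    body: { chat_id: chatId, text: html, parse_mode: "HTML" },
  };
}
```

On Node 18+ the request can be sent with the global `fetch`, POSTing `JSON.stringify(req.body)` with a `Content-Type: application/json` header. The Bot API replies with a JSON object whose `ok` field is `false` (with a `description`) on error.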

What gets stored in data/

All runtime files are written to data/ (gitignored — never committed):

File | Description
--- | ---
posts_raw.json | All collected posts before dedup
posts_unique.json | Posts after all 3 dedup stages
summary.html | Final HTML digest sent to Telegram
summary_plain.txt | Podcast script (two-host dialogue)
audio_YYYY-MM-DD.mp3 | Generated audio file
audio_path.txt | Path to the latest audio file
digest.db | SQLite database that tracks seen posts across runs
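The audio file name follows the `audio_YYYY-MM-DD.mp3` pattern. A hypothetical sketch of building that name, assuming the local calendar date of the run is used:

```typescript
// Hypothetical: build the dated audio file name from the table above.
// Assumption: the local calendar date of the run is used (not UTC).
export function audioFileName(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `audio_${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}.mp3`;
}
```

One consequence of a date-only pattern: the 9:00 and 21:00 scheduled runs on the same day would produce the same file name, so the later run overwrites the earlier audio. audio_path.txt always points at whichever file was written last.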

Costs / external services

Service | When used | Estimated cost
--- | --- | ---
OpenAI Embeddings (text-embedding-3-small) | Step 2: semantic dedup | ~$0.001 per run (50–200 posts)
fal.ai / ElevenLabs v3 TTS | Step 7: audio generation | ~$0.05–0.15 per run (30–60 segments)
Telegram Bot API | Step 8: sending message + audio | Free
Telegram MTProto | Step 1: reading channels | Free

Note: The OpenAI and fal.ai steps cost real money. A typical run is under $0.20, but costs scale with the number of channels and posts.
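The per-run figure follows from adding up the table's ranges. The sketch below does that arithmetic and scales it to the twice-daily launchd schedule. The inputs are copied from the table above and are estimates, not live pricing.

```typescript
// Per-run cost ranges in USD, copied from the table above (estimates only).
export const PER_RUN = {
  embeddings: [0.001, 0.001],
  tts: [0.05, 0.15],
  telegram: [0, 0],
} as const;

// Sum the low and high ends of every range.
export function totalRange(r: Record<string, readonly [number, number]>): [number, number] {
  let lo = 0;
  let hi = 0;
  for (const [a, b] of Object.values(r)) {
    lo += a;
    hi += b;
  }
  return [lo, hi];
}

// Scale a per-run range by the number of runs (e.g. 2 per day at 9:00 and 21:00).
export function scale([lo, hi]: [number, number], runs: number): [number, number] {
  return [lo * runs, hi * runs];
}
```

With these inputs a run comes to roughly $0.051 to $0.151, and two scheduled runs per day to roughly $0.10 to $0.30.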

Voices

Default voices: Charlie (male) and Charlotte (female) — both multilingual ElevenLabs v3 voices that work well across ru/en/he/ar/de/fr.

Any ElevenLabs voice name works — check the full list at elevenlabs.io/voice-library. Prefer multilingual voices for non-English digests.

ELEVENLABS_VOICE_MALE=Charlie
ELEVENLABS_VOICE_FEMALE=Charlotte
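Each line of the two-host dialogue becomes one TTS segment voiced by the matching host. The script format is not documented here, so the sketch below assumes lines tagged `M:` (male host) or `F:` (female host), with untagged lines continuing the previous turn. `toSegments` is a hypothetical name.

```typescript
// Hypothetical: split a two-host script into per-voice TTS segments.
// Assumed format (not documented in this README): "M: ..." / "F: ..." per turn.
export interface Segment {
  voice: string;
  text: string;
}

export function toSegments(script: string, voiceMale: string, voiceFemale: string): Segment[] {
  const segments: Segment[] = [];
  for (const raw of script.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const m = /^([MF]):\s*(.*)$/.exec(line);
    if (m) {
      segments.push({ voice: m[1] === "M" ? voiceMale : voiceFemale, text: m[2] ?? "" });
    } else if (segments.length > 0) {
      // Untagged line: continue the current speaker's turn.
      segments[segments.length - 1].text += " " + line;
    }
    // Untagged lines before the first tag are ignored.
  }
  return segments;
}
```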

Language

Set DIGEST_LANGUAGE to any ISO 639-1 code. Claude will write the summary and podcast in that language, and ElevenLabs will use the correct pronunciation model.

DIGEST_LANGUAGE=en   # English
DIGEST_LANGUAGE=he   # Hebrew
DIGEST_LANGUAGE=ar   # Arabic
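ISO 639-1 codes are two letters, so a locale tag such as `en-US` is not a valid value. A hypothetical guard is shown below. It checks only the shape of the code, not whether the code is actually assigned, and it assumes `ru` as the fallback because that is the value in the sample `.env`.

```typescript
// Hypothetical guard for DIGEST_LANGUAGE: shape check only (two ASCII letters).
export function normalizeLanguage(value: string | undefined, fallback = "ru"): string {
  const code = (value ?? "").trim().toLowerCase();
  if (code === "") return fallback;
  if (!/^[a-z]{2}$/.test(code)) {
    throw new Error(`DIGEST_LANGUAGE must be a 2-letter ISO 639-1 code, got "${value}"`);
  }
  return code;
}
```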

Project structure

digest/
├── SKILL.md                  # Claude Code skill entry point
├── config/
│   └── channels.ts           # Telegram channels to monitor
├── src/
│   ├── telegram/
│   │   ├── auth.ts           # One-time QR auth → TG_SESSION
│   │   ├── client.ts         # MTProto session management
│   │   └── collector.ts      # Fetch posts from channels
│   ├── dedup/
│   │   ├── deduplicator.ts   # Stage 1+2: forward_origin + text hash
│   │   └── semantic.ts       # Stage 3: OpenAI embeddings clustering
│   ├── db/
│   │   └── database.ts       # SQLite — post history, dedup tracking
│   ├── audio/
│   │   └── elevenlabs.ts     # Two-voice podcast via fal.ai ElevenLabs
│   ├── bot/
│   │   └── sender.ts         # Telegram Bot API — send message + audio
│   └── types/
│       └── qrcode-terminal.d.ts  # Type declarations
└── data/                     # Runtime files (gitignored)
    ├── posts_raw.json
    ├── posts_unique.json
    ├── summary.html
    ├── summary_plain.txt
    ├── audio_YYYY-MM-DD.mp3
    └── digest.db

How deduplication works

raw posts (e.g. 200+)
  └─► forward_origin check  (reposts → keep original)
  └─► text hash (MD5)       (near-identical text → keep first)
  └─► OpenAI embeddings     (same story, different words → keep one)
        threshold: 0.93 cosine similarity
  └─► unique posts → Claude (top 20 selected for digest)

The SQLite database tracks all seen posts across runs, so each story only appears once across digests.
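The cross-run check amounts to a persistent set of post keys. In the sketch below, a `Set` stands in for the SQLite table, and the table schema in the comment is a hypothetical example, not the one in database.ts. A key made of channel and post ID stops the same post from reappearing. Catching the same story reposted under a new ID in a later run would additionally need stored hashes or embeddings.

```typescript
// Hypothetical stand-in for the SQLite "seen posts" table. A schema like
//   CREATE TABLE IF NOT EXISTS seen_posts (channel TEXT, id INTEGER, PRIMARY KEY (channel, id))
// would play the same role as this interface.
export interface SeenStore {
  has(key: string): boolean;
  add(key: string): void;
}

export function filterUnseen<T extends { channel: string; id: number }>(
  posts: T[],
  store: SeenStore,
): T[] {
  const key = (p: T) => `${p.channel}:${p.id}`;
  const fresh = posts.filter((p) => !store.has(key(p)));
  // Mark only after filtering, so this run still sees its own new posts.
  for (const p of fresh) store.add(key(p));
  return fresh;
}
```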

Troubleshooting

TG_SESSION expired or invalid
Re-run npm run auth to generate a new session string.

Channel errors during collection
The collector reports failed channels at the end: Failed: @channel — reason. Common causes: the channel is private, the username is wrong, or your account is not a member.

Audio generation fails
Check that your FAL_KEY is valid and has a balance. The fal.ai ElevenLabs endpoint requires a funded account.

tsc errors after editing
Run npm run check to validate the TypeScript before committing.

Digest is too long / too short
The skill selects the top 20 stories by default. Adjust the channels in config/channels.ts or change the hours-back argument (for example, /digest 6 for a shorter window).

Test without sending to Telegram
Temporarily comment out Step 8 in SKILL.md, or set TG_BOT_TOKEN to an empty value; the sender then fails fast without making any API calls.

Requirements

  • Node.js 18+
  • Claude Code CLI
  • macOS or Linux (for launchd / cron scheduling)

License

MIT

About

Claude Code skill for automated news digests from Telegram channels.
