Memory Bot — YouTube Channel AI Agents

Project Overview

A platform of personal AI agents for YouTube channels. Each agent has a character, knowledge base, and memory tailored to its channel. Built on LangChain.js + DeepSeek.

Tech Stack

Runtime: Bun (NOT Node.js)
Language: TypeScript (strict mode)
AI Framework: LangChain.js (langchain, @langchain/core, @langchain/openai)
Memory: mem0 (mem0ai/oss) — persistent AI memory with pgvector backend
LLM: DeepSeek via @langchain/openai (OpenAI-compatible API)
Embeddings: OpenAI text-embedding-3-small (raw fetch + mem0)
Database: PostgreSQL via Bun.sql (NOT SQLite, NOT pg, NOT postgres.js)
Server: Bun.serve() (NOT express)
Testing: bun test

Architecture

src/
├── index.ts                    # Entry — migrations + server start
├── server.ts                   # Bun.serve configuration
├── routes/
│   ├── index.ts                # Route registry
│   ├── channel.routes.ts       # Channel endpoints
│   ├── transcript.routes.ts    # Transcript endpoints
│   ├── search.routes.ts        # Semantic search endpoints
│   ├── chat.routes.ts          # AI chat endpoint
│   ├── digest.routes.ts        # Smart Digest endpoints
│   └── content-gap.routes.ts   # Content Gap Finder endpoints
├── services/
│   ├── channel.service.ts      # Business logic (DB-first, yt-dlp fallback)
│   ├── channel.helpers.ts      # Channel resolution utilities
│   ├── transcript.service.ts   # Transcript fetch + summarize
│   ├── search.service.ts       # Vector search orchestration
│   ├── chat.service.ts         # AI agent chat orchestration
│   ├── digest.service.ts       # Smart Digest generation + caching
│   └── content-gap.service.ts  # Content Gap analysis orchestration
├── agent/                      # LangChain ReAct agent
│   ├── create.ts               # Agent factory (per-channel, cached)
│   ├── tools.ts                # Agent tools (list_videos, get_transcript, semantic_search, get_channel_info)
│   └── prompt.ts               # System prompt builder (with mem0 context injection)
├── memory/                     # mem0 persistent memory layer
│   ├── config.ts               # mem0 configuration (DeepSeek LLM + pgvector)
│   └── client.ts               # Memory operations (save, recall, list)
├── ai/                         # Raw AI API calls (metadata, summarization)
│   ├── client.ts               # DeepSeek API client
│   ├── generate-metadata.ts    # AI channel metadata generation
│   ├── generate-digest.ts      # AI weekly digest generation
│   ├── extract-topics.ts       # AI topic extraction from clusters
│   ├── analyze-gaps.ts         # AI content gap analysis
│   └── summarize.ts            # Transcript summarization
├── vector/                     # Embeddings & semantic search
│   ├── embedder.ts             # OpenAI embeddings API
│   ├── chunker.ts              # Semantic text chunking
│   ├── ingest.ts               # Chunk → embed → store pipeline
│   ├── store.ts                # pgvector storage & search
│   └── cluster.ts              # K-means clustering of embeddings
├── repositories/               # Data access layer (1 file per table)
│   ├── channel.repo.ts
│   ├── video.repo.ts
│   ├── transcript.repo.ts
│   ├── metadata.repo.ts
│   ├── digest.repo.ts
│   └── content-gap.repo.ts
├── db/
│   ├── connection.ts           # Bun.sql PostgreSQL connection
│   ├── migrate.ts              # Auto migration runner
│   └── migrations/             # Sequential .sql files (001_, 002_...)
├── lib/
│   ├── errors.ts               # AppError hierarchy
│   ├── response.ts             # { ok, data } / { ok, error } envelope
│   ├── request.ts              # Query param parsing
│   ├── enums.ts                # Channel categories, languages
│   └── niche-topics.ts         # Per-category reference topic lists
└── yt/                         # YouTube external service layer
    ├── types.ts
    ├── parse-channel.ts
    ├── fetch-videos.ts
    ├── fetch-transcript.ts
    └── fetch-channel-images.ts

Layered Architecture

Routes → Services → Repositories → Database
                  → Agent module → LangChain (createAgent + tools)
                  → Memory module → mem0 (pgvector + DeepSeek)
                  → AI module   → DeepSeek API (metadata, summarization)
                  → YT module   → yt-dlp (external)

Routes: HTTP concern only — parse request, delegate to service, format response
Services: Business logic, validation, orchestration
Agent: LangChain ReAct agent — per-channel, with tools for video/transcript/search access
Memory: mem0 persistent memory — recalls past conversations, auto-extracts facts, deduplicates
AI: Raw DeepSeek API calls for metadata generation and summarization
Repositories: Data access — 1 file per DB table, raw SQL via Bun.sql
Vector: OpenAI embeddings + pgvector for semantic search
YT module: External service wrapper (yt-dlp subprocess)
Lib: Shared infrastructure (errors, response helpers)
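
To make the route/service split and the { ok, data } / { ok, error } envelope from lib/response.ts more concrete, here is a minimal sketch. The names success, failure, handle, and the shape of the error object are assumptions for illustration, not the project's actual helpers.

```typescript
// Hypothetical envelope types; the real lib/response.ts may differ.
type ApiSuccess<T> = Readonly<{ ok: true; data: T }>;
type ApiFailure = Readonly<{
  ok: false;
  error: Readonly<{ code: string; message: string }>;
}>;
type ApiResponse<T> = ApiSuccess<T> | ApiFailure;

// Wrap a successful result.
function success<T>(data: T): ApiSuccess<T> {
  return { ok: true, data };
}

// Wrap a failure with a machine-readable code and a readable message.
function failure(code: string, message: string): ApiFailure {
  return { ok: false, error: { code, message } };
}

// Route-level helper: delegate to a service and translate the outcome
// into the envelope, so a route never leaks a raw exception.
async function handle<T>(service: () => Promise<T>): Promise<ApiResponse<T>> {
  try {
    return success(await service());
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";
    return failure("INTERNAL_ERROR", message);
  }
}
```

A route would then only parse the request, call handle(() => someService(...)), and serialize the returned envelope.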

Database

PostgreSQL with Bun.sql (tagged template queries)
Migrations: sequential .sql files in src/db/migrations/
Schema per file: 001_create_enums.sql, 002_create_channels.sql, etc.
Tables: channels, videos, transcripts, channel_metadata, chunk_embeddings, digests, content_gap_analyses
mem0 manages its own tables: agent_memories (pgvector collection)
Metadata is versioned: the latest version is returned by default, an older one is requested with the version query parameter, and the version history has its own endpoint

Key Concepts

DB-First Strategy

Search PostgreSQL first
If not found → fetch from yt-dlp → save to DB → return
Metadata generated only after first transcript exists
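
The lookup order above can be sketched with injected dependencies. The Video shape and the findInDb, fetchFromYt, and saveToDb names are hypothetical stand-ins for the repositories and the yt module, not their real signatures.

```typescript
// Hypothetical shapes; the real repositories and yt module will differ.
type Video = Readonly<{ id: string; title: string }>;

type VideoSource = Readonly<{
  findInDb: (channel: string) => Promise<ReadonlyArray<Video>>;
  fetchFromYt: (channel: string) => Promise<ReadonlyArray<Video>>;
  saveToDb: (channel: string, videos: ReadonlyArray<Video>) => Promise<void>;
}>;

// DB-first lookup: return stored rows when present; otherwise fetch
// externally, persist the result, and return it.
async function getVideos(
  channel: string,
  source: VideoSource,
): Promise<ReadonlyArray<Video>> {
  const stored = await source.findInDb(channel);
  if (stored.length > 0) return stored;

  const fetched = await source.fetchFromYt(channel);
  await source.saveToDb(channel, fetched);
  return fetched;
}
```

Passing the data sources in as functions keeps the strategy testable without PostgreSQL or yt-dlp.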

Versioned Metadata

Each channel_metadata row has a version (auto-increment per channel)
Default API returns latest version
Separate endpoint for version history
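
A minimal in-memory sketch of the version rules above. The MetadataRow shape and its field names are assumptions; the real logic presumably lives in SQL inside metadata.repo.ts.

```typescript
// Hypothetical row shape for channel_metadata; field names are assumptions.
type MetadataRow = Readonly<{
  channelId: string;
  version: number;
  summary: string;
}>;

// Return the requested version, or the highest version when none is given.
function selectVersion(
  rows: ReadonlyArray<MetadataRow>,
  version?: number,
): MetadataRow | undefined {
  if (version !== undefined) return rows.find((r) => r.version === version);
  return rows.reduce<MetadataRow | undefined>(
    (latest, row) =>
      latest === undefined || row.version > latest.version ? row : latest,
    undefined,
  );
}

// Next version number for a channel: one past the current maximum,
// starting at 1 when the channel has no metadata yet.
function nextVersion(rows: ReadonlyArray<MetadataRow>): number {
  return rows.reduce((max, row) => Math.max(max, row.version), 0) + 1;
}
```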

mem0 Memory Layer

Each channel = agentId, each user = userId
Before agent response: recallMemories() retrieves relevant past context
After agent response: saveConversation() persists new facts (non-blocking)
mem0 auto-extracts facts, deduplicates, and manages memory lifecycle
Memories injected into system prompt as additional context
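
The recall, respond, save sequence above might be wired roughly as follows. Only the names recallMemories and saveConversation come from this document; their signatures, runAgent, onSaveError, BASE_PROMPT, and the prompt wording are assumptions, and no real mem0 call is shown.

```typescript
// Signatures are assumptions; only the names recallMemories and
// saveConversation appear in the project description.
type MemoryDeps = Readonly<{
  recallMemories: (agentId: string, userId: string, query: string) => Promise<ReadonlyArray<string>>;
  saveConversation: (agentId: string, userId: string, userMessage: string, reply: string) => Promise<void>;
  runAgent: (systemPrompt: string, message: string) => Promise<string>;
  onSaveError: (err: unknown) => void;
}>;

// Placeholder base prompt, purely illustrative.
const BASE_PROMPT = "You are the assistant for this channel.";

// Append recalled memories to the base system prompt as extra context.
function buildSystemPrompt(base: string, memories: ReadonlyArray<string>): string {
  if (memories.length === 0) return base;
  const lines = memories.map((m) => `- ${m}`).join("\n");
  return `${base}\n\nKnown facts about this user:\n${lines}`;
}

async function chat(
  agentId: string,
  userId: string,
  message: string,
  deps: MemoryDeps,
): Promise<string> {
  // 1. Recall relevant memories before the agent answers.
  const memories = await deps.recallMemories(agentId, userId, message);
  // 2. Run the agent with the memories injected into the system prompt.
  const reply = await deps.runAgent(buildSystemPrompt(BASE_PROMPT, memories), message);
  // 3. Persist without blocking the response; failures are reported, not swallowed.
  void deps.saveConversation(agentId, userId, message, reply).catch(deps.onSaveError);
  return reply;
}
```

Not awaiting the save keeps response latency independent of memory writes, at the cost of a short window in which the newest facts are not yet recallable.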

Smart Digest

Weekly AI-generated channel digest with view velocity scoring
Persona-aware narration per channel category
Trend comparison with previous digest
Optional user personalization via mem0
Agent tool get_latest_digest for conversational access
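
This document does not define how view velocity is scored. One plausible reading, shown here strictly as an assumption, is views per day since publication. The VideoStats shape and both function names are hypothetical.

```typescript
// Hypothetical input shape; the project's real scoring may differ.
type VideoStats = Readonly<{ id: string; views: number; publishedAt: Date }>;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Views per day since publication. Age is clamped to one day so a
// just-published video does not divide by a near-zero age.
function viewVelocity(video: VideoStats, now: Date): number {
  const ageDays = Math.max(
    1,
    (now.getTime() - video.publishedAt.getTime()) / MS_PER_DAY,
  );
  return video.views / ageDays;
}

// Rank videos by velocity, highest first, without mutating the input.
function rankByVelocity(
  videos: ReadonlyArray<VideoStats>,
  now: Date,
): ReadonlyArray<VideoStats> {
  return [...videos].sort(
    (a, b) => viewVelocity(b, now) - viewVelocity(a, now),
  );
}
```

Under this reading a two-day-old video with 3,000 views outranks a ten-day-old video with 10,000 views, which is the kind of recency weighting a weekly digest would want.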

Content Gap Finder

K-means clustering of existing pgvector embeddings for topic coverage map
AI-powered topic extraction from clusters
Gap analysis against niche-specific reference topics
Ranked recommendations with confidence, priority, and suggested video titles
Agent tool find_content_gaps for conversational access
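
For the clustering step, a compact sketch of Lloyd's k-means on plain number arrays. It is not the code in vector/cluster.ts: it seeds centroids with the first k points (an assumption made for determinism), uses squared Euclidean distance, and runs on toy vectors rather than pgvector embeddings.

```typescript
type Vector = ReadonlyArray<number>;

function squaredDistance(a: Vector, b: Vector): number {
  return a.reduce((sum, value, i) => sum + (value - (b[i] ?? 0)) ** 2, 0);
}

// Index of the centroid closest to the point.
function nearest(point: Vector, centroids: ReadonlyArray<Vector>): number {
  let best = 0;
  let bestDistance = Infinity;
  centroids.forEach((centroid, i) => {
    const d = squaredDistance(point, centroid);
    if (d < bestDistance) {
      best = i;
      bestDistance = d;
    }
  });
  return best;
}

// Component-wise mean of a non-empty set of points.
function mean(points: ReadonlyArray<Vector>, dims: number): Vector {
  return Array.from(
    { length: dims },
    (_, d) => points.reduce((sum, p) => sum + (p[d] ?? 0), 0) / points.length,
  );
}

// Returns one cluster label per point. Stops when labels no longer change
// or after maxIterations rounds.
function kMeans(
  points: ReadonlyArray<Vector>,
  k: number,
  maxIterations = 50,
): ReadonlyArray<number> {
  if (k < 1 || k > points.length) {
    throw new RangeError("k must be between 1 and points.length");
  }
  const dims = points[0]?.length ?? 0;
  let centroids: ReadonlyArray<Vector> = points.slice(0, k);
  let labels: ReadonlyArray<number> = [];

  for (let iteration = 0; iteration < maxIterations; iteration += 1) {
    const next = points.map((p) => nearest(p, centroids));
    const unchanged =
      next.length === labels.length && next.every((l, i) => l === labels[i]);
    if (unchanged) break;
    labels = next;
    // Recompute each centroid; an empty cluster keeps its old centroid.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => next[i] === c);
      return members.length > 0 ? mean(members, dims) : old;
    });
  }
  return labels;
}
```

Each resulting cluster would then be handed to the topic-extraction step, and the extracted topics compared against the reference list for the channel's category.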

Immutability

NEVER mutate objects — always create new copies
Use Readonly<T> and ReadonlyArray<T> for type safety
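
A short illustration of the rule: updates return new objects and the Readonly types make accidental mutation a compile error. The Channel shape and the two helper names are hypothetical.

```typescript
// Hypothetical shape used only to illustrate the rule.
type Channel = Readonly<{
  handle: string;
  title: string;
  tags: ReadonlyArray<string>;
}>;

// Return a new channel with an extra tag; the input is left untouched.
function addTag(channel: Channel, tag: string): Channel {
  return { ...channel, tags: [...channel.tags, tag] };
}

// Return a new channel with a different title.
function withTitle(channel: Channel, title: string): Channel {
  return { ...channel, title };
}

// Both of these would be rejected by the compiler:
//   channel.title = "x";
//   channel.tags.push("x");
```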

Commands

bun run start               # Start server (runs migrations first)
bun run dev                 # Start server in watch mode
bun run migrate             # Run migrations only
bun test                    # Run all tests
bun test --watch            # Run tests in watch mode

API Endpoints

GET  /api/channels/videos?channel=CalebWritesCode
GET  /api/channels/metadata?channel=CalebWritesCode
GET  /api/channels/metadata?channel=CalebWritesCode&version=2
GET  /api/channels/metadata/versions?channel=CalebWritesCode
GET  /api/transcripts?videoId=dQw4w9WgXcQ
POST /api/transcripts?videoId=dQw4w9WgXcQ
GET  /api/search?channel=CalebWritesCode&q=react hooks
POST /api/chat                    # AI agent chat
GET  /api/chat/memories?channel=CalebWritesCode&userId=user-123
GET  /api/digests?channel=CalebWritesCode
POST /api/digests                 # Generate smart digest
GET  /api/digests/history?channel=CalebWritesCode
GET  /api/channels/content-gaps?channel=CalebWritesCode
POST /api/channels/content-gaps?channel=CalebWritesCode&force=true
GET  /api/channels/content-gaps/versions?channel=CalebWritesCode

POST /api/chat

{
  "channel": "CalebWritesCode",
  "userId": "user-123",
  "message": "What videos about React are on this channel?",
  "history": []
}
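
Since external input is validated at the boundary, a body like the one above could be checked as sketched here. Which fields are required, the error wording, and the parseChatRequest name are assumptions; the items inside history are left untyped because their shape is not specified.

```typescript
type ChatRequest = Readonly<{
  channel: string;
  userId: string;
  message: string;
  history: ReadonlyArray<unknown>;
}>;

type ParseResult =
  | Readonly<{ ok: true; data: ChatRequest }>
  | Readonly<{ ok: false; error: string }>;

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Validate an already-parsed JSON body and narrow it to ChatRequest.
function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const fields = body as Record<string, unknown>;
  const { channel, userId, message } = fields;
  const rawHistory = fields["history"];

  if (!isNonEmptyString(channel)) return { ok: false, error: "channel is required" };
  if (!isNonEmptyString(userId)) return { ok: false, error: "userId is required" };
  if (!isNonEmptyString(message)) return { ok: false, error: "message is required" };
  if (rawHistory !== undefined && !Array.isArray(rawHistory)) {
    return { ok: false, error: "history must be an array" };
  }

  // A missing history is treated as an empty conversation (assumption).
  const turns: ReadonlyArray<unknown> = Array.isArray(rawHistory) ? rawHistory : [];
  return { ok: true, data: { channel, userId, message, history: turns } };
}
```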

GET /api/chat/memories

GET /api/chat/memories?channel=CalebWritesCode&userId=user-123

Bun-Specific Rules

Use Bun.sql for PostgreSQL (NOT pg, NOT postgres.js)
Use Bun.serve() for HTTP (NOT express)
Use Bun.file() for file I/O (NOT node:fs readFile/writeFile)
Use Bun.spawn() for subprocesses
Bun auto-loads .env — do NOT use dotenv

Environment Variables

DATABASE_URL=postgres://user:pass@localhost:5432/memory_bot
DEEPSEEK_API_KEY=            # Required for AI agent + metadata + summarization
OPENAI_API_KEY=              # Required for embeddings (text-embedding-3-small)
LOG_LEVEL=info               # debug | info | warn | error
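
A sketch of reading these variables once at startup and failing fast when one is missing. Treating DATABASE_URL as required, defaulting LOG_LEVEL to info, and the loadConfig and requireVar names are assumptions.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

type AppConfig = Readonly<{
  databaseUrl: string;
  deepseekApiKey: string;
  openaiApiKey: string;
  logLevel: LogLevel;
}>;

type Env = Readonly<Record<string, string | undefined>>;

const LOG_LEVELS: ReadonlyArray<LogLevel> = ["debug", "info", "warn", "error"];

// Read a variable or throw with a message that names it.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build a typed, immutable config object from an environment map.
function loadConfig(env: Env): AppConfig {
  const rawLevel = env["LOG_LEVEL"] ?? "info"; // default is an assumption
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel);
  if (logLevel === undefined) {
    throw new Error(`Invalid LOG_LEVEL: ${rawLevel}`);
  }
  return {
    databaseUrl: requireVar(env, "DATABASE_URL"),
    deepseekApiKey: requireVar(env, "DEEPSEEK_API_KEY"),
    openaiApiKey: requireVar(env, "OPENAI_API_KEY"),
    logLevel,
  };
}
```

Because Bun loads .env automatically, the entry point could call loadConfig(process.env) without any dotenv setup.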

Coding Standards

File size: 200-400 lines typical, 800 max
Function size: <50 lines
Nesting: max 4 levels deep
Naming: camelCase for variables/functions, PascalCase for types/classes
Exports: Named exports only (NO default exports)
Errors: Always handle explicitly, never swallow silently
Validation: Validate all external input at boundaries
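
The AppError hierarchy in lib/errors.ts is not shown in this document. A minimal sketch of what such a hierarchy might look like, with hypothetical subclasses, status codes, and a toHttpStatus helper:

```typescript
// Base class for expected, typed failures (illustrative, not the real lib/errors.ts).
class AppError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(message: string, status: number, code: string) {
    super(message);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.status = status;
    this.code = code;
  }
}

class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404, "NOT_FOUND");
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400, "VALIDATION_ERROR");
  }
}

// Handle every error explicitly: known errors keep their status,
// anything else is reported as a 500 instead of being swallowed.
function toHttpStatus(err: unknown): number {
  return err instanceof AppError ? err.status : 500;
}
```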

Testing

Minimum 80% coverage
TDD workflow: RED → GREEN → REFACTOR
Test files: *.test.ts next to source files
Use bun:test (NOT jest, vitest)

Git

Commits in English: type(scope): description
Branch naming: feat/, fix/, refactor/
PR titles in English with full description
