CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Repository Structure

This workspace contains three projects:

  • opencode/ - Open-source AI coding agent (TypeScript/Bun monorepo, default branch: dev)
  • VibeVoice/ - Microsoft's open-source voice AI models (Python/PyTorch)
  • VoiceCodeAssistant/ - Voice-interactive coding tool that integrates OpenCode and VibeVoice

Quick Start (Recommended)

Use the launcher scripts, which auto-install dependencies and start everything:

# Windows PowerShell
.\VoiceCodeAssistant\run.ps1

# Windows CMD
VoiceCodeAssistant\run.bat

# Linux/macOS
./VoiceCodeAssistant/run.sh

OpenCode

Note: bun is required (not npm) - OpenCode uses bun's catalog: protocol.
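The catalog: protocol lets workspace packages reference dependency versions pinned once in the root package.json, which npm does not understand. A minimal sketch of how it fits together (package name and version here are illustrative, not taken from this repo):

```jsonc
// root package.json — declares the shared version catalog
{
  "workspaces": {
    "packages": ["packages/*"],
    "catalog": { "zod": "3.23.8" }
  }
}

// packages/<pkg>/package.json — resolves from the catalog
{
  "dependencies": { "zod": "catalog:" }
}
```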

Build Commands

cd opencode

# Install dependencies
bun install

# Development (runs TUI in packages/opencode)
bun dev
bun dev <directory>      # Run against specific directory
bun dev .                # Run in repo root

# Headless API server
bun dev serve            # Default port 4096
bun dev serve --port 8080

# Web app development (requires server running first)
bun run --cwd packages/app dev

# Desktop app (requires Tauri dependencies)
bun run --cwd packages/desktop tauri dev

# Type checking
bun turbo typecheck      # From root
bun run typecheck        # From packages/opencode (uses tsgo)

# Tests (from packages/opencode)
bun test                           # All tests
bun test test/tool/tool.test.ts    # Single test file

# Build standalone executable
./packages/opencode/script/build.ts --single

# Regenerate SDK after API changes
./script/generate.ts

# Health check (verify server is running)
curl http://localhost:4096/global/health
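If you are scripting against the server rather than using curl, the same check works from Python. The endpoint path comes from the curl example above; the helper names are ours:

```python
import urllib.request

def health_url(port: int = 4096) -> str:
    """Build the health-check URL for a local OpenCode server."""
    return f"http://localhost:{port}/global/health"

def is_healthy(port: int = 4096, timeout: float = 2.0) -> bool:
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```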

Architecture

  • packages/opencode - Core business logic, server, and TUI (SolidJS + OpenTUI)
  • packages/app - Shared web UI components (SolidJS)
  • packages/desktop - Native desktop app (Tauri wrapper)
  • packages/plugin - Plugin system (@opencode-ai/plugin)
  • packages/sdk/js - JavaScript SDK (@opencode-ai/sdk)

The TUI communicates with the server using @opencode-ai/sdk. The server is in packages/opencode/src/server/server.ts.

Key source directories in packages/opencode/src/:

  • cli/cmd/tui/ - TUI code
  • tool/ - Tool implementations (implement Tool.Info interface)
  • provider/ - LLM provider integrations
  • session/ - Session management
  • mcp/ - Model Context Protocol integration
  • lsp/ - Language Server Protocol support

Code Style

  • No let - Use const with ternary or IIFE
  • No else - Use early returns
  • No try/catch - Prefer .catch()
  • No any - Use precise types
  • Single word names - Prefer concise identifiers
  • No unnecessary destructuring - Use obj.a instead of const { a } = obj
  • Use Bun APIs - e.g., Bun.file() instead of Node equivalents
  • Rely on type inference - Avoid explicit annotations unless necessary
  • No mocks in tests - Test actual implementations

Debugging

For UI changes, run the backend and app servers separately:

# Backend (from packages/opencode)
bun run --conditions=browser ./src/index.ts serve --port 4096

# App (from packages/app)
bun dev -- --port 4444

Additional Notes

  • The default branch in opencode is dev, not main
  • To regenerate the JavaScript SDK: ./packages/sdk/js/script/build.ts
  • Use parallel tool calls when possible

VibeVoice

Setup

Use NVIDIA PyTorch Docker container (recommended):

sudo docker run --privileged --net=host --ipc=host --ulimit memlock=-1:-1 --ulimit stack=-1:-1 --gpus all --rm -it nvcr.io/nvidia/pytorch:25.12-py3

Install:

cd VibeVoice
pip install -e .

Running Models

ASR (Speech Recognition):

# Gradio demo
python demo/vibevoice_asr_gradio_demo.py --model_path microsoft/VibeVoice-ASR --share

# File inference
python demo/vibevoice_asr_inference_from_file.py --model_path microsoft/VibeVoice-ASR --audio_files <path>

Realtime TTS:

# WebSocket demo
python demo/vibevoice_realtime_demo.py --model_path microsoft/VibeVoice-Realtime-0.5B

# File inference
python demo/realtime_model_inference_from_file.py --model_path microsoft/VibeVoice-Realtime-0.5B --txt_path demo/text_examples/1p_vibevoice.txt --speaker_name Carter

Architecture

  • vibevoice/ - Core Python package (configs, modular components, processor, scheduler)
  • vllm_plugin/ - vLLM integration plugin
  • demo/ - Demo scripts and examples
  • finetuning-asr/ - LoRA fine-tuning for ASR

Models available:

  • VibeVoice-ASR-7B: 60-minute long-form audio, speaker diarization, timestamps
  • VibeVoice-TTS-1.5B: 90-minute multi-speaker synthesis
  • VibeVoice-Realtime-0.5B: Streaming TTS with ~200ms latency

Code Principles

  • Code minimalism and functional purity
  • No over-engineering or unnecessary abstraction
  • English only for all code, comments, and documentation

VoiceCodeAssistant

Voice-interactive coding assistant that bridges VibeVoice speech AI with OpenCode coding AI.

Setup

cd VoiceCodeAssistant
pip install -e .
pip install -e ../VibeVoice  # Required dependency

Running

# TUI mode (default) - auto-starts OpenCode server
python -m voice_code_assistant

# Manual mode (requires OpenCode server already running)
python -m voice_code_assistant --no-auto-start

# Language modes
python -m voice_code_assistant --lang ko-ko   # Korean → Korean
python -m voice_code_assistant --lang ko-en   # Korean → English
python -m voice_code_assistant --lang en-en   # English → English
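The --lang flag encodes a speech-input → response-output language pair. A hypothetical parser for that format (function name and error message are ours, not from the codebase):

```python
def parse_lang(flag: str) -> tuple[str, str]:
    """Split a mode like 'ko-en' into (input_lang, output_lang)."""
    src, sep, dst = flag.partition("-")
    if not sep or not src or not dst:
        raise ValueError(f"expected '<in>-<out>', got {flag!r}")
    return src, dst
```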

# Simple console mode (no TUI)
python -m voice_code_assistant --no-tui

# CPU mode (no GPU)
python -m voice_code_assistant --device cpu

Architecture

The codebase is fully async (asyncio). Key components:

voice_code_assistant/
├── audio/          # Microphone (non-blocking), speaker (streaming), VAD
├── speech/         # ASR engine, TTS engine (both wrap VibeVoice)
├── coding/         # OpenCode HTTP/SSE client, session manager
├── core/           # Interrupt monitor/manager, voice feedback, task iteration
├── ui/             # Rich TUI (keyboard: Space=record, Tab=lang, F=feedback, Q=quit)
└── main.py         # VoiceCodeApp orchestrator

Voice Feedback Modes

Cycle with F key: OFF → ON → DETAILED

  • OFF: Silent operation
  • ON: Basic status ("파일 작성 중..." "Writing file...", "테스트 실행 중..." "Running tests...")
  • DETAILED: Context-aware explanations

Data Flow

[Microphone] → [VAD] → [VibeVoice-ASR] → Text → [OpenCode API] → Response → [VibeVoice-TTS] → [Speaker]
                                                      ↑                           ↓
                                              (SSE events)              (Interruptible)
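The flow above maps naturally onto an async pipeline, one await per arrow. This toy sketch stubs out ASR, the OpenCode call, and TTS with placeholder coroutines; all names are ours, and the real stages wrap VibeVoice and the OpenCode HTTP/SSE client:

```python
import asyncio

async def pipeline(audio_chunks, asr, ask_opencode, tts):
    """Toy version of the mic -> ASR -> OpenCode -> TTS flow."""
    spoken = []
    for chunk in audio_chunks:           # stands in for VAD-segmented audio
        text = await asr(chunk)          # speech -> text
        reply = await ask_opencode(text) # text -> coding-agent response
        spoken.append(await tts(reply))  # response -> audio for the speaker
    return spoken

# Placeholder stages for illustration only.
async def fake_asr(chunk):
    return f"said:{chunk}"

async def fake_agent(text):
    return f"reply-to({text})"

async def fake_tts(text):
    return f"audio({text})"
```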