Skip to content

zwh9029/VTuberSoulExtractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VTuberSoulExtractor — Virtual Streamer Personality Extractor

Automatically extract a virtual streamer's personality from Bilibili recordings. Outputs an OpenHanako-compatible ishiki.md personality file.

中文文档

How It Works

Bilibili Replay → yt-dlp Download → inaSpeechSegmenter Speech/Music Split
  → MLX-Whisper Transcribe → T2S Cleanup → Quality Filter
    → LLM Correction + Dialogue Reconstruction → Tone/Quirk Analysis
      → Generate ishiki.md Personality File

Who Is This For

Designed for VTuber fans. Feed it months of Bilibili replays:

Step Tool Output
Filter singing inaSpeechSegmenter Speech-only segments
Transcribe MLX-Whisper large-v3 187K dialogue lines
Clean up Rules + LLM ASR-corrected text
Analyze Statistical + LLM Tone, quirks, catchphrases
Generate LLM OpenHanako ishiki.md

Hardware

Minimum Recommended
CPU Apple M1 M1 Max+
RAM 16GB 64GB
Disk 20GB 50GB+

Quick Start

# Install dependencies
/usr/bin/pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple \
  numpy soundfile tqdm jieba zhconv librosa mlx-whisper \
  inaSpeechSegmenter yt-dlp openai imageio-ffmpeg modelscope

# Download MLX model (ModelScope, China-accessible)
python3 -c "
from modelscope import snapshot_download
snapshot_download('mlx-community/whisper-large-v3-mlx',
                   cache_dir='~/.cache/modelscope/hub')
"

# Fix inaSpeechSegmenter (see docs/PITFALLS.md)

# Download recordings
yt-dlp -x --audio-format m4a \
  -o "data/videos/BVid_p%(playlist_index)02d.%(ext)s" \
  "https://www.bilibili.com/video/BVid"

# Run pipeline
python3 src/pipeline.py separate      # Stage 2: Split speech/music
python3 src/pipeline.py transcribe    # Stage 3: Transcribe
python3 src/convert_t2s.py            # T→S conversion
python3 src/clean_transcripts.py      # Quality filter
python3 src/pipeline.py dialogue      # Stage 4: Analyze
python3 src/pipeline.py correct       # Stage 4.5: LLM correction
python3 src/auto_correct.py --api-key KEY --model deepseek-v4-flash
python3 src/pipeline.py personality   # Stage 5: Generate

Pipeline

Stage Command What It Does
1 yt-dlp Download Bilibili replay collections
2 separate inaSpeechSegmenter splits speech from music
3 transcribe MLX-Whisper large-v3 speech-to-text
convert_t2s Traditional → Simplified Chinese
clean Remove repetition, BGM artifacts
4 dialogue Tone word, punctuation, vocabulary stats
4.5 correct LLM batch correction + dialogue inference
5 personality Generate OpenHanako personality prompt

Real-World Results

Metric Value
Source 8 Bilibili collections (331h)
Speech 143.6h (43.4%)
Segments 187,632 transcribed lines
Corrections 452 (22.1% of sampled)
Quirks found 宁→您(10), 捏→呢(3), 不了一点(358)

Output Example

Generated ishiki.md includes:

  • Speaking tone, catchphrases, typical sentence patterns
  • Tone word frequency (吧/啊/呢/呀)
  • Interaction patterns (greeting, gift thanks, teasing)
  • Quirk mapping: {宁→您: 10, 捏→呢: 3}
  • Forbidden phrases and stylistic guidelines

Project Structure

VTuberSoulExtractor/
├── src/
│   ├── pipeline.py           # Main pipeline
│   ├── clean_transcripts.py  # Quality filter
│   ├── convert_t2s.py        # T→S conversion
│   └── auto_correct.py       # LLM batch correction
├── data/                     # Runtime (gitignored)
├── docs/
│   ├── PITFALLS.md
│   └── PITFALLS_CN.md
├── README.md
├── README_CN.md
├── requirements.txt
└── .gitignore

Credits

License

MIT

About

Automatically extract VTuber personality from Bilibili recordings.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages