StemForge is built on the shoulders of many outstanding open-source projects. We are grateful to every team listed below for making their work freely available.
Source separation models powering the Separate tab: the Hybrid Transformer models htdemucs and htdemucs_ft, plus the MDX Challenge models mdx_extra and mdx_extra_q.
- Repository: https://github.com/facebookresearch/demucs
- Paper: Rouard, Massa & Défossez — Hybrid Transformers for Music Source Separation (ICASSP 2023)
- License: MIT
High-quality separation models used alongside Demucs in the Separate tab.
- Repository: https://github.com/lucidrains/BS-RoFormer
- Model repository: https://github.com/TRvlvr/model_repo
- Repository: https://github.com/ZFTurbo/Music-Source-Separation-Training
- Paper: Solovyev et al. — Benchmarks and leaderboards for sound demixing tasks (2023)
- Model repository: https://huggingface.co/jarredou/BS-ROFO-SW-Fixed
Polyphonic audio-to-MIDI transcription for instrument stems in the MIDI tab.
- Repository: https://github.com/spotify/basic-pitch
- Paper: Bittner et al. — A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation (ICASSP 2022)
- License: Apache 2.0
Speech recognition model used (via faster-whisper) in the Vocal MIDI pipeline.
- Repository: https://github.com/openai/whisper
- Paper: Radford et al. — Robust Speech Recognition via Large-Scale Weak Supervision (2022)
- License: MIT
CTranslate2-accelerated Whisper inference powering the Vocal MIDI pipeline.
- Repository: https://github.com/SYSTRAN/faster-whisper
- License: MIT
Text-conditioned audio generation model powering the Synth tab.
- Model repository: https://huggingface.co/stabilityai/stable-audio-open-1.0
- Paper: Evans et al. — Stable Audio Open (2024)
- License: Stability AI Community License
Full song generation from lyrics and style descriptions, powering the Compose tab.
- Repository: https://github.com/AceStudioAI/ACE-Step
- Paper: ACE-Step: A Step Towards Music Generation Foundation Model (2025)
- License: Apache 2.0
Retrieval-based Voice Conversion (RVC) inference code powering Voice mode in the Compose tab. StemForge vendors Applio's inference-only subtree for audio-in → audio-out voice transformation.
- Applio repository: https://github.com/IAHispano/Applio
- RVC project: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- License: MIT
Robust pitch estimation model used as the default F0 extraction method for voice conversion.
- Repository: https://github.com/Dream-High/RMVPE
- Paper: Wei et al. — RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music (2023)
Similarity search library used for feature retrieval against the target voice's index in the RVC pipeline.
- Repository: https://github.com/facebookresearch/faiss
- License: MIT
Self-supervised speech representation model used as the content feature extractor in RVC.
- Repository: https://github.com/auspicious3000/contentvec
- Paper: Qian et al. — ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers (ICML 2022)
Music analysis and notation toolkit powering MIDI cleanup, key detection, transposition, and sheet music export in the MIDI tab.
- Repository: https://github.com/cuthbertLab/music21
- Paper: Cuthbert & Ariza — music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data (ISMIR 2010)
- License: BSD-3-Clause
Browser-based MusicXML rendering (via VexFlow) for in-app sheet music preview.
- Repository: https://github.com/opensheetmusicdisplay/opensheetmusicdisplay
- License: MIT
Music engraving program used for PDF sheet music export via subprocess.
- Website: https://lilypond.org
- License: GPL 3.0 (external binary, not bundled)
Deep learning framework underlying all inference pipelines.
- Repository: https://github.com/pytorch/pytorch
- License: BSD-3-Clause
Diffusion pipeline framework used to load and run Stable Audio Open.
- Repository: https://github.com/huggingface/diffusers
- License: Apache 2.0
Tokenizer and model infrastructure used by the generation pipelines.
- Repository: https://github.com/huggingface/transformers
- License: Apache 2.0
Audio analysis and feature extraction used in the audio profiler and resampling utilities.
- Repository: https://github.com/librosa/librosa
- Paper: McFee et al. — librosa: Audio and Music Signal Analysis in Python (SciPy 2015)
- License: ISC
Software synthesizer used for MIDI preview rendering and Mix tab audio.
- Repository: https://github.com/FluidSynth/fluidsynth
- License: LGPL-2.1
Waveform visualization in the browser, used for all audio players and the global transport bar.
- Repository: https://github.com/katspaugh/wavesurfer.js
- License: BSD-3-Clause
Web framework powering the StemForge backend API.
- Repository: https://github.com/fastapi/fastapi
- License: MIT
ASGI server running the FastAPI application.
- Repository: https://github.com/encode/uvicorn
- License: BSD-3-Clause
Fast Python package manager and resolver used to build reproducible environments.
- Repository: https://github.com/astral-sh/uv
- License: MIT / Apache 2.0
StemForge also relies on many other excellent open-source libraries, including NumPy, SciPy, soundfile, mido, pretty_midi, einops, safetensors, accelerate, pydub, soxr, ai-edge-litert (TFLite runtime), torchcrepe, torchfcpe, noisereduce, and stftpitchshift. Thank you to all their maintainers and contributors.
If you believe your project should be listed here and is not, please open an issue and we will add it.