Acknowledgments

StemForge stands on the shoulders of many outstanding open-source projects. We are grateful to every team listed below for making their work freely available.


Demucs — Meta (Facebook AI Research)

Source separation models powering the Separate tab: Hybrid Transformer Demucs (htdemucs, htdemucs_ft) and the earlier Hybrid Demucs checkpoints from the MDX challenge (mdx_extra, mdx_extra_q).


BS-Roformer / MelBand-Roformer — Community

High-quality separation models used alongside Demucs in the Separate tab.

  • bs-roformer Python package — Lucidrains
  • ViperX vocal model (SDR 12.97) — TRvlvr / UVR community
  • ZFTurbo 4-stem BS-Roformer & Music-Source-Separation-Training — Roman Solovyev (ZFTurbo)
  • jarredou 6-stem BS-Roformer (guitar + piano)


Basic Pitch — Spotify

Polyphonic audio-to-MIDI transcription for instrument stems in the MIDI tab.

  • Repository: https://github.com/spotify/basic-pitch
  • Paper: Bittner et al. — A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation (ICASSP 2022)
  • License: Apache 2.0

Whisper — OpenAI

Speech recognition model used (via faster-whisper) for vocal pitch-to-MIDI extraction.


faster-whisper — SYSTRAN

CTranslate2-accelerated Whisper inference powering the Vocal MIDI pipeline.


Stable Audio Open — Stability AI

Text-conditioned audio generation model powering the Synth tab.


ACE-Step — ACE Studio / Timedomain

Full song generation from lyrics and style descriptions, powering the Compose tab.


Applio / RVC — IAHispano & RVC-Project

Retrieval-based Voice Conversion (RVC) inference code powering the Voice mode in the Compose tab. StemForge vendors only Applio's inference subtree, which handles audio-in → audio-out voice transformation.

RMVPE — lj1995

Robust pitch estimation model used as the default F0 extraction method for voice conversion.

FAISS — Meta (Facebook AI Research)

Similarity search library used for nearest-neighbor feature retrieval in the RVC pipeline.

ContentVec — auspicious3000

Self-supervised speech representation model used in RVC to extract speaker-independent content features from the input audio.


music21 — MIT / Michael Scott Asato Cuthbert

Music analysis and notation toolkit powering MIDI cleanup, key detection, transposition, and sheet music export in the MIDI tab.


OpenSheetMusicDisplay (OSMD)

Browser-based MusicXML rendering (via VexFlow) for in-app sheet music preview.


LilyPond (optional)

Music engraving program used for PDF sheet music export via subprocess.


PyTorch — Meta (Facebook AI Research)

Deep learning framework underlying all inference pipelines.


Hugging Face Diffusers

Diffusion pipeline framework used to load and run Stable Audio Open.


Hugging Face Transformers

Tokenizer and model infrastructure used by the generation pipelines.


librosa

Audio analysis and feature extraction used in the audio profiler and resampling utilities.


FluidSynth

Software synthesizer used for MIDI preview rendering and Mix tab audio.


wavesurfer.js — katspaugh

Waveform visualization in the browser, used for all audio players and the global transport bar.


FastAPI — Sebastián Ramírez (tiangolo)

Web framework powering the StemForge backend API.


Uvicorn — Encode

ASGI server running the FastAPI application.


uv — Astral

Fast Python package manager and resolver used to build reproducible environments.


Additional dependencies

StemForge also relies on many other excellent open-source libraries, including NumPy, SciPy, soundfile, mido, pretty_midi, einops, safetensors, accelerate, pydub, soxr, ai-edge-litert (TFLite runtime), torchcrepe, torchfcpe, noisereduce, and stftpitchshift. Thank you to all their maintainers and contributors.


If you believe your project should be listed here and is not, please open an issue and we will add it.