Skip to content

Harmonai-org/oobleck

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Oobleck

"The more you compress it, the harder it gets"

Possible backronyms:

  • "Out-of-the-box Latent Encoder Construction Kit"
  • "Over-optimized Latent Encoder Construction Kit"

(Open to other names, we can vote at the end.)

What is it?

MIT-licensed soundstream-ish VAE audio codecs for downstream neural audio synthesis.

We will be creating at least

  • a continuous VAE (for downstream audio diffusion, etc)
  • a vector quantized VAE (for downstream MusicLM, etc)
  • a spherical VAE (for quantum circuits, etc)

We will validate them using

  • MUSHRA testing protocol with expert listeners for a subjective performance measure
  • visqol and si-snr expert listeners for an objective performance measure
  • FAD?
  • ...
  • (waiting for the first models to be trained)

We will be experimenting with different loss functions

  • perceptual loss, etc
  • place loss functions into auraloss and import it

Installation

OOBLECK_VERSION=develop pip install git+https://github.com/Harmonai-org/oobleck.git

Usage

Instantiation of an OOBLECK autoencoder corresponding to a given .gin file can be done following

import gin
import torch

from oobleck import AudioAutoEncoder

gin.parse_config_file("base/base.gin")
model = AudioAutoEncoder()

inputs = {"waveform": torch.randn(1, 1, 2**16)}
outputs = model.loss(inputs)

print(outputs.keys())

# >>> dict_keys(['waveform', 'latent', 'reconstruction', \
# >>> 'score_waveform', 'score_reconstruction', 'features_reconstruction', \
# >>> 'features_waveform', 'generator_loss', 'MultiResolutionSTFTLoss', \
# >>> 'discriminator_loss'])

About

open soundstream-ish VAE codecs for downstream neural audio synthesis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors