The ssrc tool is built on top of the libshibatchdsp C++ library. You can use this library directly in your own projects to perform sample rate conversion without shelling out to an external command. The library is header-only and uses templates to support both single-precision (float) and double-precision (double) processing.
The library is designed around a pipeline concept. Audio data flows through a series of processing stages. The core abstraction is ssrc::StageOutlet<T>, an abstract class that represents a source of audio data that can be read from.
The typical pipeline looks like this:
```
WavReader -> SSRC -> Dither -> WavWriter
```
- `WavReader`: Reads a `.wav` file and acts as the starting `StageOutlet`.
- `SSRC`: Takes a `StageOutlet` as input (the reader) and performs sample rate conversion. It is also a `StageOutlet` itself.
- `Dither`: Takes the `SSRC`'s output and applies dithering. It is also a `StageOutlet`.
- `WavWriter`: Takes one or more final `StageOutlet`s and writes the data to a `.wav` file.
A key detail of the implementation, as seen in src/cli/cli.cpp, is that the pipeline branches after the WavReader to process each audio channel in parallel. A separate SSRC (and Dither, if used) instance is created for each channel. The WavWriter is then responsible for merging these parallel streams back into a single, interleaved audio file.
For a stereo (2-channel) file, the topology looks like this:
```
                    +-----------------------+
               +--->| SSRC<T> for Channel 0 |---+
               |    +-----------------------+   |
+-----------+  |                                |  +-----------+
| WavReader |--+            (Fork)              +->| WavWriter |--> output.wav
+-----------+  |                                |  +-----------+
               |    +-----------------------+   |
               +--->| SSRC<T> for Channel 1 |---+
                    +-----------------------+
```
- Forking: The `WavReader` provides a separate `StageOutlet` for each channel (`reader->getOutlet(0)`, `reader->getOutlet(1)`, etc.). This is where the pipeline splits.
- Parallel Processing: Each channel is processed independently. This design is clean and allows for channel-specific processing if needed.
- Merging: The `WavWriter` takes a `std::vector` of `StageOutlet`s in its constructor. It reads one sample from each outlet in turn to reconstruct the interleaved audio data needed for the final WAV file.
The data type of the samples flowing through the pipeline also follows a specific path:
- Floating-Point Processing: The initial processing stages (`WavReader`, `SSRC`) are templated and typically operate on `float` or `double`. This maintains high precision during the most critical calculations (resampling).
- Conversion to Integer: If dithering is used, the `Dither` stage is responsible for converting the high-precision floating-point signal into an integer signal.
- Unified Integer Type (`int32_t`): Crucially, the `Dither` stage always outputs `int32_t`, regardless of the final target bit depth (e.g., 16-bit or 24-bit). This simplifies the design: any downstream stage (like the `WavWriter`) only needs to handle a single, universal integer type. The `WavWriter` is then responsible for taking the `int32_t` data and correctly writing it to the file with the specified final bit depth.
All necessary classes are available by including one header:
```cpp
#include "shibatch/ssrc.hpp"
```

`WavReader<T>` reads audio data from a WAV file. `T` can be `float` or `double`.
```cpp
// Constructor for reading from a file
ssrc::WavReader<float> reader("input.wav");

// Get format information
ssrc::WavFormat format = reader.getFormat();
int channels = format.channels;
int sampleRate = format.sampleRate;

// Get an outlet for a specific channel
std::shared_ptr<ssrc::StageOutlet<float>> channel_outlet = reader.getOutlet(0); // For channel 0
```

`SSRC<T>` is the main sample rate converter.
```cpp
// Create a resampler
// - inlet: The source outlet (e.g., from WavReader)
// - sfs: Source frequency (e.g., 44100)
// - dfs: Destination frequency (e.g., 96000)
// - log2dftfilterlen, aa, guard: Profile parameters (see cli.cpp for examples)
auto resampler = std::make_shared<ssrc::SSRC<float>>(
    reader.getOutlet(0),
    44100,
    96000,
    14,   // log2dftfilterlen (from "standard" profile)
    145,  // aa (stop-band attenuation)
    2.0   // guard factor
);
```

`WavWriter<T>` writes audio data from one or more outlets to a WAV file.
Constructor

```
WavWriter(filename, format, container, inlets, nFrames, bufsize, mt)
```

- `const std::string& filename`: The path to the output WAV file. If the string is empty, the writer writes to standard output.
- `const WavFormat& format`: A `WavFormat` struct defining the output audio format (channels, sample rate, bit depth, etc.).
- `const ContainerFormat& container`: A `ContainerFormat` struct defining the file container type (e.g., `RIFF`, `W64`).
- `const std::vector<std::shared_ptr<StageOutlet<T>>>& inlets`: A vector of `StageOutlet` pointers, one for each channel to be written. The writer merges these streams into an interleaved output file.
- `uint64_t nFrames`: (Optional) The total number of frames to be written. This is primarily used when writing to a non-seekable destination like standard output, where the file header must be written upfront with the final length. Defaults to `0`.
- `size_t bufsize`: (Optional) The size of the internal buffer used for writing data to disk. Defaults to `65536`.
- `bool mt`: (Optional) A flag to enable or disable multithreaded file writing. Defaults to `true`. When `false`, all file I/O is performed in a single thread, which can be useful for debugging or in environments with specific threading constraints.
execute() Method
This method starts the pipeline. It pulls data from the inlets, processes it, and writes it to the destination file. The function blocks until all data from the input stages has been written.
Example
```cpp
// Create a vector of outlets (one for each channel)
std::vector<std::shared_ptr<ssrc::StageOutlet<float>>> outlets;
outlets.push_back(resampler); // Add the resampler for channel 0
// ... add resamplers for other channels ...

// Define the output format
ssrc::WavFormat dstFormat(ssrc::WavFormat::PCM, channels, 96000, 24); // 96kHz, 24-bit PCM
ssrc::ContainerFormat dstContainer(ssrc::ContainerFormat::RIFF);

// Create the writer (with multithreading enabled by default)
ssrc::WavWriter<float> writer("output.wav", dstFormat, dstContainer, outlets);

// Execute the entire pipeline (read -> process -> write)
writer.execute();
```

Here is a complete example that ties everything together. It reads a WAV file, resamples it using single-precision floats, and saves the result as a 24-bit PCM WAV file.
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <memory>
#include <cstdlib>
#include "shibatch/ssrc.hpp"

void convert_file(const std::string& in_path, const std::string& out_path, int dstRate) {
  try {
    // 1. Set up the reader for single-precision floats
    auto reader = std::make_shared<ssrc::WavReader<float>>(in_path);
    ssrc::WavFormat srcFormat = reader->getFormat();

    // 2. Define destination format and conversion parameters
    int dstBits = 24;
    ssrc::WavFormat dstFormat(ssrc::WavFormat::PCM, srcFormat.channels, dstRate, dstBits);
    ssrc::ContainerFormat dstContainer(ssrc::ContainerFormat::RIFF);

    // 3. Create a resampler for each channel
    std::vector<std::shared_ptr<ssrc::StageOutlet<float>>> outlets;
    for (int i = 0; i < srcFormat.channels; ++i) {
      auto resampler = std::make_shared<ssrc::SSRC<float>>(
        reader->getOutlet(i),
        srcFormat.sampleRate,
        dstRate,
        14,   // log2dftfilterlen for "standard" profile
        145,  // aa for "standard" profile
        2.0   // guard for "standard" profile
      );
      outlets.push_back(resampler);
    }

    // 4. Set up the writer
    auto writer = std::make_shared<ssrc::WavWriter<float>>(out_path, dstFormat, dstContainer, outlets);

    // 5. Execute the entire process
    std::cout << "Converting " << in_path << " to " << out_path << "..." << std::endl;
    writer->execute();
    std::cout << "Conversion complete." << std::endl;
  } catch (const std::exception& e) {
    std::cerr << "Error: " << e.what() << std::endl;
  }
}

int main(int argc, char **argv) {
  if (argc == 4) {
    convert_file(argv[1], argv[2], atoi(argv[3]));
    return 0;
  }
  std::cerr << "Usage : " << argv[0] << " <input.wav> <output.wav> <new_rate>" << std::endl;
  return -1;
}
```

This section delves deeper into specific components of the libshibatchdsp API.
Converting high-resolution audio to a lower bit depth (e.g., 24-bit to 16-bit) involves quantization, where sample values are rounded to the nearest available level. This process creates quantization errors that manifest as distortion correlated with the original signal, which is musically unpleasant, especially on fading reverb tails.
Dithering is a technique that mitigates this by adding a small amount of uncorrelated noise to the signal prior to quantization. This crucial step trades the harsh, signal-correlated distortion for a more benign and constant noise floor.
To take this a step further, noise shaping can be employed. This process intelligently sculpts the noise floor, pushing the noise energy away from the frequency ranges where the human ear is most sensitive (e.g., 2-5 kHz) and into the far less audible, very high frequencies. While this may physically increase the total noise energy in the system, it results in a significantly lower perceived noise level. This combined process preserves low-level detail and the sense of resolution in the final audio.
The ssrc::Dither class is a pipeline stage that performs this function. It takes a high-resolution signal as input (e.g., from the SSRC stage) and outputs a quantized signal.
Pipeline with Dither:
WavReader -> SSRC -> Dither -> WavWriter
Usage:
The Dither class is templated on its output and input types: Dither<OUTTYPE, INTYPE>. Typically, INTYPE is float or double (from the resampler) and OUTTYPE is int32_t (a standard integer type for PCM data).
```cpp
#include "shibatch/shapercoefs.h" // Required for noise shaper coefficients

// ... inside your conversion function ...
// Assume 'resampler' is a std::shared_ptr<ssrc::SSRC<float>> from the previous stage.
auto gain = (1LL << (16 - 1)) - 1;   // For 16-bit output
auto clipMin = -(1LL << (16 - 1));
auto clipMax = (1LL << (16 - 1)) - 1;

// Find the appropriate noise shaper coefficients for the destination sample rate.
const ssrc::NoiseShaperCoef* shaper = nullptr;
for (int i = 0; ssrc::noiseShaperCoef[i].fs >= 0; ++i) {
  if (ssrc::noiseShaperCoef[i].fs == dstRate && ssrc::noiseShaperCoef[i].id == 0) {
    shaper = &ssrc::noiseShaperCoef[i];
    break;
  }
}
if (!shaper) {
  // Handle the case where no suitable shaper exists for the target rate.
  throw std::runtime_error("No suitable noise shaper found for the destination sample rate.");
}

// Create the dither stage
auto dither_stage = std::make_shared<ssrc::Dither<int32_t, float>>(
  resampler, // Input outlet
  gain,      // Gain to apply before quantization
  0,         // DC offset (0 for standard PCM)
  clipMin,   // Minimum clipping value
  clipMax,   // Maximum clipping value
  shaper     // The noise shaper coefficients
);

// Now, pass `dither_stage` to the WavWriter instead of `resampler`.
// The WavWriter's template type should also be int32_t.
// std::vector<std::shared_ptr<ssrc::StageOutlet<int32_t>>> outlets;
// outlets.push_back(dither_stage);
// auto writer = std::make_shared<ssrc::WavWriter<int32_t>>(...);
// writer->execute();
```

Most key classes in the library are templated on a `typename T` or `typename REAL`, such as `SSRC<REAL>`, `WavReader<T>`, and `WavWriter<T>`. This template parameter controls the floating-point precision used for internal calculations. You can instantiate these classes with either `float` (single precision) or `double` (double precision).
- `float` (Single Precision)
  - Pros: Faster execution, lower memory usage.
  - Cons: Less precision.
  - Use Case: For most standard audio applications, `float` is perfectly sufficient. The "standard" profile and below use `float` by default. The precision of a 32-bit float is already far greater than that of 24-bit audio, so it does not typically become a quality bottleneck.
- `double` (Double Precision)
  - Pros: Extremely high precision, theoretically higher audio quality.
  - Cons: Slower execution (can be 1.5x to 2x slower), higher memory usage.
  - Use Case: For archival purposes or signal chains that require the absolute highest fidelity (e.g., scientific analysis or a "cost-no-object" audiophile setup). The "high" and "insane" profiles use `double`.
Example: To use double-precision, simply change the template parameter throughout your pipeline:
```cpp
// Double-precision pipeline
auto reader = std::make_shared<ssrc::WavReader<double>>(in_path);
// ...
auto resampler = std::make_shared<ssrc::SSRC<double>>(
  reader->getOutlet(i),
  // ... SSRC parameters ...
);
// ...
// Note: Dither still takes a float/double and outputs int32_t
auto dither_stage = std::make_shared<ssrc::Dither<int32_t, double>>(resampler, ...);
```

These two structs, `WavFormat` and `ContainerFormat`, work together to describe the audio data's format and the file structure that contains it.
This struct specifies the overall file type by defining its main ChunkID. This ID tells parsers what kind of file they are dealing with. The choice of container is passed to the underlying dr_wav library.
- `ContainerFormat::RIFF`: The `ChunkID` is `'RIFF'`. This is the classic WAV format, but it is limited to a maximum file size of 4 GB.
- `ContainerFormat::W64`: Sony Wave64 format. This is one of several competing formats designed to exceed the 4 GB limit using 64-bit addressing.
- `ContainerFormat::RF64`: An extension of RIFF that is also 64-bit capable. It is designed to be backwards-compatible with systems that don't recognize it.
Choosing a 64-bit compatible container like RF64 or W64 is essential if your output file might be larger than 4 GB.
This struct's contents correspond directly to the data stored in a WAV file's `fmt` chunk. It describes the specific properties of the raw audio data itself.

- `formatTag`: The audio codec. Key values are:
  - `WavFormat::PCM`: Standard Pulse Code Modulation.
  - `WavFormat::IEEE_FLOAT`: 32-bit or 64-bit floating-point samples.
  - `WavFormat::EXTENSIBLE`: A newer format tag used for audio that doesn't fit the classic `PCM` specification, such as multi-channel audio (more than 2 channels).
- `channels`: Number of audio channels.
- `sampleRate`: The sample rate in Hz (e.g., 44100).
- `bitsPerSample`: The bit depth (e.g., 16, 24, 32).
- `channelMask`: A bitmask specifying the speaker layout for multi-channel audio (e.g., `0x3F` for 5.1 surround, `0x63F` for 7.1 surround). Only used when `formatTag` is `EXTENSIBLE`.
- `subFormat`: A GUID specifying the sub-format, also used with `EXTENSIBLE`. The library provides constants for PCM and IEEE float (`KSDATAFORMAT_SUBTYPE_PCM` and `KSDATAFORMAT_SUBTYPE_IEEE_FLOAT`).
Example: Creating a format for a 5.1 Surround, 24-bit, 48kHz WAV file
```cpp
// This requires the extensible format.
uint32_t channelMask_5_1 = 0x3F; // Front L/R, Center, LFE, Back L/R
ssrc::WavFormat format_5_1(
  ssrc::WavFormat::EXTENSIBLE,
  6,      // channels
  48000,  // sampleRate
  24,     // bitsPerSample
  channelMask_5_1,
  ssrc::WavFormat::KSDATAFORMAT_SUBTYPE_PCM
);

// Use a container that supports large files, like W64 or RF64
ssrc::ContainerFormat container(ssrc::ContainerFormat::RF64);
// auto writer = std::make_shared<ssrc::WavWriter<float>>(..., format_5_1, container, ...);
```

The dithering process relies on a stream of random numbers to generate the dither noise. The library provides a default triangular-PDF random number generator, but you can supply your own by implementing the `ssrc::DoubleRNG` abstract base class.
This allows you to experiment with different types of noise (e.g., Gaussian, or different distributions) for dithering.
Interface:
The DoubleRNG interface is very simple:
```cpp
class DoubleRNG {
public:
  virtual double nextDouble() = 0; // Must return a random double
  virtual ~DoubleRNG() = default;
};
```

Example: Implementing a Simple Uniform RNG
Here is how you could implement a simple RNG that produces a uniform distribution between -1.0 and 1.0 and use it in the dither stage.
```cpp
#include <random>

// 1. Implement the DoubleRNG interface
class MyUniformRNG : public ssrc::DoubleRNG {
private:
  std::mt19937 engine_;
  std::uniform_real_distribution<double> dist_;

public:
  MyUniformRNG(uint64_t seed = 0) : engine_(seed), dist_(-1.0, 1.0) {}

  double nextDouble() override {
    return dist_(engine_);
  }
};

// ... inside your conversion function ...

// 2. Create an instance of your custom RNG
auto my_rng = std::make_shared<MyUniformRNG>(/* seed */);

// 3. Pass it to the Dither constructor
auto dither_stage = std::make_shared<ssrc::Dither<int32_t, float>>(
  resampler,
  // ... other dither parameters ...
  shaper,
  my_rng // Your custom RNG
);
```

The `SSRC` constructor takes a number of parameters that define the conversion profile, controlling the trade-off between quality and speed.
```
SSRC(inlet, sfs, dfs, log2dftfilterlen, aa, guard, gain, minPhase, l2mindftflen, mt)
```
- `unsigned log2dftfilterlen`
  - Description: The base-2 logarithm of the FFT filter length. The actual filter length is `1 << log2dftfilterlen`.
  - Impact: This is the most significant parameter for quality. A longer filter (higher `log2dftfilterlen`) allows a steeper, more precise anti-aliasing filter, which better removes unwanted frequencies, but dramatically increases computational cost.
  - Values: Typical values range from `8` ("lightning") to `18` ("insane").
- `double aa` (Stop-band Attenuation)
  - Description: The required attenuation in the stop-band, measured in decibels (dB). The stop-band is the range of frequencies that should be completely eliminated by the filter.
  - Impact: A higher value (e.g., 170 dB) results in a "blacker" background with less aliasing noise, at the cost of increased computational complexity.
  - Values: Typical values range from `96` dB to `200` dB.
- `double guard` (Guard Band)
  - Description: A factor that determines the width of the guard band between the pass-band (frequencies to keep) and the stop-band (frequencies to remove).
  - Impact: A larger guard band makes the filter's job easier, allowing faster computation, but it does so by slightly narrowing the range of frequencies passed through. This is mostly relevant for conversions between very close sample rates (like 44.1 kHz and 48 kHz).
  - Values: Typical values range from `1.0` to `8.0`.
- `double gain`
  - Description: A linear gain multiplier applied to the signal. Defaults to `1.0` (no change).
  - Values: Any double-precision floating-point value. For example, `0.5` reduces the signal level by about 6 dB.
- `bool minPhase`
  - Description: A flag that selects the filter type. Defaults to `false`.
  - Impact: When `false` (default), the converter uses linear-phase filters, which preserve the waveform's shape but introduce a processing delay equal to half the filter length. When `true`, it uses minimum-phase filters, which significantly reduce this delay, making the process suitable for real-time applications. The trade-off is a change in phase response, which is generally inaudible.
  - Values: `true` or `false`.
- `unsigned l2mindftflen` (Log2 Minimum DFT Filter Length)
  - Description: When using partitioned convolution for low-latency processing, this sets the base-2 logarithm of the minimum FFT size for the filter partitions. Defaults to `0` (disabled).
  - Impact: This parameter is key for tuning real-time performance. A non-zero value enables the partitioned convolution algorithm, which breaks the main filter into smaller chunks to reduce latency. A smaller `l2mindftflen` means lower latency but higher CPU usage. It is often used in conjunction with `minPhase`.
  - Values: An unsigned integer, typically between `8` and `12` for real-time applications.
- `bool mt` (Multithreading)
  - Description: A flag to enable or disable multithreaded processing. Defaults to `true`.
  - Impact: When `true`, the resampler may use multiple threads to accelerate the computation, particularly the FFT. Setting it to `false` forces single-threaded operation, which is useful for debugging, ensuring determinism, or in environments where thread management is handled externally.
  - Values: `true` (default) or `false`.
These parameters are bundled together in the command-line tool's "profiles". When using the library directly, you can mix and match these values to create a custom profile tailored to your specific needs.
The entire libshibatchdsp library is built on a simple but powerful design pattern: the processing pipeline. Audio data flows from a source, through one or more processing stages, to a destination. Each of these stages is connected by a unified interface: ssrc::StageOutlet<T>.
This interface is the fundamental building block of the library. WavReader is a StageOutlet, SSRC is a StageOutlet, and Dither is a StageOutlet. By making your own class that implements this interface, you can create custom audio effects, generators, or other processing tools and seamlessly insert them anywhere in the pipeline.
To create a custom stage, you must inherit from ssrc::StageOutlet<T> and implement its two pure virtual functions:
- `virtual bool atEnd()`
  - Purpose: Returns `true` if the stream has no more data to provide, and `false` otherwise.
  - Implementation: If your stage processes data from an input stage, you typically call `atEnd()` on your input and return its value. If you generate data, return `true` when your generation logic is complete.
- `virtual size_t read(T *ptr, size_t n)`
  - Purpose: This is the core function where data is processed and provided. It should fill the buffer `ptr` with up to `n` samples of audio data.
  - Return Value: It must return the number of samples actually written to `ptr`. A return value of `0` signals that the stream has ended (i.e., `atEnd()` is now `true`).
  - Blocking: If no data is currently available but the stream is not at the end, this function should block until data becomes available.
Example: Creating a Custom Gain (Volume) Stage
Here is a complete example of a simple gain stage. It reads data from an input outlet, multiplies each sample by a constant factor, and can be inserted anywhere in the pipeline.
```cpp
#include <memory>
#include "shibatch/ssrc.hpp"

template <typename T>
class GainStage : public ssrc::StageOutlet<T> {
private:
  std::shared_ptr<ssrc::StageOutlet<T>> inlet_; // The previous stage in the pipeline
  double gain_factor_;

public:
  // Constructor takes the input stage and the gain factor (e.g., 1.0 is no change)
  GainStage(std::shared_ptr<ssrc::StageOutlet<T>> inlet, double gain_factor)
    : inlet_(inlet), gain_factor_(gain_factor) {}

  // atEnd() is true if the input stage is at its end.
  bool atEnd() override {
    return inlet_->atEnd();
  }

  // read() fetches data from the input, applies gain, and returns it.
  size_t read(T* ptr, size_t n) override {
    // Read data from the input stage into our buffer.
    size_t samples_read = inlet_->read(ptr, n);

    // Apply the gain to each sample that was read.
    for (size_t i = 0; i < samples_read; ++i) {
      ptr[i] *= gain_factor_;
    }
    return samples_read;
  }
};

// --- How to use it in a pipeline ---
//
// WavReader -> SSRC -> GainStage -> WavWriter
//
// ... after creating the SSRC resampler ...
// auto resampler = std::make_shared<ssrc::SSRC<float>>(...);

// Create an instance of your gain stage, wrapping the resampler.
// This example reduces the volume by half (gain factor 0.5).
auto gain_stage = std::make_shared<GainStage<float>>(resampler, 0.5);

// Pass the gain stage to the writer instead of the resampler.
// std::vector<std::shared_ptr<ssrc::StageOutlet<float>>> outlets;
// outlets.push_back(gain_stage);
// auto writer = std::make_shared<ssrc::WavWriter<float>>(..., outlets);
// writer->execute();
```

While the standard pipeline processes each channel independently, there are many cases where you need to combine, re-route, or change the number of channels. This is the role of the `shibatch::ChannelMixerStage<T>` stage. It takes any number of input channels and produces any number of output channels, with the transformation defined by a mixing matrix.
Common use cases include:
- Downmixing: Converting a multi-channel source (e.g., 5.1 surround, stereo) to a format with fewer channels (e.g., stereo, mono).
- Upmixing: Creating a multi-channel output from a source with fewer channels (e.g., converting a stereo track to a "pseudo-5.1" track).
- Channel Re-routing: Swapping the left and right channels, or re-ordering channels in a multi-channel file.
- Applying Gain: Applying a specific gain to each channel independently.
The ChannelMixer is a processing stage that is typically inserted early in the pipeline, often right after the WavReader, before resampling occurs. This is usually more efficient, as it means the resampling stage only has to process the final number of channels.
Pipeline with ChannelMixer (Stereo to Mono Downmix):
```
+-----------+  ch 0  +------------------------+
|           |------->|                        |
| WavReader |        | ChannelMixer<T> (Mono) |--> SSRC<T> --> ...
|           |------->|                        |
+-----------+  ch 1  +------------------------+
```

Both input channels feed the mixer, which exposes a single mono outlet; the individual input channels are no longer independent outputs of this stage.
The core of the ChannelMixer is the matrix, which you provide to its constructor as a std::vector<std::vector<double>>. This matrix defines exactly how the input channels are combined to create the output channels.
- The number of rows in the matrix determines the number of output channels.
- The number of columns in each row must equal the number of input channels.
The value at matrix[out_channel][in_channel] is the gain (multiplier) applied to the input channel's signal before it is summed into the output channel's signal.
Formula:
output[out_ch] = sum(input[in_ch] * matrix[out_ch][in_ch] for all in_ch)
Example 1: Stereo to Mono Downmix
To convert a 2-channel stereo input to a 1-channel mono output, you need a matrix with 1 row and 2 columns. A standard downmix formula is Mono = 0.5 * Left + 0.5 * Right.
The corresponding matrix would be:
```cpp
std::vector<std::vector<double>> matrix = {
  {0.5, 0.5} // Output Channel 0 = 0.5 * Input 0 + 0.5 * Input 1
};
```

Example 2: Swapping Stereo Channels
To swap the left and right channels of a 2-channel input, you need a 2x2 matrix.
- Output 0 (new Left) should be 1.0 * Input 1 (old Right).
- Output 1 (new Right) should be 1.0 * Input 0 (old Left).
The matrix would be:
```cpp
std::vector<std::vector<double>> matrix = {
  {0.0, 1.0}, // Output 0 = 0.0 * Input 0 + 1.0 * Input 1
  {1.0, 0.0}  // Output 1 = 1.0 * Input 0 + 0.0 * Input 1
};
```

Here is a full example that demonstrates how to read a stereo WAV file, downmix it to mono using the ChannelMixer, resample the mono signal, and write the result to a new WAV file.
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <memory>
#include <stdexcept>
#include <cstdlib>
#include "shibatch/ssrc.hpp"

void stereo_to_mono_conversion(const std::string& in_path, const std::string& out_path, int dstRate) {
  try {
    // 1. Set up the reader
    auto reader = std::make_shared<ssrc::WavReader<float>>(in_path);
    ssrc::WavFormat srcFormat = reader->getFormat();
    if (srcFormat.channels != 2) {
      throw std::runtime_error("Input file must be stereo.");
    }

    // 2. Define the mixing matrix for stereo-to-mono
    std::vector<std::vector<double>> mix_matrix = {
      {0.5, 0.5} // Mono = 0.5 * Left + 0.5 * Right
    };

    // 3. Create the ChannelMixer stage
    // The mixer takes the WavReader as its input.
    auto mixer = std::make_shared<shibatch::ChannelMixerStage<float>>(reader, mix_matrix);

    // 4. Define the destination format
    // The number of output channels is now determined by the mixer.
    ssrc::WavFormat dstFormat(ssrc::WavFormat::PCM, mixer->getFormat().channels, dstRate, 24);
    ssrc::ContainerFormat dstContainer(ssrc::ContainerFormat::RIFF);

    // 5. Create a resampler for each output channel of the mixer
    // In this case, there is only one channel (mono).
    std::vector<std::shared_ptr<ssrc::StageOutlet<float>>> resampler_outlets;
    for (uint32_t i = 0; i < mixer->getFormat().channels; ++i) {
      auto resampler = std::make_shared<ssrc::SSRC<float>>(
        mixer->getOutlet(i), // Input is now the mixer's outlet
        srcFormat.sampleRate,
        dstRate,
        14,   // "standard" profile
        145,  // "standard" profile
        2.0   // "standard" profile
      );
      resampler_outlets.push_back(resampler);
    }

    // 6. Set up the writer
    auto writer = std::make_shared<ssrc::WavWriter<float>>(out_path, dstFormat, dstContainer, resampler_outlets);

    // 7. Execute the entire process
    std::cout << "Converting " << in_path << " (stereo) to " << out_path << " (mono)..." << std::endl;
    writer->execute();
    std::cout << "Conversion complete." << std::endl;
  } catch (const std::exception& e) {
    std::cerr << "Error: " << e.what() << std::endl;
  }
}

int main(int argc, char **argv) {
  if (argc == 4) {
    stereo_to_mono_conversion(argv[1], argv[2], atoi(argv[3]));
    return 0;
  }
  std::cerr << "Usage : " << argv[0] << " <input_stereo.wav> <output_mono.wav> <new_rate>" << std::endl;
  return -1;
}
```

In addition to the C++ template library, ssrc provides a C-language API with a calling convention similar to that of the popular libsoxr. This API is easier to integrate into non-C++ projects and provides a more straightforward, stateful interface for resampling.
To use this API, include the header:
```c
#include "shibatch/ssrcsoxr.h"
```

This C API is designed to be a near drop-in replacement for libsoxr. By defining the `SSRC_LIBSOXR_EMULATION` macro before including the header, all `ssrc_soxr_*` functions and types are aliased to their `soxr_*` equivalents (e.g., `ssrc_soxr_create` becomes `soxr_create`). This allows relatively easy migration of existing codebases that already use libsoxr.
```c
#define SSRC_LIBSOXR_EMULATION
#include "shibatch/ssrcsoxr.h"
```

The examples in this documentation use the `soxr_*` names, assuming this macro is defined.
The API is stateful. The typical workflow is:

- Create a resampler object (`soxr_t`) with `soxr_create()`.
- Process audio data in chunks by repeatedly calling `soxr_process()`.
- Flush the resampler by calling `soxr_process()` with `NULL` input to retrieve any remaining buffered samples.
- Delete the resampler object with `soxr_delete()` to free resources.
Creates and initializes a resampler instance.
- `input_rate`, `output_rate`: The source and destination sample rates.
- `num_channels`: The number of audio channels to process.
- `error`: A pointer to a `soxr_error_t` that is set if creation fails.
- `iospec`: A `soxr_io_spec_t` struct specifying data formats.
- `qspec`: A `soxr_quality_spec_t` struct specifying the conversion quality profile.
- `rtspec`: Reserved for future use; should be `NULL`.
Returns a soxr_t handle on success or NULL on failure.
Processes a chunk of audio data.
- `soxr`: The resampler handle.
- `in`, `ilen`: Pointer to the input buffer and the number of frames it contains. To flush the internal buffers at the end of the stream, set `in` to `NULL` and `ilen` to `0`.
- `idone`: (Optional) A pointer to a `size_t` that receives the number of frames consumed from the input buffer.
- `out`, `olen`: Pointer to the output buffer and its capacity in frames.
- `odone`: A pointer to a `size_t` that receives the number of frames written to the output buffer.
Returns an error code if an error occurs during processing.
Resets the resampler to its initial state, clearing all internal buffers. This is useful for processing a new signal with the same configuration without the overhead of destroying and recreating the resampler.
Frees all memory and resources associated with the resampler handle.
Returns the processing delay of the resampler in samples. This represents the number of zero samples that should be discarded from the beginning of the output to maintain signal synchronization.
For convenience, the API provides a "one-shot" function to resample a signal that is already entirely in memory. This function handles the creation, processing, flushing, and deletion of a resampler in a single call.
`soxr_error_t soxr_oneshot(in_rate, out_rate, num_ch, in, ilen, *idone, out, olen, *odone, *io_spec, *q_spec, *rt_spec)`
Resamples a block of in-memory audio data in a single function call.
- `in_rate`, `out_rate`, `num_ch`: The sample rates and channel count.
- `in`, `ilen`: Pointer to the input buffer and its length in frames.
- `out`, `olen`: Pointer to the output buffer and its capacity in frames.
- `idone`, `odone`: Pointers to store the number of frames consumed and produced.
- `io_spec`, `q_spec`, `rt_spec`: (Optional) Pointers to I/O, quality, and runtime specifications.
This helper function creates an I/O specification object.
- `itype`, `otype`: The data type for the input and output buffers. Supported types are `SOXR_FLOAT32` (for `float[]`) and `SOXR_FLOAT64` (for `double[]`).
This helper function creates a quality specification object.
- `recipe`: A preset that defines the quality level:
  - `SSRC_SOXR_QQ`: "Quick" quality
  - `SSRC_SOXR_LQ`: "Low" quality
  - `SSRC_SOXR_MQ`: "Medium" quality (default)
  - `SSRC_SOXR_HQ`: "High" quality
  - `SSRC_SOXR_VHQ`: "Very High" quality
- `flags`: Additional flags (e.g., for dithering options); `0` is a safe default. The following value can be used:
  - `SOXR_MINIMUM_PHASE`: Use a minimum-phase filter.
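Combining the two helpers, a double-precision, very-high-quality, minimum-phase configuration might be assembled as below. This is a sketch: the exact spelling of the recipe constant follows the list above, and the rates and channel count are arbitrary.

```c
soxr_io_spec_t io_spec = soxr_io_spec(SOXR_FLOAT64, SOXR_FLOAT64);  // double in/out
soxr_quality_spec_t q_spec = soxr_quality_spec(SSRC_SOXR_VHQ, SOXR_MINIMUM_PHASE);

soxr_error_t error;
soxr_t soxr = soxr_create(44100.0, 96000.0, 2, &error, &io_spec, &q_spec, NULL);
```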
This example demonstrates how to convert a WAV file from one sample rate to another. It uses the popular `dr_wav` single-file library for file I/O, which is included in this repository for convenience.
```c
#include <stdio.h>
#include <stdlib.h>

// Use the libsoxr compatibility layer
#define SSRC_LIBSOXR_EMULATION
#include "shibatch/ssrcsoxr.h"

// dr_wav for file I/O.
// In this project, a modified version `xdr_wav.h` is used.
#define DR_WAV_IMPLEMENTATION
#include "xdr_wav.h"

#define BUFFER_FRAMES 3000

int main(int argc, char *argv[]) {
  if (argc < 4) {
    printf("Usage: %s <input.wav> <output.wav> <new_rate>\n", argv[0]);
    return 1;
  }

  const char* in_filename = argv[1];
  const char* out_filename = argv[2];
  double const out_rate = atof(argv[3]);

  // 1. Open input WAV file
  drwav wav_in;
  if (!drwav_init_file(&wav_in, in_filename, NULL)) {
    fprintf(stderr, "Failed to open input file: %s\n", in_filename);
    return 1;
  }
  double const in_rate = (double)wav_in.sampleRate;
  unsigned int const num_channels = wav_in.channels;

  // 2. Configure and create the resampler
  soxr_error_t error;
  soxr_io_spec_t io_spec = soxr_io_spec(SOXR_FLOAT32, SOXR_FLOAT32);
  soxr_quality_spec_t q_spec = soxr_quality_spec(SOXR_MQ, 0);
  soxr_t soxr = soxr_create(in_rate, out_rate, num_channels, &error, &io_spec, &q_spec, NULL);
  if (!soxr) {
    fprintf(stderr, "soxr_create failed: %s\n", error);
    drwav_uninit(&wav_in);
    return 1;
  }

  // 3. Open output WAV file
  drwav_data_format format;
  format.container = drwav_container_riff;
  format.format = DR_WAVE_FORMAT_IEEE_FLOAT;
  format.channels = num_channels;
  format.sampleRate = (drwav_uint32)out_rate;
  format.bitsPerSample = 32;

  drwav wav_out;
  if (!drwav_init_file_write(&wav_out, out_filename, &format, NULL)) {
    fprintf(stderr, "Failed to open output file: %s\n", out_filename);
    soxr_delete(soxr);
    drwav_uninit(&wav_in);
    return 1;
  }

  // 4. Set up I/O buffers
  float* in_buffer = (float*)malloc(sizeof(float) * BUFFER_FRAMES * num_channels);
  size_t out_buffer_capacity = (size_t)(BUFFER_FRAMES * out_rate / in_rate + 0.5) + 16;
  float* out_buffer = (float*)malloc(sizeof(float) * out_buffer_capacity * num_channels);
  size_t frames_read;

  // 5. Process data in a loop
  while ((frames_read = drwav_read_pcm_frames_f32(&wav_in, BUFFER_FRAMES, in_buffer)) > 0) {
    size_t frames_consumed;
    size_t frames_produced;
    error = soxr_process(soxr,
                         in_buffer, frames_read, &frames_consumed,
                         out_buffer, out_buffer_capacity, &frames_produced);
    if (error) fprintf(stderr, "soxr_process error: %s\n", error);
    if (frames_produced > 0) {
      drwav_write_pcm_frames(&wav_out, frames_produced, out_buffer);
    }
  }

  // 6. Flush the resampler's internal buffer
  size_t frames_produced;
  do {
    error = soxr_process(soxr, NULL, 0, NULL, out_buffer, out_buffer_capacity, &frames_produced);
    if (error) fprintf(stderr, "soxr_process (flush) error: %s\n", error);
    if (frames_produced > 0) {
      drwav_write_pcm_frames(&wav_out, frames_produced, out_buffer);
    }
  } while (frames_produced > 0);

  // 7. Clean up
  free(in_buffer);
  free(out_buffer);
  soxr_delete(soxr);
  drwav_uninit(&wav_in);
  drwav_uninit(&wav_out);

  printf("Successfully created %s\n", out_filename);
  return 0;
}
```