Audio: MFCC: Add Mel spectrogram mode via configuration blob by singalsu · Pull Request #10743 · thesofproject/sof

singalsu · 2026-05-05T14:00:09Z

This change adds compatibility with the audio feature data produced by the extract_features() function in the OpenVINO Whisper library. The work is not yet complete, but this is a suitable step for a PR proposal before further changes (e.g. computing a 32-bit Mel spectrum) that fine-tune the data for a more accurate match.

This patch updates the data clear and copy functions mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() to use memset() and memcpy() instead of looping sample by sample. The function mfcc_source_copy_s16() is moved later in the file, under CONFIG_FORMAT_S16LE where it belongs. There are no changes to the function itself.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
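As an illustration of the memset()/memcpy() approach, the following is a minimal sketch and not the code from this PR. The `ring_s16` structure and both function names are hypothetical stand-ins for a circular sink buffer; the real functions operate on the SOF stream API. The point is that a wrap-around copy or clear needs at most two block operations instead of a per-sample loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical circular sink buffer, not the SOF audio_stream API. */
struct ring_s16 {
	int16_t *base;	/* start of the buffer */
	size_t size;	/* capacity in samples */
	size_t w;	/* write index in samples, always < size */
};

/* Copy n samples from src into the ring, splitting at the wrap point. */
static void ring_copy_data_s16(struct ring_s16 *r, const int16_t *src, size_t n)
{
	assert(n <= r->size);
	size_t first = r->size - r->w;	/* samples left before the wrap */

	if (first > n)
		first = n;

	memcpy(r->base + r->w, src, first * sizeof(int16_t));
	memcpy(r->base, src + first, (n - first) * sizeof(int16_t));
	r->w = (r->w + n) % r->size;
}

/* Write n zero samples into the ring, splitting at the wrap point. */
static void ring_copy_zero_s16(struct ring_s16 *r, size_t n)
{
	assert(n <= r->size);
	size_t first = r->size - r->w;

	if (first > n)
		first = n;

	memset(r->base + r->w, 0, first * sizeof(int16_t));
	memset(r->base, 0, (n - first) * sizeof(int16_t));
	r->w = (r->w + n) % r->size;
}
```

The second call in each function has length zero when no wrap occurs, which both memcpy() and memset() accept with valid pointers.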

memset() and memcpy() are as fast as the HiFi data clear and copy functions, so mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() can be moved to mfcc_common.c. This change will also help with possible future changes to the audio features output data format. The current data format, a fake PCM stream, may change to a compress encode stream type.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Add S24_4LE and S32_LE processing functions for MFCC component. The new format variants convert input samples to internal 16-bit representation for FFT processing and expand cepstral output back to the sink format. Implementations are added for generic, HiFi3, and HiFi4 architectures.

The source copy functions handle pre-emphasis filtering with the format conversion. The sink copy functions write 16-bit cepstral coefficients expanded to the 32-bit container format. The MFCC magic marker is written directly as a raw 32-bit value without format conversion.

The function map in mfcc.c is updated to wire the new processing functions for S24_4LE and S32_LE formats.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
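The format conversions described above can be sketched as follows. This is an assumption-laden illustration, not the PR's code: the helper names are hypothetical, narrowing is done by plain truncation (the actual functions may round or saturate), and the expansion places the 16-bit value in the most significant bits of the valid sample width.

```c
#include <assert.h>
#include <stdint.h>

/* S32_LE -> s16: keep the 16 most significant bits (truncation assumed).
 * Right shift of a negative value is implementation-defined in C, but is
 * an arithmetic shift on the compilers commonly used for firmware.
 */
static int16_t s32_to_s16(int32_t x)
{
	return (int16_t)(x >> 16);
}

/* S24_4LE -> s16: move the 24 valid bits to the top of the word first,
 * so the result does not depend on what the upper 8 container bits hold.
 */
static int16_t s24_4le_to_s16(int32_t x)
{
	int32_t top = (int32_t)((uint32_t)x << 8);

	return (int16_t)(top >> 16);
}

/* s16 -> S32_LE: multiply instead of shifting to avoid undefined behavior
 * for negative values.
 */
static int32_t s16_to_s32(int16_t x)
{
	return (int32_t)x * 65536;
}

/* s16 -> S24_4LE: 24-bit value, sign-extended in the 32-bit container. */
static int32_t s16_to_s24_4le(int16_t x)
{
	return (int32_t)x * 256;
}
```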

The configuration blob uses the value -1 for the input channel select with mono format. This patch adds an error if -1 is used with anything other than a mono input stream. The low-information comp_info() trace print is changed into a more informative error message.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
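A minimal sketch of such a check is shown below. The function name is hypothetical, and the upper-bound test for non-negative channel indices is an added assumption; the commit message only describes the -1 case.

```c
#include <assert.h>
#include <errno.h>

/* Return 0 when the blob's channel selection suits the stream,
 * -EINVAL otherwise. A value of -1 means "expect mono input".
 */
static int check_channel_select(int channel, int stream_channels)
{
	if (channel == -1)
		return stream_channels == 1 ? 0 : -EINVAL;

	/* Assumed extra check: the index must address an existing channel. */
	if (channel < 0 || channel >= stream_channels)
		return -EINVAL;

	return 0;
}
```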

Add a mode where cepstral coefficients are not computed and the Mel frequency logarithm values are passed directly to the sink buffer. The mode is activated when sof_mfcc_config member num_ceps is set to zero.

When num_ceps is zero:
- DCT matrix and cepstral lifter are not allocated or initialized
- The Mel log spectra (num_mel_bins values) are output to the sink instead of cepstral coefficients
- A mel_only flag is added to mfcc_state for runtime path selection

This is useful for applications that need Mel spectrogram features without the DCT transform, such as some neural network audio front-ends.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
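The mode selection described above can be sketched as follows. The two structures are cut-down hypothetical stand-ins for sof_mfcc_config and mfcc_state, and `out_values` is an illustrative field, not one named in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the configuration blob. */
struct cfg_sketch {
	int16_t num_ceps;	/* 0 selects Mel-only mode */
	int16_t num_mel_bins;
};

/* Hypothetical subset of the runtime state. */
struct state_sketch {
	bool mel_only;		/* true: skip DCT and lifter */
	int out_values;		/* values written to the sink per frame */
};

static void setup_output_mode(struct state_sketch *s, const struct cfg_sketch *c)
{
	s->mel_only = (c->num_ceps == 0);

	/* In Mel-only mode the Mel log spectrum itself is the output. */
	s->out_values = s->mel_only ? c->num_mel_bins : c->num_ceps;
}
```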

This change allows more output values per frame than the e.g. 30 cepstral or Mel values plus the magic sync value that fit in a single stereo 16 kHz 16-bit period. As much data can be packed as the FFT hop size and the sink format in use allow.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
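The figure of 30 is consistent with a 1 ms period and a two-word magic marker: 16 frames times 2 channels gives 32 16-bit words, of which 2 would be taken by the marker. Both the period length and the marker width are assumptions here, not stated in the commit message. The sketch below, with hypothetical helper names, works through that arithmetic and the number of periods needed for a larger frame such as 80 Mel values.

```c
#include <assert.h>
#include <stdint.h>

/* Sink words available in one period (one word per sample per channel). */
static int sink_words_per_period(int rate_hz, int channels, int period_us)
{
	int frames = (int)((int64_t)rate_hz * period_us / 1000000);

	return frames * channels;
}

/* Sink words that pass while one FFT hop of input is consumed. */
static int sink_words_per_hop(int hop_frames, int channels)
{
	return hop_frames * channels;
}

/* Periods needed to drain one feature frame including its marker. */
static int periods_to_drain(int num_values, int magic_words, int words_per_period)
{
	int total = num_values + magic_words;

	return (total + words_per_period - 1) / words_per_period;
}
```

With a hop of 160 frames in stereo, 320 words pass per hop, so a frame of 80 values plus marker fits comfortably when it is spread over several periods.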

The description for top_db was wrong.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

The support for Hann window was missing from MFCC setup function.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

For compatibility with OpenVINO Whisper audio features, this patch adds to the function mfcc_stft_process() peak tracking of the Mel spectra maximum in mel_only mode, and clamping of the Mel spectral values to the found maximum minus config->top_db. The parameters for peak tracking and clamping are set via the configuration blob. Whisper-like absolute maximum behavior can be achieved by setting mmax_coef to zero: the mmax value then rises to the detected peak and remains there.

The patch also adds normalization of the Mel values with a configurable offset and scale. Whisper uses hard-coded values, but making them configuration parameters in the blob is more flexible.

The input parameter state is changed to struct mfcc_comp_data *cd to give access to both the state and the configuration of the module.

The ABI header user/mfcc.h is modified in a way that does not affect the previous default operation for cepstral coefficients. The new Mel-only mode uses previously reserved fields in the configuration blob.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
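The peak tracking, clamp, and normalization steps can be sketched in floating point as below. This is a hypothetical illustration only: the PR works on fixed-point data, the structure and function names are invented, the decay rule for a non-zero mmax_coef is an assumption, and so is the order "add offset, then multiply by scale". For reference, the OpenAI Whisper front-end clamps log10 Mel values at the maximum minus 8.0 and then computes (x + 4.0) / 4.0, which corresponds to top_db 8, offset 4 and scale 0.25 in this sketch.

```c
#include <assert.h>

/* Hypothetical parameters and state for Mel post-processing. */
struct mel_post {
	float mmax;		/* tracked maximum, start with a very low value */
	float mmax_coef;	/* 0 holds the maximum forever (assumed meaning) */
	float top_db;		/* allowed range below the tracked maximum */
	float offset;		/* added before scaling (assumed order) */
	float scale;		/* multiplied after the offset */
};

static void mel_post_process(struct mel_post *p, float *mel, int n)
{
	float peak = mel[0];
	float floor_val;
	int i;

	for (i = 1; i < n; i++)
		if (mel[i] > peak)
			peak = mel[i];

	/* Rise to a new peak at once, otherwise decay toward the frame peak. */
	if (peak > p->mmax)
		p->mmax = peak;
	else
		p->mmax -= p->mmax_coef * (p->mmax - peak);

	/* Clamp to the tracked maximum minus top_db, then normalize. */
	floor_val = p->mmax - p->top_db;
	for (i = 0; i < n; i++) {
		float v = mel[i] < floor_val ? floor_val : mel[i];

		mel[i] = (v + p->offset) * p->scale;
	}
}
```

Note that Whisper takes the maximum over a whole clip, while a streaming tracker with coefficient zero only knows the maximum seen so far, so early frames can differ.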

There are several changes:
- The topology v1 format blob export is removed. The script updates the MFCC module blob default.conf and adds a new blob mel_spectrogram.conf for topology v2.
- The script is organized to be able to output multiple blobs.
- The topologies sof-hda-benchmark-mfcc16/24/32.tplg use a stereo data format, so the blob configuration -1 for channels (assume mono) in setup_mfcc.m is wrong.
- A blob for Mel frequency scale logarithmic spectrum output is added. It sets num_ceps to zero to select Mel mode for MFCC. The parameters are set for Whisper-compatible audio features: 80 Mel bins, Hann window, FFT size 400 (padded to 512) with a hop of 160.
- The missing export of the mel_log (log/log10/db) and norm (none/slaney) parameters is added.
- Parameters are added for compatibility with OpenVINO's Whisper audio features extractor. The Mel values are clamped against the tracked Mel maximum and the existing top_db parameter, and normalized with a configurable offset and scale.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

This patch adds the build of test topologies for testing an OpenVINO Whisper audio features extractor compatible setup of SOF MFCC. The topology names are sof-hda-benchmark-mfccmel16/24/32.tplg. The MFCC module is initialized to produce spectrogram data for 80 Mel frequency bands.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

This patch contains several updates:
- A run with valgrind is added to catch memory leaks.
- The script applied duplicate "-i" and "-o" arguments. They are removed from the "OPT" variables.
- sof-testbench4 can't override the channel count in the topology the way the IPC3 testbench could. Since the current topology is for stereo 16 kHz, the input data and the command line must match it.
- To be able to compare MFCC output across successive runs, the "-R" option is added to the sox audio conversion run to prevent e.g. randomization of dither.
- The script converts the input to s24 and s32 formats and runs them, to make it easier to check correct operation with the supported formats. The conversion is done from the s16 version so that the output audio features can be compared; they should be the same if the internal processing is 16 bit.
- A run with Mel configured MFCC is added for s16/24/32 formats.
- A script to decode and visualize Mel spectrogram data is added as decode_mel.m.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Copilot

Pull request overview

This PR extends the SOF MFCC component and its topology/tuning assets to support a “Mel spectrogram” (Mel-only) output mode via configuration blob selection, aiming for compatibility with OpenVINO Whisper extract_features()-style feature data.

Changes:

- Added a new MFCC configuration blob (mel80) and wired it into benchmark topology controls/targets to generate Mel-only benchmark topologies.
- Updated MFCC firmware config ABI (sof_mfcc_config) and tuning/export scripts to generate both default and Mel-only blobs.
- Implemented Mel-only processing path (skip DCT/lifter) plus added S24/S32 frame-format support for MFCC processing.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tools/topology/topology2/include/components/mfcc/mel80.conf | Adds a new exported MFCC config blob for 80-bin Mel-only output. |
| tools/topology/topology2/include/components/mfcc/default.conf | Updates the default MFCC blob header and content to the new config layout. |
| tools/topology/topology2/include/bench/mfccmel_s32.conf | Adds a bench config include for MFCC Mel-only mode (S32). |
| tools/topology/topology2/include/bench/mfccmel_s24.conf | Adds a bench config include for MFCC Mel-only mode (S24). |
| tools/topology/topology2/include/bench/mfccmel_s16.conf | Adds a bench config include for MFCC Mel-only mode (S16). |
| tools/topology/topology2/include/bench/mfcc_controls_playback.conf | Extends MFCC bytes control to select the `mel80` blob. |
| tools/topology/topology2/include/bench/mfcc_controls_capture.conf | Extends MFCC bytes control to select the `mel80` blob. |
| tools/topology/topology2/development/tplg-targets-bench.cmake | Adds benchmark target generation for `mfccmel*` configurations. |
| tools/topology/topology2/cavs-benchmark-hda.conf | Registers new `mfccmel16/24/32` bench configs. |
| src/include/user/mfcc.h | Extends MFCC config struct with Mel-only normalization/clamp parameters and `dynamic_mmax`. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Adds MFCC state needed for Mel-only mode, output buffering, and S24/S32 function declarations. |
| src/audio/mfcc/tune/setup_mfcc.m | Refactors blob generation to emit both default and Mel-only blobs; adds enum mapping helpers. |
| src/audio/mfcc/tune/run_mfcc.sh | Updates tuning script to run MFCC and Mel-only benchmarks for S16/S24/S32. |
| src/audio/mfcc/tune/README.txt | Updates tuning README to reflect new output filenames and Mel visualization entry point. |
| src/audio/mfcc/tune/decode_mel.m | Adds a decoder/plotter for Mel-only stream output (raw/wav). |
| src/audio/mfcc/mfcc.c | Enables MFCC processing for S24 and S32 stream formats. |
| src/audio/mfcc/mfcc_setup.c | Adds Hann window support and introduces Mel-only setup path (skip DCT/lifter); adds drain-capacity check and output-state init. |
| src/audio/mfcc/mfcc_hifi4.c | Adds S24/S32 source copy implementations for HiFi4 builds. |
| src/audio/mfcc/mfcc_hifi3.c | Adds S24/S32 source copy implementations for HiFi3 builds. |
| src/audio/mfcc/mfcc_generic.c | Adds/relocates S16 source copy and implements generic S24/S32 source copy paths. |
| src/audio/mfcc/mfcc_common.c | Implements Mel-only post-Mel processing (dynamic max + clamp/scale) and adds S24/S32 MFCC processing functions with multi-period output support. |

singalsu · 2026-05-05T16:44:28Z

+	if (to_copy > 0) {
+		w_ptr = mfcc_sink_copy_data_s32(sink, w_ptr, to_copy,
+						(int32_t *)state->out_data_ptr);
+		state->out_data_ptr += to_copy * 2;
+		state->out_remain -= to_copy * 2;
+		if (state->out_remain < 0)
+			state->out_remain = 0;
+
+		sink_samples -= to_copy;


I'll change the Mel values data to 32 bit. It will avoid this issue and improve the accuracy a lot. The normalized log10 Mel values for Whisper, in a range of about -1 to +1, use only a part of the 16-bit Q9.7 format.
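To put a number on that: Q9.7 has 7 fractional bits, so its step is 1/128 and the range -1 to +1 maps to integer codes -128 to +128. That is 257 of the 65536 available codes, or roughly 8 bits of effective resolution. The sketch below, with hypothetical helper names, shows the quantization and the code count.

```c
#include <assert.h>
#include <stdint.h>

#define Q9_7_ONE 128	/* 1.0 in Q9.7: 7 fractional bits */

/* Convert a float to Q9.7 with rounding and saturation. */
static int16_t float_to_q9_7(float x)
{
	float scaled = x * Q9_7_ONE + (x >= 0.0f ? 0.5f : -0.5f);

	if (scaled >= 32767.0f)
		return INT16_MAX;
	if (scaled <= -32768.0f)
		return INT16_MIN;

	return (int16_t)scaled;	/* truncation after the rounding offset */
}

/* Number of distinct Q9.7 codes that cover the range lo..hi. */
static int q9_7_codes_in_range(float lo, float hi)
{
	assert(lo <= hi);
	return float_to_q9_7(hi) - float_to_q9_7(lo) + 1;
}
```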

+		w_ptr = mfcc_sink_copy_data_s32(sink, w_ptr, to_copy,
+						(int32_t *)state->out_data_ptr);
+		state->out_data_ptr += to_copy * 2;
+		state->out_remain -= to_copy * 2;
+		if (state->out_remain < 0)
+			state->out_remain = 0;
+


+idx1 = find(data == magic(1));
+idx = [];
+for i = 1:length(idx1)
+	if data(idx1(i) + 1) == magic(2)
+		idx = [idx idx1(i)];
+	end
+end
+
+if isempty(idx)
+	error('No magic value markers found from stream');
+end
+
+period_mel = idx(2)-idx(1);
+num_frames = length(idx);


+			data = int16(zeros(prod(s), 1));
+			for i = 1:num_channels
+				data(i:num_channels:end) = tmp(:, i);
+			end


 	cfg.blackman_coef = 0.42;
 	cfg.cepstral_lifter = 22.0;
-	cfg.channel = -1; % -1 expect mono, 0 left, 1 right ...
+	cfg.channel = 0; % -1 expect mono, 0 left, 1 right ...


 	struct mat_matrix_16b *mel_spectra; /**< Pointer to scratch */
 	struct mat_matrix_16b *cepstral_coef; /**< Pointer to scratch */
 	int32_t *power_spectra; /**< Pointer to scratch */
+	int16_t mmax; /**< Maximum Mel value in Q9.7 */


singalsu added 12 commits May 5, 2026 15:28

Audio: MFCC: Update user/mfcc.h comment

17dede7

Audio: MFCC: Add setup of Hann window

2ee9bed

singalsu marked this pull request as ready for review May 5, 2026 14:33

Copilot AI review requested due to automatic review settings May 5, 2026 14:33

singalsu requested review from a team, dbaluta, jsarha, kv2019i, lbetlej, lgirdwood, mmaka1, plbossart and ranj063 as code owners May 5, 2026 14:33

singalsu wants to merge 12 commits into thesofproject:main from singalsu:mfcc_add_s32_and_mel
