Audio: MFCC: Add Mel spectrogram mode via configuration blob #10743
Open
singalsu wants to merge 12 commits into thesofproject:main
This patch updates the data clear and copy functions mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() to use memset() and memcpy() instead of looping sample by sample. The function mfcc_source_copy_s16() is moved later in the file, under CONFIG_FORMAT_S16LE where it belongs. There are no changes to the function itself.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
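The loop-to-memcpy() change described above can be sketched as follows. This is a minimal illustration that assumes a plain circular buffer; `struct ring_s16` and both function names are hypothetical and are not the SOF sink API or the functions touched by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical circular buffer of 16-bit samples, not the SOF sink API. */
struct ring_s16 {
	int16_t *base;	/* start of the buffer */
	size_t size;	/* capacity in samples */
	size_t w;	/* write index in samples, always < size */
};

/* Copy n samples from src, splitting the memcpy() at the wrap point. */
static void ring_copy_data_s16(struct ring_s16 *r, const int16_t *src, size_t n)
{
	size_t first;

	assert(n <= r->size);
	first = r->size - r->w;		/* samples that fit before the wrap */
	if (first > n)
		first = n;

	memcpy(r->base + r->w, src, first * sizeof(int16_t));
	memcpy(r->base, src + first, (n - first) * sizeof(int16_t));
	r->w = (r->w + n) % r->size;
}

/* Write n zero samples, splitting the memset() at the wrap point. */
static void ring_copy_zero_s16(struct ring_s16 *r, size_t n)
{
	size_t first;

	assert(n <= r->size);
	first = r->size - r->w;
	if (first > n)
		first = n;

	memset(r->base + r->w, 0, first * sizeof(int16_t));
	memset(r->base, 0, (n - first) * sizeof(int16_t));
	r->w = (r->w + n) % r->size;
}
```

At most two block operations are needed per call, one before and one after the wrap point, regardless of how many samples are written.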
memset() and memcpy() are as fast as the HiFi data clear and copy functions, so mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() can be moved to mfcc_common.c. This change will also help with possible future changes to the audio features output data format. The current data format, a fake PCM stream, may change to a compressed encoded stream type.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add S24_4LE and S32_LE processing functions for the MFCC component. The new format variants convert input samples to the internal 16-bit representation for FFT processing and expand the cepstral output back to the sink format. Implementations are added for the generic, HiFi3, and HiFi4 architectures. The source copy functions combine pre-emphasis filtering with the format conversion. The sink copy functions write 16-bit cepstral coefficients expanded to the 32-bit container format. The MFCC magic marker is written directly as a raw 32-bit value without format conversion. The function map in mfcc.c is updated to wire in the new processing functions for the S24_4LE and S32_LE formats.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
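The conversions between the sink or source container formats and the internal 16-bit representation could look roughly like the sketch below. The exact shifts, rounding, and saturation used in the patch are not shown on this page, so those details are assumptions, as are all function names. The code also assumes the usual arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* S32_LE -> 16-bit: keep the 16 most significant bits, round, saturate.
 * Assumes >> on a negative value is an arithmetic shift.
 */
static int16_t s32_to_s16(int32_t x)
{
	int64_t y = ((int64_t)x + (1 << 15)) >> 16;

	return (int16_t)(y > INT16_MAX ? INT16_MAX : y);
}

/* S24_4LE -> 16-bit: move the 24-bit sample to the top of the word, which
 * also discards the 8 unused container bits, then reuse the S32 path.
 */
static int16_t s24_to_s16(int32_t x)
{
	return s32_to_s16((int32_t)((uint32_t)x << 8));
}

/* 16-bit value expanded to the S32_LE container. */
static int32_t s16_to_s32(int16_t x)
{
	return (int32_t)x * 65536;
}

/* 16-bit value expanded to the S24_4LE container. */
static int32_t s16_to_s24(int16_t x)
{
	assert(sizeof(int32_t) == 4);
	return (int32_t)x * 256;
}
```

Because the input is reduced to 16 bits before the FFT, s16, s24, and s32 versions of the same audio should produce the same features, which is what the test script in this PR relies on.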
The configuration blob uses the value -1 for the input channel select with mono format. This patch adds an error if -1 is used with anything other than a mono input stream. The low-information comp_info() trace print is changed to a more informative error message.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
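A minimal sketch of such a check, assuming the channel select is a plain integer and the stream channel count is known at setup time. The function name is hypothetical, and the range check for non-negative values is an added assumption rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 when the channel selection fits the stream, -EINVAL otherwise.
 * channel == -1 means "expect mono", as in the commit message above.
 */
static int check_channel_select(int channel, int stream_channels)
{
	assert(stream_channels > 0);
	if (channel == -1)
		return stream_channels == 1 ? 0 : -EINVAL;

	/* Assumption: other values are zero-based channel indices. */
	return (channel >= 0 && channel < stream_channels) ? 0 : -EINVAL;
}
```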
Add a mode where cepstral coefficients are not computed and the Mel frequency logarithm values are passed directly to the sink buffer. The mode is activated when the sof_mfcc_config member num_ceps is set to zero. When num_ceps is zero:
- The DCT matrix and cepstral lifter are not allocated or initialized
- The Mel log spectra (num_mel_bins values) are output to the sink instead of cepstral coefficients
- A mel_only flag is added to mfcc_state for runtime path selection

This is useful for applications that need Mel spectrogram features without the DCT transform, such as some neural network audio front-ends.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
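The mode selection described above can be sketched like this. The two structs are hypothetical cut-down stand-ins for sof_mfcc_config and mfcc_state; only num_ceps, num_mel_bins, and mel_only are names taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsets of the configuration and state, for illustration. */
struct cfg_sketch {
	int num_ceps;		/* 0 selects Mel-only output */
	int num_mel_bins;
};

struct state_sketch {
	bool mel_only;		/* runtime path selection */
	bool need_dct;		/* DCT matrix and lifter must be set up */
	int out_values;		/* values written to the sink per frame */
};

static void select_output_mode(const struct cfg_sketch *cfg,
			       struct state_sketch *st)
{
	assert(cfg->num_ceps >= 0 && cfg->num_mel_bins > 0);
	st->mel_only = cfg->num_ceps == 0;
	st->need_dct = !st->mel_only;
	st->out_values = st->mel_only ? cfg->num_mel_bins : cfg->num_ceps;
}
```

Reusing num_ceps as the switch means existing blobs with a non-zero num_ceps keep their cepstral behavior without any new field.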
This change allows more cepstral or Mel values per frame than the e.g. 30 values plus the magic sync value that fit in a single stereo 16 kHz 16-bit period. As much data can be packed as the FFT hop size and the sink format in use allow.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
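The figure of 30 values can be reproduced with simple arithmetic, under two assumptions that this page does not state: that one period is 1 ms long, and that the magic sync value occupies two 16-bit words. The helper below is hypothetical and only counts sink samples in a time interval.

```c
#include <assert.h>
#include <stdint.h>

/* Number of interleaved sink samples in an interval of interval_us. */
static int samples_per_interval(int rate_hz, int channels, int interval_us)
{
	assert(rate_hz > 0 && channels > 0 && interval_us > 0);
	return (int)((int64_t)rate_hz * interval_us / 1000000) * channels;
}

/* Feature values that fit once the sync words are subtracted. */
static int values_per_interval(int samples, int magic_words)
{
	assert(samples >= magic_words);
	return samples - magic_words;
}
```

With these assumptions, a 1 ms stereo period at 16 kHz carries 32 samples, so 30 values plus two sync words. A hop of 160 frames (10 ms) carries 320 samples, which leaves room for 80 Mel bins when the output is spread over several periods.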
The description for top_db was wrong.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Support for the Hann window was missing from the MFCC setup function.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
For compatibility with OpenVINO Whisper audio features, this patch adds to mfcc_stft_process() peak tracking of the Mel spectra maximum in mel_only mode and clamping of the Mel spectral values to the found maximum minus config->top_db. The parameters for peak tracking and clamping are set via the configuration blob. The absolute-maximum behavior of the Whisper audio features can be achieved by setting mmax_coef to zero: the mmax value then rises to the detected peak and remains there. The patch also adds normalization of Mel values with a configurable offset and scale. Whisper uses hard-coded values, but making them configuration parameters in the blob is more flexible. The input parameter state is changed to struct mfcc_comp_data *cd to give access to both the state and the configuration of the module. The ABI header user/mfcc.h is modified so that the previous default operation for cepstral coefficients is not affected. The new Mel-only mode uses previously reserved fields in the configuration blob.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
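The post-processing chain described above (track the maximum, clamp to maximum minus top_db, then apply offset and scale) can be sketched in floating point as follows. The firmware works in fixed point and its exact decay rule for mmax is not shown on this page, so the decay formula here is an assumption. The struct and function names are hypothetical; mmax, mmax_coef, and top_db are names from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct mel_post {
	float mmax;	 /* tracked maximum of the Mel log values */
	float mmax_coef; /* decay coefficient, 0 holds the peak forever */
	float top_db;	 /* allowed range below the tracked maximum */
	float offset;	 /* added before scaling */
	float scale;	 /* multiplier applied after the offset */
};

static void mel_post_process(struct mel_post *p, float *mel, size_t n)
{
	float peak, floor_val;
	size_t i;

	assert(n > 0);
	peak = mel[0];
	for (i = 1; i < n; i++)
		if (mel[i] > peak)
			peak = mel[i];

	/* Rise at once to a new peak; otherwise decay towards the frame
	 * peak (assumed rule). With mmax_coef == 0 the value is held.
	 */
	if (peak > p->mmax)
		p->mmax = peak;
	else
		p->mmax -= p->mmax_coef * (p->mmax - peak);

	floor_val = p->mmax - p->top_db;
	for (i = 0; i < n; i++) {
		float v = mel[i] < floor_val ? floor_val : mel[i];

		mel[i] = (v + p->offset) * p->scale;
	}
}
```

As a usage example, top_db = 8, offset = 4, and scale = 0.25 on log10 values correspond to the clamp and normalization commonly seen in the Whisper reference preprocessing; treat these numbers as illustrative rather than as the contents of the blob in this PR.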
There are several changes:
- The topology v1 format blob export is removed. The script updates the MFCC module blob default.conf and adds a new blob mel_spectrogram.conf for topology v2.
- The script is organized to be able to output multiple blobs.
- The topologies sof-hda-benchmark-mfcc16/24/32.tplg use a stereo data format, so the blob configuration -1 for channels, which assumes mono, was wrong in setup_mfcc.m.
- A blob for Mel frequency scale logarithmic spectrum output is added. It sets num_ceps to zero to indicate Mel mode for MFCC. The parameters are set for Whisper-compatible audio features with 80 Mel bins, Hann window, and FFT size 400 (padded to 512) with a hop of 160.
- The missing export of the mel_log (log/log10/db) and norm (none/slaney) parameters is added.
- Parameters are added for compatibility with OpenVINO's Whisper audio features extractor. The Mel values are clamped against the tracked maximum of the Mel values and the existing top_db parameter, and normalized with a configurable offset and scale.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the build of test topologies for testing an SOF MFCC setup that is compatible with the OpenVINO Whisper audio features extractor. The topology names are sof-hda-benchmark-mfccmel16/24/32.tplg. The MFCC module is initialized to produce spectrogram data for 80 Mel frequency bands.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch contains several updates:
- A run with valgrind is added to catch memory leaks.
- The script applied duplicate "-i" and "-o" arguments. They are removed from the "OPT" variables.
- The sof-testbench4 can't override the channel count in the topology the way the IPC3 testbench could. Since the current topology is for stereo 16 kHz, the input data and command line must match it.
- To be able to compare MFCC output across successive runs, the "-R" option is added to the run of the sox audio conversion utility to prevent e.g. randomization of dither.
- The script converts the input to s24 and s32 formats and runs them to make it easier to check correct operation with the supported formats. The conversion is done from the s16 version so that the output audio features can be compared; they should be the same if the internal processing is 16 bit.
- A run with Mel-configured MFCC is added for the s16/24/32 formats.
- A script to decode and visualize Mel spectrogram data is added as decode_mel.m.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Pull request overview
This PR extends the SOF MFCC component and its topology/tuning assets to support a “Mel spectrogram” (Mel-only) output mode via configuration blob selection, aiming for compatibility with OpenVINO Whisper extract_features()-style feature data.
Changes:
- Added a new MFCC configuration blob (mel80) and wired it into benchmark topology controls/targets to generate Mel-only benchmark topologies.
- Updated MFCC firmware config ABI (sof_mfcc_config) and tuning/export scripts to generate both default and Mel-only blobs.
- Implemented Mel-only processing path (skip DCT/lifter) plus added S24/S32 frame-format support for MFCC processing.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/topology/topology2/include/components/mfcc/mel80.conf | Adds a new exported MFCC config blob for 80-bin Mel-only output. |
| tools/topology/topology2/include/components/mfcc/default.conf | Updates the default MFCC blob header and content to the new config layout. |
| tools/topology/topology2/include/bench/mfccmel_s32.conf | Adds a bench config include for MFCC Mel-only mode (S32). |
| tools/topology/topology2/include/bench/mfccmel_s24.conf | Adds a bench config include for MFCC Mel-only mode (S24). |
| tools/topology/topology2/include/bench/mfccmel_s16.conf | Adds a bench config include for MFCC Mel-only mode (S16). |
| tools/topology/topology2/include/bench/mfcc_controls_playback.conf | Extends MFCC bytes control to select the mel80 blob. |
| tools/topology/topology2/include/bench/mfcc_controls_capture.conf | Extends MFCC bytes control to select the mel80 blob. |
| tools/topology/topology2/development/tplg-targets-bench.cmake | Adds benchmark target generation for mfccmel* configurations. |
| tools/topology/topology2/cavs-benchmark-hda.conf | Registers new mfccmel16/24/32 bench configs. |
| src/include/user/mfcc.h | Extends MFCC config struct with Mel-only normalization/clamp parameters and dynamic_mmax. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Adds MFCC state needed for Mel-only mode, output buffering, and S24/S32 function declarations. |
| src/audio/mfcc/tune/setup_mfcc.m | Refactors blob generation to emit both default and Mel-only blobs; adds enum mapping helpers. |
| src/audio/mfcc/tune/run_mfcc.sh | Updates tuning script to run MFCC and Mel-only benchmarks for S16/S24/S32. |
| src/audio/mfcc/tune/README.txt | Updates tuning README to reflect new output filenames and Mel visualization entry point. |
| src/audio/mfcc/tune/decode_mel.m | Adds a decoder/plotter for Mel-only stream output (raw/wav). |
| src/audio/mfcc/mfcc.c | Enables MFCC processing for S24 and S32 stream formats. |
| src/audio/mfcc/mfcc_setup.c | Adds Hann window support and introduces Mel-only setup path (skip DCT/lifter); adds drain-capacity check and output-state init. |
| src/audio/mfcc/mfcc_hifi4.c | Adds S24/S32 source copy implementations for HiFi4 builds. |
| src/audio/mfcc/mfcc_hifi3.c | Adds S24/S32 source copy implementations for HiFi3 builds. |
| src/audio/mfcc/mfcc_generic.c | Adds/relocates S16 source copy and implements generic S24/S32 source copy paths. |
| src/audio/mfcc/mfcc_common.c | Implements Mel-only post-Mel processing (dynamic max + clamp/scale) and adds S24/S32 MFCC processing functions with multi-period output support. |
Comment on lines +372 to +380

    if (to_copy > 0) {
        w_ptr = mfcc_sink_copy_data_s32(sink, w_ptr, to_copy,
                                        (int32_t *)state->out_data_ptr);
        state->out_data_ptr += to_copy * 2;
        state->out_remain -= to_copy * 2;
        if (state->out_remain < 0)
            state->out_remain = 0;

        sink_samples -= to_copy;
Collaborator
Author
I'll change the Mel values data to 32 bit. That will avoid this issue and improve the accuracy a lot. The normalized log10 Mel values for Whisper, in a range of about -1 to +1, used only a small part of the 16-bit Q9.7 format.
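The precision point can be made concrete with a small sketch. Q9.7 has 7 fractional bits, so the step is 1/128 and the range -1 to +1 covers only 257 of the 65536 available codes. The helper names are hypothetical and the rounding mode is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define Q9_7_FRAC_BITS 7

/* Convert to Q9.7 with round-half-away-from-zero and saturation. */
static int16_t float_to_q9_7(float x)
{
	float scaled = x * (1 << Q9_7_FRAC_BITS);

	if (scaled >= (float)INT16_MAX)
		return INT16_MAX;
	if (scaled <= (float)INT16_MIN)
		return INT16_MIN;

	return (int16_t)(scaled + (scaled >= 0 ? 0.5f : -0.5f));
}

static float q9_7_to_float(int16_t q)
{
	return (float)q / (1 << Q9_7_FRAC_BITS);
}

/* Number of distinct Q9.7 codes between lo and hi, inclusive. */
static int q9_7_codes_in_range(float lo, float hi)
{
	assert(lo <= hi);
	return float_to_q9_7(hi) - float_to_q9_7(lo) + 1;
}
```

For example, q9_7_codes_in_range(-1.0f, 1.0f) gives 257, which is roughly 8 of the 16 bits, and any two normalized values closer than about 0.0078 can map to the same code.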
Comment on lines +435 to +441

    w_ptr = mfcc_sink_copy_data_s32(sink, w_ptr, to_copy,
                                    (int32_t *)state->out_data_ptr);
    state->out_data_ptr += to_copy * 2;
    state->out_remain -= to_copy * 2;
    if (state->out_remain < 0)
        state->out_remain = 0;
Comment on lines +30 to +43

    idx1 = find(data == magic(1));
    idx = [];
    for i = 1:length(idx1)
        if data(idx1(i) + 1) == magic(2)
            idx = [idx idx1(i)];
        end
    end

    if isempty(idx)
        error('No magic value markers found from stream');
    end

    period_mel = idx(2)-idx(1);
    num_frames = length(idx);
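The marker search in the snippet above can also be written with explicit bounds checks. In the Octave loop, `idx1(i) + 1` would index past the end of `data` if the last sample happened to equal `magic(1)`, and `idx(2)` needs at least two markers. The C sketch below guards both cases. The function names are hypothetical and the marker values used in any call are placeholders, not the real MFCC magic value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find positions where two consecutive words equal magic[0], magic[1].
 * Stores at most max_idx positions in idx and returns the total count.
 */
static size_t find_markers(const int16_t *data, size_t n,
			   const int16_t magic[2],
			   size_t *idx, size_t max_idx)
{
	size_t count = 0;
	size_t i;

	assert(data && magic && idx);
	for (i = 0; i + 1 < n; i++) {	/* never reads data[n] */
		if (data[i] == magic[0] && data[i + 1] == magic[1]) {
			if (count < max_idx)
				idx[count] = i;
			count++;
		}
	}
	return count;
}

/* Distance between the first two markers, or 0 if fewer than two exist. */
static size_t marker_period(const size_t *idx, size_t count)
{
	return count >= 2 ? idx[1] - idx[0] : 0;
}
```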

    data = int16(zeros(prod(s), 1));
    for i = 1:num_channels
        data(i:num_channels:end) = tmp(:, i);
    end

    cfg.blackman_coef = 0.42;
    cfg.cepstral_lifter = 22.0;
    cfg.channel = -1; % -1 expect mono, 0 left, 1 right ...
    cfg.channel = 0; % -1 expect mono, 0 left, 1 right ...

    struct mat_matrix_16b *mel_spectra; /**< Pointer to scratch */
    struct mat_matrix_16b *cepstral_coef; /**< Pointer to scratch */
    int32_t *power_spectra; /**< Pointer to scratch */
    int16_t mmax; /**< Maximum Mel value in Q9.7 */
This change adds compatibility with the audio features data produced by the function extract_features() in the OpenVINO Whisper library. The work is not yet complete, but this is a suitable step for a PR proposal before further changes (e.g. computing a 32-bit Mel spectrum) to fine-tune the data for a more accurate match.