The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
Updated Sep 18, 2025 (Python)
Sound event localization, detection, and tracking of multiple overlapping and moving sources in 2D spherical space using a convolutional recurrent neural network
A SOTA, industrial-grade voice activity detection (VAD) and audio event detection model supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD
Reading list for research topics in Sound AI
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
This is the public repository for eigenvector-based SALSA features for polyphonic sound event localization and detection.
OpenFLAM: Framewise Language Audio Model
SELD-TCN: Sound Event Detection & Localization via Temporal Convolutional Networks (Python, TensorFlow)
Latest (2024) laughter detection and segmentation model. Paper: "Robust Laughter Segmentation with Automatic Diverse Data Synthesis", Interspeech 2024
Baseline for DCASE 2019 Task 4
Sound event detection with depthwise separable and dilated convolutions.
Training code for the 6th-place solution to the Cornell Birdcall Identification Challenge
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.
Author's repository for reproducing DcaseNet, an integrated pre-trained DNN that performs acoustic scene classification, audio tagging, and sound event detection. Implemented using PyTorch.
Python library for rapid prototyping of environmental sound analysis systems
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)
Tracks the state of the art and recent results (bibliography) on sound tasks.
📊 Easily apply audio-related machine learning models trained on the AudioSet dataset (527+ models/classes).
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
Easy-to-use audio tagging in PyTorch