This repository contains an R&D project focused on the automated classification of tactical radio intercepts. The system utilizes a multi-modal AI pipeline to transcribe noisy, real-world radio communications and classify them into structured data using Large Language Models (LLMs).
Developed as an academic research project, this tool demonstrates the practical application of NLP and audio processing in military/tactical contexts.
The system operates in two main stages:
- Automatic Speech Recognition (ASR): Uses OpenAI's `whisper-large-v3-turbo` (via Hugging Face `transformers`) to transcribe Ukrainian audio, specifically optimized to handle the severe static, background noise, and clipping typical of tactical comms.
- LLM Text Classification: Employs Google's `gemini-2.5-flash` via `LangChain`. A strict system prompt and `Pydantic` output parsers force the model to categorize the messy transcription into predefined tactical clusters (e.g., Reconnaissance, Medevac, Artillery, Logistics) and to extract entities such as coordinates and callsigns into a clean JSON format.
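The schema-enforcement idea can be illustrated with a minimal stdlib sketch. Here a `dataclass` stands in for the project's actual Pydantic models, and the field names (`cluster`, `callsigns`, `coordinates`) are illustrative assumptions, not the repository's real schema:

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Predefined tactical clusters the LLM is constrained to choose from.
CLUSTERS = {"Reconnaissance", "Medevac", "Artillery", "Logistics"}

@dataclass
class InterceptReport:
    cluster: str                       # must be one of CLUSTERS
    callsigns: List[str] = field(default_factory=list)
    coordinates: Optional[str] = None  # raw coordinate string, if present

    def __post_init__(self):
        if self.cluster not in CLUSTERS:
            raise ValueError(f"unknown cluster: {self.cluster}")

def parse_report(raw_json: str) -> InterceptReport:
    """Validate the LLM's raw JSON output against the schema,
    rejecting anything outside the predefined clusters."""
    return InterceptReport(**json.loads(raw_json))
```

For example, `parse_report('{"cluster": "Medevac", "callsigns": ["Berkut"]}')` yields a typed report, while an out-of-vocabulary cluster raises a `ValueError` instead of silently passing through.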
- End-to-End Processing: From raw `.wav`/`.ogg` audio straight to a structured JSON intelligence report.
- Resilient ASR: Capable of understanding heavily distorted speech (simulated radio chatter).
- Hallucination Control: Zero-temperature LLM generation with strict schema enforcement.
- Interactive UI: Built-in Gradio web interface for live audio recording and file uploads.
- DSP Dataset Generator: Includes a custom Digital Signal Processing script (`batch_radio_fx.py`) that applies bandpass filters, overdrive, and white noise to clean audio, simulating gritty tactical environments.
- Python 3.10+
- FFmpeg installed on your system.
- Google Gemini API Key.
- Clone the repository:

  ```bash
  git clone https://github.com/AlexanderSychev2005/radio-analysis.git
  cd radio-analysis
  ```
  Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set your API key:

  ```bash
  export GOOGLE_API_KEY="your_api_key_here"
  ```
Launch the interactive web interface:

```bash
python app.py
```

Open http://127.0.0.1:7860 in your browser.
To test the model, you can generate your own tactical intercepts. Place clean `.wav` or `.ogg` voice recordings in the `test_audios/` folder and run:
```bash
python radio_fx.py
```

This script applies a bandpass filter (cutting frequencies below 300 Hz and above 3000 Hz), hard clipping, and white noise, and writes the results to the `radio_audios/` folder.
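The effect chain can be sketched in pure Python. This is a minimal approximation, not the script's actual implementation: first-order filters stand in for whatever filter design the project uses, and the coefficients and levels are illustrative:

```python
import math
import random

def radio_fx(samples, sr=16000, low_hz=300, high_hz=3000,
             clip_level=0.6, noise_amp=0.02, seed=0):
    """Approximate a tactical-radio channel: band-limit to the voice
    band (300-3000 Hz), hard-clip (overdrive), then add white noise."""
    rng = random.Random(seed)
    dt = 1.0 / sr
    rc_hp = 1.0 / (2 * math.pi * low_hz)
    rc_lp = 1.0 / (2 * math.pi * high_hz)
    a = rc_hp / (rc_hp + dt)   # one-pole high-pass coefficient
    b = dt / (rc_lp + dt)      # one-pole low-pass coefficient
    out = []
    hp_prev_x = hp_prev_y = lp_prev = 0.0
    for x in samples:
        hp = a * (hp_prev_y + x - hp_prev_x)            # cut below ~300 Hz
        hp_prev_x, hp_prev_y = x, hp
        lp_prev = lp_prev + b * (hp - lp_prev)          # cut above ~3000 Hz
        y = max(-clip_level, min(clip_level, lp_prev))  # overdrive/clipping
        y += rng.gauss(0.0, noise_amp)                  # white-noise floor
        out.append(y)
    return out
```

Running a 1 kHz tone (in-band) and a 50 Hz tone (out-of-band) through this chain shows the expected behavior: the low tone comes out strongly attenuated relative to the voice-band tone.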
The ASR component is evaluated using the Word Error Rate (WER). Although the induced radio static causes occasional transcription errors, the subsequent LLM stage contextualizes the noisy transcript and compensates for these discrepancies during classification.
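For reference, WER is the word-level Levenshtein (edit) distance between the reference and the hypothesis, normalized by the reference length. A short self-contained implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, a single substituted word in a four-word reference gives a WER of 0.25.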
Developed for research and educational purposes.