Python-based custom processors for Apache NiFi, specialized in audio/media ingestion and transformation pipelines.
This repository contains reusable Python scripts for the ExecuteDocumentPython processor (or similar scripted NiFi processors). They focus on long-form audio and video files: extracting, chunking, converting, and enriching them with metadata to prepare the content for transcription, search, AI analysis, or archiving.
Current highlight: a robust MPG → 30-second MP3 chunk extractor, with a diagnostic bypass mode for troubleshooting large-file and content-access issues.
Features:
- Efficient FFmpeg-based audio extraction
- Fixed-duration chunking (configurable)
- High-quality MP3 output (VBR)
- Rich FlowFile attributes for downstream routing/metadata
- Diagnostic modes for real-world NiFi deployment issues
- Clean temporary file handling and immediate transfer of results
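The extraction, chunking, VBR encoding and temp-file cleanup listed above can be approximated with a single FFmpeg call that uses FFmpeg's segment muxer. This is a minimal sketch under that assumption; the script itself may cut chunks differently (for example one `-ss`/`-t` call per chunk), and the helper names and the VBR quality default of 2 are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def build_ffmpeg_segment_cmd(src: str, out_dir: str, chunk_seconds: int = 30,
                             vbr_quality: int = 2) -> list[str]:
    """Build an FFmpeg command that writes fixed-length VBR MP3 chunks.

    Hypothetical helper; chunk_seconds and vbr_quality are illustrative defaults.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= vbr_quality <= 9:
        raise ValueError("LAME VBR quality must be between 0 (best) and 9")
    pattern = str(Path(out_dir) / "chunk_%04d.mp3")
    return [
        "ffmpeg", "-hide_banner", "-nostdin", "-y",
        "-i", src,
        "-vn",                        # ignore any video stream
        "-c:a", "libmp3lame",         # MP3 encoder
        "-q:a", str(vbr_quality),     # VBR quality scale (0 = best)
        "-f", "segment",              # split output into fixed-duration files
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",     # each chunk's timestamps start at 0
        pattern,
    ]


def collect_chunks(out_dir: str) -> list[Path]:
    """Return chunk files in playback order (zero-padded names sort correctly)."""
    return sorted(Path(out_dir).glob("chunk_*.mp3"))


def extract_chunks(src: str, chunk_seconds: int = 30):
    """Yield (file name, bytes) for each chunk, one at a time.

    The temporary directory and all chunk files in it are deleted once the
    generator is exhausted or closed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        cmd = build_ffmpeg_segment_cmd(src, tmp, chunk_seconds)
        subprocess.run(cmd, check=True, capture_output=True)
        for path in collect_chunks(tmp):
            yield path.name, path.read_bytes()
```

In a NiFi handler, each yielded pair would become one outgoing FlowFile, so only one chunk is held in memory at a time. The segment muxer cuts on audio frame boundaries, so chunk lengths are approximately, not exactly, 30 seconds.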
File: processors/extract_mp3_chunks_diagnostic.py
Converts a single MPEG file into sequential 30-second MP3 audio segments.
- Bypasses FlowFile content reading (uses the direct file path from the idol.reference attribute)
- Ideal for large files, or when NiFi content claiming is problematic
- Outputs one FlowFile per chunk, with attributes such as the chunk start time and the original source
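One way to derive per-chunk attributes is to compute each chunk's offset from its index, given the source duration. This sketch assumes the duration is already known (for example from ffprobe), and every attribute name in it is a hypothetical placeholder, not necessarily what the script emits.

```python
import math
from pathlib import Path


def plan_chunks(total_seconds: float, source_name: str, source_path: str,
                chunk_seconds: float = 30.0) -> list[dict[str, str]]:
    """Return one attribute map per chunk (NiFi attribute values are strings).

    Attribute names below are hypothetical placeholders.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        return []
    count = math.ceil(total_seconds / chunk_seconds)
    chunks = []
    for i in range(count):
        # Offset from the index, not a running sum, so float error cannot accumulate.
        start = i * chunk_seconds
        # The final chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - start)
        chunks.append({
            "chunk.index": str(i),
            "chunk.count": str(count),
            "chunk.start.seconds": f"{start:.3f}",
            "chunk.duration.seconds": f"{duration:.3f}",
            "source.filename": source_name,
            "source.path": source_path,
            "filename": f"{Path(source_name).stem}_chunk_{i:04d}.mp3",
        })
    return chunks
```

For a 75-second source, plan_chunks returns three maps starting at 0, 30 and 60 seconds, and the last one has a chunk.duration.seconds of "15.000".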
Full documentation is in the script itself, as inline comments and usage notes.
Requirements:
- Apache NiFi
- FFmpeg installed on NiFi host(s)
- ExecuteDocumentPython processor configured
Setup:
- Clone this repo
- Copy the Python script to your NiFi script directory, or load it directly
- Configure an ExecuteDocumentPython processor with this script as the handler
- Ensure input FlowFiles have the required attributes (idol.reference, filename)
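Because bypass mode reads media straight from the path in idol.reference, a missing attribute or a path that is not visible on the current NiFi node is the most likely failure. A minimal pre-flight check might look like this; the function names are hypothetical, and the sketch assumes idol.reference holds a local filesystem path.

```python
import os
from pathlib import Path

REQUIRED_ATTRIBUTES = ("idol.reference", "filename")


def missing_attributes(attrs: dict[str, str]) -> list[str]:
    """List required attributes that are absent or blank."""
    return [name for name in REQUIRED_ATTRIBUTES if not attrs.get(name, "").strip()]


def resolve_source_path(attrs: dict[str, str]) -> Path:
    """Return the on-disk media path named by idol.reference (bypass mode).

    Raises ValueError for missing attributes, FileNotFoundError if the path is
    not a regular file on this node, PermissionError if it cannot be read.
    """
    missing = missing_attributes(attrs)
    if missing:
        raise ValueError(f"missing required attribute(s): {', '.join(missing)}")
    path = Path(attrs["idol.reference"].strip())
    if not path.is_file():
        raise FileNotFoundError(f"idol.reference does not point to a file: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"NiFi user cannot read: {path}")
    return path
```

Raising distinct exception types lets a handler route each case to failure with a specific reason, which is useful in a cluster where the file may exist on only some nodes.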
Roadmap:
- Normal (non-bypass) mode version
- Configurable chunk duration/quality via attributes
- Additional processors: metadata enrichment, format validation, silence trimming
- Example NiFi flow templates
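Configurable chunk duration and quality via attributes could look roughly like the sketch below. Everything here is hypothetical: the attribute names mp3.chunk.seconds and mp3.vbr.quality, the ChunkConfig type, and the choice to reject bad values rather than silently fall back to defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkConfig:
    chunk_seconds: int = 30   # matches the current fixed 30-second chunks
    vbr_quality: int = 2      # LAME -q:a scale, 0 (best) to 9 (smallest); illustrative


def read_chunk_config(attrs: dict[str, str]) -> ChunkConfig:
    """Build a ChunkConfig from FlowFile attributes, using defaults when absent.

    Hypothetical attribute names: mp3.chunk.seconds, mp3.vbr.quality.
    Non-numeric or out-of-range values raise ValueError.
    """
    defaults = ChunkConfig()
    seconds = int(attrs.get("mp3.chunk.seconds", defaults.chunk_seconds))
    quality = int(attrs.get("mp3.vbr.quality", defaults.vbr_quality))
    if seconds < 1:
        raise ValueError(f"mp3.chunk.seconds must be >= 1, got {seconds}")
    if not 0 <= quality <= 9:
        raise ValueError(f"mp3.vbr.quality must be 0-9, got {quality}")
    return ChunkConfig(seconds, quality)
```

Rejecting invalid values lets the handler route the FlowFile to failure instead of producing chunks with settings nobody asked for.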
Contributions welcome! Open issues or PRs for new processors or improvements.
MIT License — see LICENSE file.
Maintained by Vinay (@josepheternity) · Melbourne, Australia