🎙️ Audio Transcriber

License: MIT | Python 3.10+ | PRs Welcome

English | Türkçe

A professional audio recording and transcription application for Windows. Record microphone and system audio simultaneously, automatically split into 10-minute blocks, transcribe with Gladia AI, and generate smart notes with Gemini.

✨ Features

  • 🎤 Multi-Source Recording - Record microphone and system audio simultaneously
  • 📦 Smart Block Management - Automatic 10-minute blocks for cost optimization
  • 🎮 Playback Preview - Listen to each block before transcribing
  • ✅ Flexible Selection - Choose which blocks to transcribe
  • 📝 Gladia Transcription - High-accuracy Turkish transcription
  • 🤖 Gemini AI Notes - Automatic note generation and summarization
  • 💾 Markdown Export - Save and share notes easily
  • 🎨 Modern UI - Professional interface built with CustomTkinter
  • 🗑️ Block Management - Delete unwanted blocks
  • 📊 Batch Operations - Select all/none with one click
  • ⏱️ Progress Tracking - Real-time recording and playback progress

📸 Screenshots

Coming soon - UI screenshots will be added

🚀 Quick Start

Prerequisites

  • Python 3.10 or newer
  • Windows (the application targets Windows audio devices)
  • A Gladia API key
  • A Gemini API key (Google AI Studio)

Installation

  1. Clone the repository

    git clone https://github.com/yunusemrealpak/audio_transcriber.git
    cd audio_transcriber
  2. Create virtual environment

    python -m venv venv
    venv\Scripts\activate  # Windows
    source venv/bin/activate  # Linux/Mac
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure API keys

    Copy .env.example to .env:

    cp .env.example .env      # Linux/Mac, PowerShell
    copy .env.example .env    # Windows (cmd)

    Edit .env and add your API keys:

    GLADIA_API_KEY=your-gladia-key-here
    GEMINI_API_KEY=your-gemini-key-here
  5. Run the application

    python main.py

📖 Usage

Basic Workflow

  1. Select Audio Sources

    • Choose microphone from dropdown
    • Choose system audio (Stereo Mix/MOTIV Mix) if needed
  2. Record Audio

    • Click ⏺️ "Start Recording" button
    • Recording automatically splits into 10-minute blocks
    • Click ⏹️ "Stop" when done
  3. Preview Blocks

    • Click ▶️ on any block card to listen
    • Progress bar shows playback status
    • Click ⏸️ to pause
  4. Select Blocks

    • Use checkboxes to select blocks for transcription
    • "All" button selects all blocks
    • "None" button deselects all
  5. Transcribe

    • Click "Transcribe Selected →"
    • Watch progress for each block
    • View transcript in the right panel
  6. Generate Notes

    • Click 🤖 "Generate Notes with Gemini"
    • AI analyzes transcript and creates structured notes
    • Notes appear in the bottom panel
  7. Export

    • Click 💾 "Save as Markdown"
    • Choose location and filename
    • Share your notes!
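
The select, transcribe, and export steps above can be sketched roughly as follows. This is an illustration, not the application's code: `Block`, `transcribe_selected`, `to_markdown`, and the `fake_transcribe` stand-in are hypothetical names, and the real Gladia request is replaced by a plain callable.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class Block:
    """One recorded block (hypothetical structure, not the app's own class)."""
    index: int
    path: Path
    selected: bool = True


def transcribe_selected(
    blocks: Iterable[Block],
    transcribe: Callable[[Path], str],
) -> str:
    """Transcribe only the checked blocks, in order, and join the results."""
    parts = []
    for block in sorted(blocks, key=lambda b: b.index):
        if block.selected:
            parts.append(transcribe(block.path))
    return "\n\n".join(parts)


def to_markdown(title: str, transcript: str, notes: str) -> str:
    """Assemble a Markdown document from the notes and the transcript."""
    return f"# {title}\n\n## Notes\n\n{notes}\n\n## Transcript\n\n{transcript}\n"


def fake_transcribe(path: Path) -> str:
    # Stand-in for the real transcription request; returns placeholder text.
    return f"[transcript of {path.name}]"


blocks = [
    Block(1, Path("recordings/block_001.wav")),
    Block(2, Path("recordings/block_002.wav"), selected=False),
    Block(3, Path("recordings/block_003.wav")),
]
transcript = transcribe_selected(blocks, fake_transcribe)
document = to_markdown("Meeting", transcript, "- placeholder notes")
```

Because unselected blocks are skipped before any request is made, deselecting a block avoids its transcription cost entirely.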

Enabling System Audio (Windows)

To record system audio, enable "Stereo Mix":

  1. Right-click speaker icon → Sound Settings
  2. Click Sound Control Panel → Recording tab
  3. Right-click empty space → Show Disabled Devices
  4. Right-click Stereo Mix → Enable
  5. Set as default or select in the app
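
To check whether a loopback device such as Stereo Mix is visible after enabling it, the input devices can be filtered by name. A minimal sketch, assuming the `sounddevice` package is installed (this README does not say which audio library the app uses); `find_loopback_devices` and the keyword list are illustrative, not part of the project.

```python
LOOPBACK_KEYWORDS = ("stereo mix", "stereo karışımı", "motiv mix")  # assumed names


def find_loopback_devices(devices):
    """Return names of input-capable devices whose name suggests loopback.

    `devices` is an iterable of dicts with 'name' and 'max_input_channels',
    the shape returned by sounddevice.query_devices().
    """
    matches = []
    for device in devices:
        name = device["name"]
        if device["max_input_channels"] > 0 and any(
            keyword in name.lower() for keyword in LOOPBACK_KEYWORDS
        ):
            matches.append(name)
    return matches


def list_system_loopback_devices():
    """Query the real audio devices (requires the sounddevice package)."""
    import sounddevice as sd  # imported lazily so the helper above works without it

    return find_loopback_devices(sd.query_devices())


# Example with hand-written device entries instead of real hardware:
sample = [
    {"name": "Microphone (USB Audio)", "max_input_channels": 2},
    {"name": "Stereo Mix (Realtek Audio)", "max_input_channels": 2},
    {"name": "Speakers (Realtek Audio)", "max_input_channels": 0},
]
found = find_loopback_devices(sample)
```

An empty result from `list_system_loopback_devices()` would suggest the device is still disabled or named differently on that machine.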

⚙️ Configuration

Environment Variables

| Variable | Description | Required |
|---|---|---|
| GLADIA_API_KEY | API key from Gladia.io | Yes |
| GEMINI_API_KEY | API key from Google AI Studio | Yes |
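
A minimal sketch of reading these two keys at startup and reporting any that are missing. It uses only the standard library; the app itself may load `.env` differently (for example with python-dotenv), and `load_env_file` and `missing_keys` are hypothetical helpers.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("GLADIA_API_KEY", "GEMINI_API_KEY")


def load_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Values from .env are used first; real environment variables
# fill in any keys the file does not define.
env = load_env_file(Path(".env"))
for key in REQUIRED_KEYS:
    env.setdefault(key, os.environ.get(key, ""))

problems = missing_keys(env)
if problems:
    print("Missing API keys:", ", ".join(problems))
```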

Settings (config.py)

| Setting | Default | Description |
|---|---|---|
| SAMPLE_RATE | 44100 | Audio sample rate in Hz |
| BLOCK_DURATION_MINUTES | 10 | Recording block duration |
| RECORDINGS_DIR | "recordings" | Directory for audio files |
| GEMINI_MODEL | "gemini-2.5-flash" | Gemini model version |
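
For illustration, these defaults could be expressed as module constants, with the block length in frames derived from them. This is a sketch rather than the actual contents of `config.py`; `frames_per_block` and `block_ranges` are hypothetical helpers.

```python
SAMPLE_RATE = 44100             # Hz
BLOCK_DURATION_MINUTES = 10
RECORDINGS_DIR = "recordings"
GEMINI_MODEL = "gemini-2.5-flash"


def frames_per_block(sample_rate: int = SAMPLE_RATE,
                     minutes: int = BLOCK_DURATION_MINUTES) -> int:
    """Number of audio frames in one full block."""
    return sample_rate * minutes * 60


def block_ranges(total_frames: int, block_frames: int) -> list:
    """Split a recording into (start, end) frame ranges; the last may be short."""
    return [
        (start, min(start + block_frames, total_frames))
        for start in range(0, total_frames, block_frames)
    ]


# A 25-minute recording yields two full blocks and one 5-minute remainder.
total = SAMPLE_RATE * 25 * 60
ranges = block_ranges(total, frames_per_block())
```

Raising `BLOCK_DURATION_MINUTES` produces fewer, longer blocks, which gives coarser control over what gets transcribed.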

💰 Cost Estimation

| Service | Unit Price | 10 min | 1 hour |
|---|---|---|---|
| Gladia | ~$0.0002/sec | ~$0.12 | ~$0.72 |
| Gemini Flash | Free* | $0 | $0 |

*Gemini 2.5 Flash is free within daily limits.
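
The Gladia figures follow from multiplying the per-second rate by the duration. A small sketch, taking the approximate $0.0002/sec rate as given (actual Gladia pricing may differ); `estimate_gladia_cost` is a hypothetical helper, not part of the project.

```python
GLADIA_RATE_PER_SECOND = 0.0002  # approximate rate quoted in this README


def estimate_gladia_cost(duration_seconds: float,
                         rate: float = GLADIA_RATE_PER_SECOND) -> float:
    """Estimated transcription cost in USD, rounded to cents."""
    return round(duration_seconds * rate, 2)


ten_minutes = estimate_gladia_cost(10 * 60)   # one 10-minute block
one_hour = estimate_gladia_cost(60 * 60)      # six 10-minute blocks
```

At this rate one block comes to about $0.12 and a full hour to about $0.72, so transcribing only selected blocks scales the cost with the number of blocks chosen.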

📁 Project Structure

audio_transcriber/
├── src/                 # Source code
│   ├── __init__.py
│   ├── audio_recorder.py    # Audio recording module
│   ├── gladia_service.py    # Gladia API integration
│   ├── gemini_service.py    # Gemini AI integration
│   └── config.py            # Configuration
├── main.py              # Main application entry point
├── recordings/          # Audio files (auto-created)
├── requirements.txt     # Dependencies
├── .env.example         # Environment variables template
├── .gitignore           # Git ignore rules
├── LICENSE              # MIT License
├── README.md            # English documentation
└── README_TR.md         # Turkish documentation

🔧 Troubleshooting

"Microphone not found" error

  • Check default microphone in Windows Sound Settings
  • Verify microphone permissions for the application

"Loopback not found" error

  • Enable "Stereo Mix" or "Stereo Karışımı" in Windows:
    • Sound Settings → Recording → Right-click → Show Disabled Devices
    • Enable Stereo Mix/Karışımı

Gladia API errors

  • Verify API key is correct
  • Check internet connection
  • Verify credit balance in Gladia dashboard

Gemini API errors

  • Verify API key is correct
  • Check if daily limit exceeded
  • Ensure model name is correct

🎯 Roadmap

  • Real-time transcription
  • Speaker diarization (identify different speakers)
  • Multiple note templates
  • Automatic language detection
  • Audio quality indicator
  • Keyboard shortcuts/hotkeys
  • Multi-language support
  • Export to PDF and DOCX
  • Cloud storage integration
  • Collaboration features

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📧 Contact

Yunus Emre Alpak - @yunusemrealpak

Project Link: https://github.com/yunusemrealpak/audio_transcriber


Made with ❤️ by Yunus Emre Alpak
