Skip to content

davidebriani/scriba

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸŽ™οΈ Scriba

Real-time speech transcription tool

Scriba is a cross-platform speech recognition tool that transcribes audio in real-time and can automatically type the recognized text.

✨ Features

  • Real-time transcription using the Vosk speech recognition engine
  • 25+ language support including English, Chinese, Russian, German, French, Spanish, Japanese, and many more
  • Software engineering optimization - converts spoken numbers to digits and recognizes programming terms
  • Automatic typing - simulates keyboard input to type transcribed text wherever you focus
  • Cross-platform support - Linux, macOS, and Windows
  • Multiple model sizes - from compact 30MB models to high-accuracy 2GB+ models
  • Interactive model selection with automatic downloading
  • Configurable confidence thresholds to filter out uncertain transcriptions
  • XDG-compliant configuration - stores models and settings in ~/.config/scriba/

πŸš€ Installation

Option 1: Download Pre-built Binaries

Download the latest release for your platform from the GitHub releases page:

  • scriba-linux-x86_64.tar.gz - Linux (Intel/AMD 64-bit)
  • scriba-linux-aarch64.tar.gz - Linux (ARM 64-bit, e.g., Raspberry Pi)
  • scriba-macos-x86_64.tar.gz - macOS (Intel)
  • scriba-macos-aarch64.tar.gz - macOS (Apple Silicon)
  • scriba-windows-x86_64.zip - Windows (64-bit)

Extract the archive and run the scriba binary directly.

Option 2: Using Nix (Recommended for Nix users)

# Run directly
nix run github:davidebriani/scriba

# Install to profile
nix profile install github:davidebriani/scriba

# Use in a development shell
nix develop github:davidebriani/scriba

Option 3: Build from Source

Prerequisites

Linux (Ubuntu/Debian):

sudo apt-get install libasound2-dev libpulse-dev libxdo-dev pkg-config build-essential curl unzip

Linux (Fedora/RHEL):

sudo dnf install alsa-lib-devel pulseaudio-libs-devel libxdo-devel pkg-config gcc curl unzip

macOS:

# Install Xcode command line tools
xcode-select --install

Windows:

# Install Visual Studio Build Tools or Visual Studio Community
# Rust will automatically detect and use the MSVC toolchain

Build Instructions

  1. Clone the repository:

    git clone https://github.com/davidebriani/scriba.git
    cd scriba
  2. Install Rust (if not already installed):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
  3. Build and install:

    ./install.sh

    This script will:

    • Build Scriba in release mode
    • Download and install the Vosk native library
    • Install the binary to ~/.local/bin/scriba
    • Set up the environment

🎯 Usage

Basic Usage

# Start with interactive model selection
scriba

# Transcription only (no automatic typing)
scriba --no-typing

# Enable debug output to see partial transcriptions
scriba --debug

# Use a specific confidence threshold (0.0-1.0)
scriba --confidence-threshold 0.8

# Force model selection even if a model exists
scriba --select-model

# Show all options
scriba --help

Configuration

Scriba stores its configuration and downloaded models in:

  • Linux/macOS: ~/.config/scriba/
  • Windows: %APPDATA%\scriba\

The first time you run Scriba, you'll be prompted to select a speech recognition model. Models are automatically downloaded and cached for future use.

Available Models

Scriba supports 25+ languages with different model sizes:

English

  • Small English US (40MB) - Fast, basic vocabulary
  • English US (Recommended) (128MB) - Best balance of speed and accuracy
  • Large English US (1.8GB) - Highest accuracy
  • English US (GigaSpeech) (2.3GB) - Latest model with improved accuracy
  • English India (1GB) - Optimized for Indian accents

Other Languages

  • Chinese (Standard & Small)
  • Russian (Standard & Small)
  • German (Standard & Small)
  • French (Small)
  • Spanish (Standard & Small)
  • Portuguese (Standard & Small)
  • Italian (Standard & Small)
  • Dutch (Standard & Small)
  • Japanese (Standard & Small)
  • Korean (Small)
  • Hindi (Standard & Small)
  • Ukrainian (Standard & Small)
  • Arabic, Persian, Turkish, Vietnamese, Polish, Gujarati

Software Engineering Features

Scriba automatically converts spoken programming terms:

  • Numbers: "one thousand twenty five" β†’ "1025"
  • Digits: "zero through nine" β†’ "0" through "9"
  • Programming terms:
    • "open paren" β†’ "("
    • "close paren" β†’ ")"
    • "open bracket" β†’ "["
    • "close bracket" β†’ "]"
    • "open brace" β†’ "{"
    • "close brace" β†’ "}"
    • "semicolon" β†’ ";"
    • "equals" β†’ "="
    • "null" β†’ "null"
    • "true" β†’ "true"
    • "false" β†’ "false"

πŸ”§ Development

Building with Nix

# Enter development shell
nix develop

# Build the project
cargo build

# Run tests
cargo test

# Build for release
cargo build --release

Manual Development Setup

  1. Install dependencies (see Prerequisites above)

  2. Set up Vosk library:

    # The build script will help you set this up
    ./build.sh
  3. Build and run:

    cargo build
    cargo run -- --help

Cross-compilation

The project supports cross-compilation for multiple architectures. See the GitHub Actions workflow (.github/workflows/release.yml) for examples of building for different targets.

πŸ—οΈ Architecture

  • Audio Capture: Uses cpal for cross-platform audio input
  • Speech Recognition: Vosk engine with downloadable models
  • Text Processing: Enhanced number and programming term conversion
  • Keyboard Simulation: enigo for cross-platform input simulation
  • Async Runtime: Tokio for concurrent audio processing and transcription
  • CLI Interface: clap for argument parsing and dialoguer for interactive prompts

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

  1. Follow the existing code style
  2. Add tests for new functionality
  3. Update documentation as needed
  4. Ensure cross-platform compatibility

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Vosk - Open source speech recognition toolkit
  • cpal - Cross-platform audio I/O library
  • enigo - Cross-platform input simulation
  • Tokio - Asynchronous runtime for Rust

πŸ› Troubleshooting

Linux Issues

"No input device available"

# Check audio devices
arecord -l

# Install ALSA/PulseAudio development packages
sudo apt-get install libasound2-dev libpulse-dev

"libvosk.so not found"

# Run the install script to set up Vosk library
./install.sh

# Or manually set the library path
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

macOS Issues

Permission denied for microphone

  • Go to System Preferences β†’ Security & Privacy β†’ Privacy β†’ Microphone
  • Add Terminal or your terminal application to the allowed list

Windows Issues

"The system cannot find the specified module"

  • Ensure libvosk.dll is in the same directory as scriba.exe or in your system PATH
  • Install Visual C++ Redistributable if needed

For more help, please open an issue on GitHub.

About

Transcribe audio in real time. Dictate instead of typing.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors