2 changes: 1 addition & 1 deletion .github/locales/es/CONTRIBUTING.md
@@ -20,7 +20,7 @@ Para mantener nuestros estándares "State of the Art 2026", seguimos reglas estr
2. **Fork & Branch**:
    - Haz fork del repositorio.
    - Crea una rama descriptiva: `feat/nuevo-soporte-gpu` o `fix/error-transcripcion`.
3. **Desarrollo Local**: Sigue la guía de [Instalación](instalacion.md) para configurar tu entorno de desarrollo.
3. **Desarrollo Local**: Sigue la guía de [Instalación](installation.md) para configurar tu entorno de desarrollo.

---

44 changes: 44 additions & 0 deletions LEEME.md
@@ -0,0 +1,44 @@
# 🗣️ voice2machine

_dictado por voz para cualquier campo de texto en tu SO_

---

[🇺🇸 English](README.md) | [🇪🇸 Español](LEEME.md)

---

## qué es esto

Una herramienta que convierte tu voz a texto utilizando tu GPU local.

La premisa es simple: hablar es más rápido que escribir. Este proyecto te permite dictar en cualquier aplicación sin depender de servicios en la nube.

## filosofía

- **local-first**: tu audio nunca sale de tu máquina
- **modular**: responsabilidades separadas (daemon, api, clientes)
- **impulsado por gpu**: velocidad de transcripción usando WHISPER localmente

## documentación

**[📚 Leer la Documentación Completa](https://zarvent.github.io/voice2machine/)**

Todo lo que necesitas saber está ahí:
* Instalación y Configuración
* Arquitectura y API
* Solución de problemas

## cómo funciona

El sistema se ejecuta como un **Demonio en Segundo Plano** que expone una **API REST FastAPI** en `localhost:8765`.

```mermaid
flowchart LR
A[🎤 Grabar] --> B{Whisper} --> C[📋 Portapapeles]
D[📋 Copiar] --> E{LLM} --> F[📋 Reemplazar]
```

## licencia

Este proyecto está licenciado bajo la **GNU General Public License v3.0** - mira el archivo [LICENSE](LICENSE) para más detalles.
54 changes: 14 additions & 40 deletions README.md
@@ -4,67 +4,41 @@ _voice dictation for any text field in your OS_

---

[🇺🇸 English](README.md) | [🇪🇸 Español](LEEME.md)

---

## what is this

A tool that converts your voice to text using your local GPU.

The premise is simple: speaking is faster than typing. This project allows you to dictate in any application without depending on cloud services.

---

## philosophy

- **local-first**: your audio never leaves your machine
- **modular**: started as a script, now it's an app with separated responsibilities
- **modular**: separated responsibilities (daemon, api, clients)
- **gpu-powered**: transcription speed using WHISPER locally

---

## how it works

The system runs as a **Background Daemon** that exposes a **FastAPI REST API** on `localhost:8765`.

| component | role |
| ----------- | -------------------------------------------------------------------------------------- |
| `daemon` | Handles audio recording, Whisper transcription, and LLM processing via REST endpoints. |
| `shortcuts` | Global keyboard shortcuts that send HTTP requests to the daemon. |

---

## documentation

All technical info is in `/docs` (consolidated in Spanish):

- [installation](docs/es/instalacion.md)
- [architecture](docs/es/arquitectura.md)
- [configuration](docs/es/configuracion.md)
- [keyboard shortcuts](docs/es/atajos_teclado.md) ⌨️
- [troubleshooting](docs/es/troubleshooting.md)

---
**[📚 Read the Full Documentation](https://zarvent.github.io/voice2machine/)**

## visual flows
Everything you need to know is there:
* Installation & Setup
* Configuration
* Architecture & API

### voice → text

```mermaid
flowchart LR
A[🎤 record] --> B{whisper}
B --> C[📋 clipboard]
```
## how it works

### text → improved text
The system runs as a **Background Daemon** that exposes a **FastAPI REST API** on `localhost:8765`.

```mermaid
flowchart LR
A[📋 copy] --> B{LLM}
B --> C[📋 replace]
A[🎤 Record] --> B{Whisper} --> C[📋 Clipboard]
D[📋 Copy] --> E{LLM} --> F[📋 Replace]
```

> if you don't see the diagrams, you need a mermaid extension

---

## license

This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](LICENSE) file for more details.
94 changes: 93 additions & 1 deletion docs/docs/en/changelog.md
@@ -1 +1,93 @@
{% include "../../../CHANGELOG.md" %}
---
title: Changelog
description: Change log for the Voice2Machine project.
ai_context: "Versions, Change History, SemVer"
depends_on: []
status: stable
---

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-01-23

### Added

- **Feature-Based Architecture**: Total restructuring into self-contained modules in `features/` (audio, llm, transcription).
- **Orchestration via Workflows**: Introduction of `RecordingWorkflow` and `LLMWorkflow` to decouple business logic from the monolithic legacy Orchestrator.
- **Strict Protocols**: Implementation of `typing.Protocol` for all internal services, allowing easy swapping of providers.
- **Modular API**: Package structure in `api/` with separate routes and schemas.
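
The `typing.Protocol` entry above can be illustrated with a minimal sketch. The names `Transcriber`, `FakeTranscriber`, and `run_transcription` are hypothetical and chosen only for illustration; they are not taken from the Voice2Machine codebase.

```python
from typing import Protocol


class Transcriber(Protocol):
    """Hypothetical structural interface: any object with this method qualifies."""

    def transcribe(self, audio: bytes) -> str: ...


class FakeTranscriber:
    """Stand-in provider; note that it does not inherit from Transcriber."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


def run_transcription(service: Transcriber, audio: bytes) -> str:
    # The caller depends only on the protocol, so providers can be swapped
    # without changing this function.
    return service.transcribe(audio)
```

Because protocols are checked structurally, a new provider only needs a matching `transcribe` method; no base class or registration step is required in this sketch.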

### Changed

- **Elimination of Orchestrator**: `services/orchestrator.py` has been decomposed and removed.
- **Infrastructure Refactoring**: The `infrastructure/` folder has been integrated into each corresponding `feature`.
- **Core and Domain**: Simplified and moved to `shared/` and local interfaces.

### Removed

- **Legacy Audio Tests**: Removal of obsolete tests for the Rust extension.
- **System Monitor**: System telemetry removed for core simplification.

## [0.2.0] - 2025-01-20

### Added

- **FastAPI REST API**: New HTTP API replacing the Unix Sockets-based IPC system
- **WebSocket streaming**: `/ws/events` endpoint for real-time provisional transcription
- **Swagger documentation**: Interactive UI at `/docs` for testing endpoints
- **Orchestrator pattern**: New coordination pattern that simplifies workflow
- **Rust audio engine**: Native `v2m_engine` extension for low-latency audio capture
- **MkDocs documentation system**: Structured documentation with Material theme

### Changed

- **Simplified architecture**: From CQRS/CommandBus to more direct Orchestrator pattern
- **Communication**: From binary Unix Domain Sockets to standard HTTP REST
- **State model**: Centralized management in `DaemonState` with lazy initialization
- Updated README.md with new architecture
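
One way to picture lazy initialization combined with background loading is the sketch below. It is an assumption-laden illustration, not the project's `DaemonState`: the class name `LazyModel` and its methods are hypothetical, and the loader is any zero-argument callable that returns a non-`None` object.

```python
import threading
from typing import Callable, Optional


class LazyModel:
    """Hypothetical holder that loads a model once, on first use or in background."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._model: Optional[object] = None
        self._lock = threading.Lock()

    def get(self) -> object:
        # Double-checked locking: the loader runs at most once,
        # even if several threads call get() at the same time.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model

    def preload(self) -> threading.Thread:
        # Start loading without blocking the caller (for example, server startup).
        thread = threading.Thread(target=self.get, daemon=True)
        thread.start()
        return thread
```

Calling `preload()` at startup lets the server begin accepting requests immediately, while any request that needs the model simply waits inside `get()` until loading finishes.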

### Removed

- `daemon.py`: Replaced by `api.py` (FastAPI)
- `client.py`: No longer needed, use `curl` or any HTTP client
- Binary IPC protocol: Replaced by standard JSON
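
As a rough illustration of using "any HTTP client" in place of `client.py`, the sketch below builds a request with only the Python standard library. The address `localhost:8765` is the one the README states, and `/docs` is the Swagger UI path listed above; the helper name `build_request` is hypothetical, and no other endpoint paths are assumed.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8765"  # daemon address as stated in the README


def build_request(path: str) -> Request:
    """Build a GET request for a daemon path (hypothetical helper)."""
    # Normalize the path so both "docs" and "/docs" produce the same URL.
    return Request(f"{BASE_URL}/{path.lstrip('/')}", method="GET")
```

Passing the result to `urllib.request.urlopen`, for example `urlopen(build_request("/docs"), timeout=5)`, sends the request; that call only succeeds while the daemon is running locally.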

### Fixed

- Startup latency: Server starts in ~100ms, model loads in background
- Memory leaks in WebSocket connections

## [Unreleased]

### Added

- **Hallucination Detection**: Heuristic filters and quality parameters (`no_speech`, `compression_ratio`) in `StreamingTranscriber` to reduce erroneous Whisper outputs.
- **Performance Metrics**: Inference latency tracking in logs for detailed diagnostics.
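
The hallucination heuristics can be sketched roughly as follows. This is not the project's `StreamingTranscriber`: the `Segment` dataclass is a hypothetical stand-in for a transcription segment, and the thresholds (0.6 and 2.4) mirror commonly used Whisper defaults rather than values confirmed for this project.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a transcription segment."""

    text: str
    no_speech_prob: float     # model's estimate that the audio contains no speech
    compression_ratio: float  # high values suggest repetitive, degenerate output


def is_probable_hallucination(
    seg: Segment,
    no_speech_threshold: float = 0.6,          # assumed default, not project-confirmed
    compression_ratio_threshold: float = 2.4,  # assumed default, not project-confirmed
) -> bool:
    """Flag segments that look like silence or degenerate repetition."""
    return (
        seg.no_speech_prob > no_speech_threshold
        or seg.compression_ratio > compression_ratio_threshold
    )


def filter_segments(segments: list[Segment]) -> list[Segment]:
    # Keep only the segments that pass both heuristics.
    return [s for s in segments if not is_probable_hallucination(s)]
```

Both checks are cheap comparisons on metadata the model already produces, so a filter of this kind adds no extra inference cost.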

### Changed

- **VAD Optimization**: Adjusted default threshold to 0.4 to reduce false positives from ambient noise and breathing.
- **Memory Management**: Forced CUDA cache reset (`torch.cuda.empty_cache()`) when unloading models to effectively free VRAM.
- **Code Hygiene**: Import refactoring and linting error fixes (`ruff`) throughout the backend codebase.

### Planned

- Support for multiple simultaneous transcription languages
- Web dashboard for real-time monitoring
- Integration with more LLM providers

## [0.1.0] - 2024-03-20

### Added

- **Initial Voice2Machine system version**
- Local transcription support with Whisper (faster-whisper)
- Basic LLM integration (Ollama/Gemini)
- Unix Domain Sockets-based IPC system
- Hexagonal architecture with ports and adapters
- TOML-based configuration
85 changes: 84 additions & 1 deletion docs/docs/en/contributing.md
@@ -1 +1,84 @@
{% include "../../../.github/CONTRIBUTING.md" %}
---
title: Contributing Guide
description: Instructions and standards for collaborating on Voice2Machine.
status: stable
last_update: 2026-01-23
language: US English
---

# ❤️ Contributing Guide

Thank you for your interest in contributing to **Voice2Machine**! This project is built on collaboration and quality code.

To maintain our "State of the Art 2026" standards, we follow strict but fair rules. Please read this before submitting your first Pull Request.

---

## 🚀 Workflow

1. **Discussion First**: Before writing code, open an [Issue](https://github.com/v2m-lab/voice2machine/issues) to discuss the change. This avoids duplicate work or rejections due to architectural misalignment.
2. **Fork & Branch**:
    - Fork the repository.
    - Create a descriptive branch: `feat/new-gpu-support` or `fix/transcription-error`.
3. **Local Development**: Follow the [Installation](installation.md) guide to set up your development environment.

---

## 📏 Quality Standards

### Code

- **Backend (Python)**:
    - Strict static typing (100% Type Hints).
    - Linter: `ruff check src/ --fix`.
    - Formatter: `ruff format src/`.
    - Tests: `pytest` must pass at 100%.
- **Frontend (Tauri/React)**:
    - Strict TypeScript (no `any`).
    - Linter: `npm run lint`.
    - Functional components and Hooks.

### Commits

We use **Conventional Commits**. Your commit message must follow this format:

```text
<type>(<scope>): <short description>

[Optional detailed body]
```

**Allowed Types:**

- `feat`: New functionality.
- `fix`: Bug fix.
- `docs`: Documentation only.
- `refactor`: Code change that doesn't fix bugs or add features.
- `test`: Add or fix tests.
- `chore`: Maintenance, dependencies.

**Example:**

> `feat(whisper): upgrade to faster-whisper 1.0.0 for 20% speedup`
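
The commit format can be checked mechanically. The following is a minimal sketch, not a tool shipped with the project: the function name `is_valid_header` is hypothetical, and treating the scope as mandatory is an assumption based on the format shown above.

```python
import re

# Types listed in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# <type>(<scope>): <short description>
# The scope is treated as required here; that strictness is an assumption.
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"\((?P<scope>[^()\s]+)\): (?P<description>\S.*)$"
)


def is_valid_header(message: str) -> bool:
    """Check the first line of a commit message against the format."""
    first_line = message.splitlines()[0] if message else ""
    return HEADER_RE.match(first_line) is not None
```

A check like this could run in a `commit-msg` hook; only the first line is validated, so the optional body remains free-form.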

### Documentation (Docs as Code)

If you change functionality, you **must** update documentation in `docs/docs/`.

- Verify that `mkdocs serve` works locally.
- Follow the [Style Guide](style_guide.md).

---

## ✅ Pull Request Checklist

Before submitting your PR:

- [ ] I have run local tests and they pass.
- [ ] I have linted the code (`ruff`, `eslint`).
- [ ] I have updated relevant documentation.
- [ ] I have added an entry to `CHANGELOG.md` (if applicable).
- [ ] My code follows Hexagonal Architecture (no prohibited cross-imports).

!!! tip "Help"
    If you have questions about architecture or design, check the documents in `docs/docs/adr/` or ask in the corresponding Issue.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
16 changes: 8 additions & 8 deletions docs/docs/es/index.md
@@ -26,24 +26,24 @@ La documentación está organizada para servir a diferentes necesidades:

### 🚀 Exploración

- [**Guía Rápida**](guia_rapida.md): Comienza a dictar en minutos.
- [**Glosario**](glosario.md): Define términos clave como _Daemon_, _Whisper_ y _API REST_.
- [**Guía Rápida**](quick_start.md): Comienza a dictar en minutos.
- [**Glosario**](glossary.md): Define términos clave como _Daemon_, _Whisper_ y _API REST_.

### 🛠️ Procedimientos

- [**Instalación**](instalacion.md): Guía paso a paso para Ubuntu/Debian.
- [**Contribución**](contribucion.md): Cómo colaborar en el proyecto.
- [**Instalación**](installation.md): Guía paso a paso para Ubuntu/Debian.
- [**Contribución**](contributing.md): Cómo colaborar en el proyecto.

### ⚙️ Referencia

- [**Configuración**](configuracion.md): Ajusta modelos, dispositivos y comportamientos.
- [**Atajos de Teclado**](atajos_teclado.md): Referencia de comandos globales.
- [**API REST**](referencia_api.md): Documentación de endpoints HTTP y WebSocket.
- [**Configuración**](configuration.md): Ajusta modelos, dispositivos y comportamientos.
- [**Atajos de Teclado**](keyboard_shortcuts.md): Referencia de comandos globales.
- [**API REST**](api_reference.md): Documentación de endpoints HTTP y WebSocket.
- [**API Python**](api/index.md): Referencia de clases y métodos del backend.

### 🧠 Conceptos

- [**Arquitectura**](arquitectura.md): Diseño Hexagonal y componentes del sistema.
- [**Arquitectura**](architecture.md): Diseño Hexagonal y componentes del sistema.
- [**Decisiones (ADR)**](adr/index.md): Registro de decisiones técnicas importantes.

### 🔧 Mantenimiento
4 changes: 2 additions & 2 deletions docs/docs/es/instalacion.md → docs/docs/es/installation.md
@@ -105,5 +105,5 @@ python apps/daemon/backend/scripts/diagnostics/health_check.py

Una vez instalado, es hora de configurar cómo interactúas con la herramienta.

- [Configuración Detallada](configuracion.md) - Ajusta modelos y sensibilidad.
- [Atajos de Teclado](atajos_teclado.md) - Configura tus teclas mágicas.
- [Configuración Detallada](configuration.md) - Ajusta modelos y sensibilidad.
- [Atajos de Teclado](keyboard_shortcuts.md) - Configura tus teclas mágicas.
File renamed without changes.
File renamed without changes.
16 changes: 8 additions & 8 deletions mkdocs.yml
@@ -186,17 +186,17 @@ markdown_extensions:
nav:
- Inicio:
- index.md
- Guía Rápida: guia_rapida.md
- Instalación: instalacion.md
- Configuración: configuracion.md
- Glosario: glosario.md
- Guía Rápida: quick_start.md
- Instalación: installation.md
- Configuración: configuration.md
- Glosario: glossary.md

- Guías:
- Atajos de Teclado: atajos_teclado.md
- Atajos de Teclado: keyboard_shortcuts.md
- Solución de Problemas: troubleshooting.md

- Referencia:
- API REST: referencia_api.md
- API REST: api_reference.md
- API Python:
- api/index.md
- Dominios: api/domain.md
@@ -216,7 +216,7 @@ nav:
# - api/cli/index.md

- Arquitectura:
- arquitectura.md
- architecture.md
- Decisiones (ADR):
- adr/index.md
- Plantilla: adr/template.md
@@ -228,6 +228,6 @@ nav:
- adr/006-local-first.md

- Comunidad:
- Contribuir: contribucion.md
- Contribuir: contributing.md
- Guía de Estilo: style_guide.md
- Changelog: changelog.md