diff --git a/.github/locales/es/CONTRIBUTING.md b/.github/locales/es/CONTRIBUTING.md index 7b71a0b..8b9e86a 100644 --- a/.github/locales/es/CONTRIBUTING.md +++ b/.github/locales/es/CONTRIBUTING.md @@ -20,7 +20,7 @@ Para mantener nuestros estándares "State of the Art 2026", seguimos reglas estr 2. **Fork & Branch**: - Haz fork del repositorio. - Crea una rama descriptiva: `feat/nuevo-soporte-gpu` o `fix/error-transcripcion`. -3. **Desarrollo Local**: Sigue la guía de [Instalación](instalacion.md) para configurar tu entorno de desarrollo. +3. **Desarrollo Local**: Sigue la guía de [Instalación](installation.md) para configurar tu entorno de desarrollo. --- diff --git a/LEEME.md b/LEEME.md new file mode 100644 index 0000000..16d382f --- /dev/null +++ b/LEEME.md @@ -0,0 +1,44 @@ +# 🗣️ voice2machine + +_dictado por voz para cualquier campo de texto en tu SO_ + +--- + +[🇺🇸 English](README.md) | [🇪🇸 Español](LEEME.md) + +--- + +## qué es esto + +Una herramienta que convierte tu voz a texto utilizando tu GPU local. + +La premisa es simple: hablar es más rápido que escribir. Este proyecto te permite dictar en cualquier aplicación sin depender de servicios en la nube. + +## filosofía + +- **local-first**: tu audio nunca sale de tu máquina +- **modular**: responsabilidades separadas (daemon, api, clientes) +- **impulsado por gpu**: velocidad de transcripción usando WHISPER localmente + +## documentación + +**[📚 Leer la Documentación Completa](https://zarvent.github.io/voice2machine/)** + +Todo lo que necesitas saber está ahí: +* Instalación y Configuración +* Arquitectura y API +* Solución de problemas + +## cómo funciona + +El sistema se ejecuta como un **Demonio en Segundo Plano** que expone una **API REST FastAPI** en `localhost:8765`. 
+ +```mermaid +flowchart LR + A[🎤 Grabar] --> B{Whisper} --> C[📋 Portapapeles] + D[📋 Copiar] --> E{LLM} --> F[📋 Reemplazar] +``` + +## licencia + +Este proyecto está licenciado bajo la **GNU General Public License v3.0** - mira el archivo [LICENSE](LICENSE) para más detalles. diff --git a/README.md b/README.md index f9f5a3b..6d783d1 100644 --- a/README.md +++ b/README.md @@ -4,67 +4,41 @@ _voice dictation for any text field in your OS_ --- +[🇺🇸 English](README.md) | [🇪🇸 Español](LEEME.md) + +--- + ## what is this A tool that converts your voice to text using your local GPU. The premise is simple: speaking is faster than typing. This project allows you to dictate in any application without depending on cloud services. ---- - ## philosophy - **local-first**: your audio never leaves your machine -- **modular**: started as a script, now it's an app with separated responsibilities +- **modular**: separated responsibilities (daemon, api, clients) - **gpu-powered**: transcription speed using WHISPER locally ---- - -## how it works - -The system runs as a **Background Daemon** that exposes a **FastAPI REST API** on `localhost:8765`. - -| component | role | -| ----------- | -------------------------------------------------------------------------------------- | -| `daemon` | Handles audio recording, Whisper transcription, and LLM processing via REST endpoints. | -| `shortcuts` | Global keyboard shortcuts that send HTTP requests to the daemon. 
| - ---- - ## documentation -All technical info is in `/docs` (consolidated in Spanish): - -- [installation](docs/es/instalacion.md) -- [architecture](docs/es/arquitectura.md) -- [configuration](docs/es/configuracion.md) -- [keyboard shortcuts](docs/es/atajos_teclado.md) ⌨️ -- [troubleshooting](docs/es/troubleshooting.md) - ---- +**[📚 Read the Full Documentation](https://zarvent.github.io/voice2machine/)** -## visual flows +Everything you need to know is there: +* Installation & Setup +* Configuration +* Architecture & API -### voice → text - -```mermaid -flowchart LR -A[🎤 record] --> B{whisper} -B --> C[📋 clipboard] -``` +## how it works -### text → improved text +The system runs as a **Background Daemon** that exposes a **FastAPI REST API** on `localhost:8765`. ```mermaid flowchart LR -A[📋 copy] --> B{LLM} -B --> C[📋 replace] + A[🎤 Record] --> B{Whisper} --> C[📋 Clipboard] + D[📋 Copy] --> E{LLM} --> F[📋 Replace] ``` -> if you don't see the diagrams, you need a mermaid extension - ---- - ## license This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](LICENSE) file for more details. diff --git a/docs/docs/en/changelog.md b/docs/docs/en/changelog.md index 2a07440..9210d9e 100644 --- a/docs/docs/en/changelog.md +++ b/docs/docs/en/changelog.md @@ -1 +1,93 @@ -{% include "../../../CHANGELOG.md" %} +--- +title: Changelog +description: Change log for the Voice2Machine project. +ai_context: "Versions, Change History, SemVer" +depends_on: [] +status: stable +--- + +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [0.3.0] - 2026-01-23 + +### Added + +- **Feature-Based Architecture**: Total restructuring into self-contained modules in `features/` (audio, llm, transcription). 
+- **Orchestration via Workflows**: Introduction of `RecordingWorkflow` and `LLMWorkflow` to decouple business logic from the monolithic legacy Orchestrator. +- **Strict Protocols**: Implementation of `typing.Protocol` for all internal services, allowing easy swapping of providers. +- **Modular API**: Package structure in `api/` with separate routes and schemas. + +### Changed + +- **Elimination of Orchestrator**: `services/orchestrator.py` has been decomposed and removed. +- **Infrastructure Refactoring**: The `infrastructure/` folder has been integrated into each corresponding `feature`. +- **Core and Domain**: Simplified and moved to `shared/` and local interfaces. + +### Removed + +- **Legacy Audio Tests**: Removal of obsolete tests for the Rust extension. +- **System Monitor**: System telemetry removed for core simplification. + +## [0.2.0] - 2025-01-20 + +### Added + +- **FastAPI REST API**: New HTTP API replacing the Unix Sockets-based IPC system +- **WebSocket streaming**: `/ws/events` endpoint for real-time provisional transcription +- **Swagger documentation**: Interactive UI at `/docs` for testing endpoints +- **Orchestrator pattern**: New coordination pattern that simplifies workflow +- **Rust audio engine**: Native `v2m_engine` extension for low-latency audio capture +- **MkDocs documentation system**: Structured documentation with Material theme + +### Changed + +- **Simplified architecture**: From CQRS/CommandBus to more direct Orchestrator pattern +- **Communication**: From binary Unix Domain Sockets to standard HTTP REST +- **State model**: Centralized management in `DaemonState` with lazy initialization +- Updated README.md with new architecture + +### Removed + +- `daemon.py`: Replaced by `api.py` (FastAPI) +- `client.py`: No longer needed, use `curl` or any HTTP client +- Binary IPC protocol: Replaced by standard JSON + +### Fixed + +- Startup latency: Server starts in ~100ms, model loads in background +- Memory leaks in WebSocket connections + 
+## [Unreleased] + +### Added + +- **Hallucination Detection**: Heuristic filters and quality parameters (`no_speech`, `compression_ratio`) in `StreamingTranscriber` to reduce erroneous Whisper outputs. +- **Performance Metrics**: Inference latency tracking in logs for detailed diagnostics. + +### Changed + +- **VAD Optimization**: Adjusted default threshold to 0.4 to reduce false positives from ambient noise and breathing. +- **Memory Management**: Forced CUDA cache reset (`torch.cuda.empty_cache()`) when unloading models to effectively free VRAM. +- **Code Hygiene**: Import refactoring and linting error fixes (`ruff`) throughout the backend codebase. + +### Planned + +- Support for multiple simultaneous transcription languages +- Web dashboard for real-time monitoring +- Integration with more LLM providers + +## [0.1.0] - 2024-03-20 + +### Added + +- **Initial Voice2Machine system version** +- Local transcription support with Whisper (faster-whisper) +- Basic LLM integration (Ollama/Gemini) +- Unix Domain Sockets-based IPC system +- Hexagonal architecture with ports and adapters +- TOML-based configuration diff --git a/docs/docs/en/contributing.md b/docs/docs/en/contributing.md index 40d3fbf..232d971 100644 --- a/docs/docs/en/contributing.md +++ b/docs/docs/en/contributing.md @@ -1 +1,84 @@ -{% include "../../../.github/CONTRIBUTING.md" %} +--- +title: Contributing Guide +description: Instructions and standards for collaborating on Voice2Machine. +status: stable +last_update: 2026-01-23 +language: US English +--- + +# ❤️ Contributing Guide + +Thank you for your interest in contributing to **Voice2Machine**! This project is built on collaboration and quality code. + +To maintain our "State of the Art 2026" standards, we follow strict but fair rules. Please read this before submitting your first Pull Request. + +--- + +## 🚀 Workflow + +1. 
**Discussion First**: Before writing code, open an [Issue](https://github.com/v2m-lab/voice2machine/issues) to discuss the change. This avoids duplicate work or rejections due to architectural misalignment. +2. **Fork & Branch**: + - Fork the repository. + - Create a descriptive branch: `feat/new-gpu-support` or `fix/transcription-error`. +3. **Local Development**: Follow the [Installation](installation.md) guide to set up your development environment. + +--- + +## 📏 Quality Standards + +### Code + +- **Backend (Python)**: + - Strict static typing (100% Type Hints). + - Linter: `ruff check src/ --fix`. + - Formatter: `ruff format src/`. + - Tests: `pytest` must pass at 100%. +- **Frontend (Tauri/React)**: + - Strict TypeScript (no `any`). + - Linter: `npm run lint`. + - Functional components and Hooks. + +### Commits + +We use **Conventional Commits**. Your commit message must follow this format: + +```text +<type>(<scope>): <description> + +[Optional detailed body] +``` + +**Allowed Types:** + +- `feat`: New functionality. +- `fix`: Bug fix. +- `docs`: Documentation only. +- `refactor`: Code change that doesn't fix bugs or add features. +- `test`: Add or fix tests. +- `chore`: Maintenance, dependencies. + +**Example:** + +> `feat(whisper): upgrade to faster-whisper 1.0.0 for 20% speedup` + +### Documentation (Docs as Code) + +If you change functionality, you **must** update documentation in `docs/docs/`. + +- Verify that `mkdocs serve` works locally. +- Follow the [Style Guide](style_guide.md). + +--- + +## ✅ Pull Request Checklist + +Before submitting your PR: + +- [ ] I have run local tests and they pass. +- [ ] I have linted the code (`ruff`, `eslint`). +- [ ] I have updated relevant documentation. +- [ ] I have added an entry to `CHANGELOG.md` (if applicable). +- [ ] My code follows Hexagonal Architecture (no prohibited cross-imports). + +!!! tip "Help" +If you have questions about architecture or design, check the documents in `docs/docs/adr/` or ask in the corresponding Issue. 
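Reviewer note on the Commits section of `docs/docs/en/contributing.md` above: the Conventional Commits rule could be checked mechanically before a PR is opened. The following is a minimal sketch, not part of this change set. It assumes the header shape `<type>(<scope>): <description>` with an optional scope (as in the Conventional Commits specification), and it uses only the six types listed under "Allowed Types". The names `ALLOWED_TYPES`, `HEADER_RE`, and `check_commit_header` are hypothetical and do not exist in the repository.

```python
import re

# Types listed under "Allowed Types" in the contributing guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# Assumed header shape: type, optional "(scope)", then ": " and a
# non-empty description. Treating the scope as optional is an assumption.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$"
)


def check_commit_header(message: str) -> list[str]:
    """Return the problems found in the first line of a commit message."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match '<type>(<scope>): <description>'"]
    problems = []
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    return problems
```

Called on the guide's own example, `check_commit_header("feat(whisper): upgrade to faster-whisper 1.0.0 for 20% speedup")` returns an empty list. A function like this could be wired into a `commit-msg` hook, although whether the project wants such a hook is a separate decision.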
diff --git a/docs/docs/es/referencia_api.md b/docs/docs/es/api_reference.md similarity index 100% rename from docs/docs/es/referencia_api.md rename to docs/docs/es/api_reference.md diff --git a/docs/docs/es/arquitectura.md b/docs/docs/es/architecture.md similarity index 100% rename from docs/docs/es/arquitectura.md rename to docs/docs/es/architecture.md diff --git a/docs/docs/es/configuracion.md b/docs/docs/es/configuration.md similarity index 100% rename from docs/docs/es/configuracion.md rename to docs/docs/es/configuration.md diff --git a/docs/docs/es/contribucion.md b/docs/docs/es/contributing.md similarity index 100% rename from docs/docs/es/contribucion.md rename to docs/docs/es/contributing.md diff --git a/docs/docs/es/glosario.md b/docs/docs/es/glossary.md similarity index 100% rename from docs/docs/es/glosario.md rename to docs/docs/es/glossary.md diff --git a/docs/docs/es/index.md b/docs/docs/es/index.md index 3c022b2..f66fcdf 100644 --- a/docs/docs/es/index.md +++ b/docs/docs/es/index.md @@ -26,24 +26,24 @@ La documentación está organizada para servir a diferentes necesidades: ### 🚀 Exploración -- [**Guía Rápida**](guia_rapida.md): Comienza a dictar en minutos. -- [**Glosario**](glosario.md): Define términos clave como _Daemon_, _Whisper_ y _API REST_. +- [**Guía Rápida**](quick_start.md): Comienza a dictar en minutos. +- [**Glosario**](glossary.md): Define términos clave como _Daemon_, _Whisper_ y _API REST_. ### 🛠️ Procedimientos -- [**Instalación**](instalacion.md): Guía paso a paso para Ubuntu/Debian. -- [**Contribución**](contribucion.md): Cómo colaborar en el proyecto. +- [**Instalación**](installation.md): Guía paso a paso para Ubuntu/Debian. +- [**Contribución**](contributing.md): Cómo colaborar en el proyecto. ### ⚙️ Referencia -- [**Configuración**](configuracion.md): Ajusta modelos, dispositivos y comportamientos. -- [**Atajos de Teclado**](atajos_teclado.md): Referencia de comandos globales. 
-- [**API REST**](referencia_api.md): Documentación de endpoints HTTP y WebSocket. +- [**Configuración**](configuration.md): Ajusta modelos, dispositivos y comportamientos. +- [**Atajos de Teclado**](keyboard_shortcuts.md): Referencia de comandos globales. +- [**API REST**](api_reference.md): Documentación de endpoints HTTP y WebSocket. - [**API Python**](api/index.md): Referencia de clases y métodos del backend. ### 🧠 Conceptos -- [**Arquitectura**](arquitectura.md): Diseño Hexagonal y componentes del sistema. +- [**Arquitectura**](architecture.md): Diseño Hexagonal y componentes del sistema. - [**Decisiones (ADR)**](adr/index.md): Registro de decisiones técnicas importantes. ### 🔧 Mantenimiento diff --git a/docs/docs/es/instalacion.md b/docs/docs/es/installation.md similarity index 95% rename from docs/docs/es/instalacion.md rename to docs/docs/es/installation.md index 9cb20f2..307b151 100644 --- a/docs/docs/es/instalacion.md +++ b/docs/docs/es/installation.md @@ -105,5 +105,5 @@ python apps/daemon/backend/scripts/diagnostics/health_check.py Una vez instalado, es hora de configurar cómo interactúas con la herramienta. -- [Configuración Detallada](configuracion.md) - Ajusta modelos y sensibilidad. -- [Atajos de Teclado](atajos_teclado.md) - Configura tus teclas mágicas. +- [Configuración Detallada](configuration.md) - Ajusta modelos y sensibilidad. +- [Atajos de Teclado](keyboard_shortcuts.md) - Configura tus teclas mágicas. 
diff --git a/docs/docs/es/atajos_teclado.md b/docs/docs/es/keyboard_shortcuts.md similarity index 100% rename from docs/docs/es/atajos_teclado.md rename to docs/docs/es/keyboard_shortcuts.md diff --git a/docs/docs/es/guia_rapida.md b/docs/docs/es/quick_start.md similarity index 100% rename from docs/docs/es/guia_rapida.md rename to docs/docs/es/quick_start.md diff --git a/mkdocs.yml b/mkdocs.yml index ac6d8e3..f637eca 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -186,17 +186,17 @@ markdown_extensions: nav: - Inicio: - index.md - - Guía Rápida: guia_rapida.md - - Instalación: instalacion.md - - Configuración: configuracion.md - - Glosario: glosario.md + - Guía Rápida: quick_start.md + - Instalación: installation.md + - Configuración: configuration.md + - Glosario: glossary.md - Guías: - - Atajos de Teclado: atajos_teclado.md + - Atajos de Teclado: keyboard_shortcuts.md - Solución de Problemas: troubleshooting.md - Referencia: - - API REST: referencia_api.md + - API REST: api_reference.md - API Python: - api/index.md - Dominios: api/domain.md @@ -216,7 +216,7 @@ nav: # - api/cli/index.md - Arquitectura: - - arquitectura.md + - architecture.md - Decisiones (ADR): - adr/index.md - Plantilla: adr/template.md @@ -228,6 +228,6 @@ nav: - adr/006-local-first.md - Comunidad: - - Contribuir: contribucion.md + - Contribuir: contributing.md - Guía de Estilo: style_guide.md - Changelog: changelog.md