8 changes: 4 additions & 4 deletions AGENTS.md
@@ -8,8 +8,8 @@
## 📚 Documentation Governance (SOTA 2026)

### Docs as Code
* **Source of Truth**: Technical documentation lives in `docs/docs/es/`. `mkdocs.yml` defines the site structure.
* **Sync**: Any PR that changes functionality (code) **MUST** include the corresponding update in the documentation.
* **Source of Truth**: Technical documentation lives in `docs/docs/es/` (Spanish) and `docs/docs/en/` (English). `mkdocs.yml` defines the site structure.
* **Sync**: Any PR that changes functionality (code) **MUST** include the corresponding update in the documentation for BOTH languages.
* **README**: `README.md` (English) and `LEEME.md` (Spanish) must stay synchronized and point to the detailed documentation.

### Quality Standards
@@ -23,7 +23,7 @@
* **Reference**: APIs, Configuration, Commands.
* **Concepts**: Architecture, design decisions (ADRs).
3. **Language**:
* Detailed documentation (`docs/`): **Native Latin American Spanish**.
* Detailed documentation (`docs/`): **Bilingual (English `docs/docs/en` and Native Latin American Spanish `docs/docs/es`)**.
* Code comments: **Native Latin American Spanish**.
* Commits: English (Conventional Commits).

@@ -99,4 +99,4 @@ When generating code:
- Prefer **Pydantic V2** for data validation.
- Use robust error handling (`ApplicationError` hierarchy).
- Assume a **CUDA 12** context for GPU operations.
- **Language**: All documentation and comments must be in Native Latin American Spanish.
- **Language**: Documentation must be bilingual (English/Spanish). Comments must be in Native Latin American Spanish.
68 changes: 68 additions & 0 deletions LEEME.md
@@ -0,0 +1,68 @@
# 🗣️ Voice2Machine (V2M)

> **Dictado por Voz Local y Refinado de Texto**
> *State of the Art 2026 - Privacidad Primero - Acelerado por GPU*

---

## 🚀 Resumen

**Voice2Machine** te permite dictar texto en **cualquier aplicación** de tu sistema operativo. Utiliza tu GPU local para transcribir audio con la máxima velocidad y precisión, asegurando que tus datos nunca salgan de tu máquina.

* **Dictado**: Voz → Texto (Whisper)
* **Refinado**: Texto → IA → Texto Mejorado (LLM)

---

## 📚 Documentación

Mantenemos documentación completa en Inglés y Español.

* 🇪🇸 **[Documentación en Español](docs/docs/es/index.md)**
* 🇺🇸 **[English Documentation](docs/docs/en/index.md)**

### Enlaces Rápidos

| Tema | Español | Inglés |
| :--- | :--- | :--- |
| **Empieza Aquí** | [Guía Rápida](docs/docs/es/guia_rapida.md) | [Quick Start](docs/docs/en/quick_start.md) |
| **Instalación** | [Instalación](docs/docs/es/instalacion.md) | [Installation](docs/docs/en/installation.md) |
| **Configuración** | [Configuración](docs/docs/es/configuracion.md) | [Configuration](docs/docs/en/configuration.md) |
| **Diseño** | [Arquitectura](docs/docs/es/arquitectura.md) | [Architecture](docs/docs/en/architecture.md) |

---

## ⚡ Inicio Rápido

### Instalación

```bash
# Clonar e instalar (Ubuntu/Debian)
git clone https://github.com/v2m-lab/voice2machine.git
cd voice2machine
./scripts/install.sh
```

### Uso

1. **Iniciar el Demonio**: `python -m v2m.main --daemon`
2. **Alternar Grabación**: Ejecuta `scripts/v2m-toggle.sh` (Vincula esto a una tecla como `Super+V`).

---

## 🧩 Arquitectura

Voice2Machine sigue una **Arquitectura Hexagonal** con una estricta separación entre el Backend en Python (Lógica central) y el Frontend en Tauri (GUI).

```mermaid
graph TD
Frontend[Tauri Frontend] <-->|IPC Unix Socket| Daemon[Python Daemon]
Daemon --> Whisper[Whisper Local]
Daemon --> LLM[LLM Local/Nube]
```

---

## 📄 Licencia

Este proyecto está licenciado bajo la **GNU General Public License v3.0**.
85 changes: 38 additions & 47 deletions README.md
@@ -1,77 +1,68 @@
# 🗣️ voice2machine (v2m-lab)
# 🗣️ Voice2Machine (V2M)

Internal source of truth and upstream core for Voice2Machine.

_voice dictation for any text field in your OS_
> **Local Voice Dictation & Text Refinement**
> *State of the Art 2026 - Privacy First - GPU Accelerated*

---

## 🚀 Exploration

### What is this?
A tool that converts your voice to text using your local GPU. The premise is simple: speaking is faster than typing. This project allows you to dictate in any application without depending on cloud services.
## 🚀 Overview

### Why use it?
- **Privacy**: Local-first philosophy. Your audio never leaves your machine.
- **Speed**: GPU-accelerated transcription (Whisper) for near real-time performance.
- **Flexibility**: Works with any OS text field via clipboard injection.
**Voice2Machine** allows you to dictate text into **any application** on your operating system. It uses your local GPU to transcribe audio with maximum speed and accuracy, ensuring your data never leaves your machine.

### For Whom?
- **Developers**: Automate documentation and coding via voice.
- **Writers**: Draft content at the speed of thought.
- **Privacy Advocates**: Use AI without surveillance capitalism.
* **Dictation**: Voice → Text (Whisper)
* **Refinement**: Text → AI → Better Text (LLM)

---

## ⚡ Quick Start
## 📚 Documentation

### Installation
See the [Installation Guide](docs/docs/es/instalacion.md) for detailed steps on Ubuntu/Debian.
We maintain comprehensive documentation in both English and Spanish.

### Usage
Two global keyboard shortcuts control the flow:
* 🇺🇸 **[English Documentation](docs/docs/en/index.md)**
* 🇪🇸 **[Documentación en Español](docs/docs/es/index.md)**

### Quick Links

| Script | Function |
| :--- | :--- |
| `v2m-toggle.sh` | **Record** → **Transcribe** → **Paste** (via clipboard) |
| `v2m-llm.sh` | **Copy** → **Refine** (LLM) → **Replace** |
| Topic | English | Español |
| :--- | :--- | :--- |
| **Start Here** | [Quick Start](docs/docs/en/quick_start.md) | [Guía Rápida](docs/docs/es/guia_rapida.md) |
| **Setup** | [Installation](docs/docs/en/installation.md) | [Instalación](docs/docs/es/instalacion.md) |
| **Config** | [Configuration](docs/docs/en/configuration.md) | [Configuración](docs/docs/es/configuracion.md) |
| **Design** | [Architecture](docs/docs/en/architecture.md) | [Arquitectura](docs/docs/es/arquitectura.md) |

---

## 📚 Documentation
## ⚡ Quick Start

Detailed technical documentation is consolidated in the `docs/` directory (in Spanish) and can be served locally with `mkdocs serve`.
### Installation

- [**Installation**](docs/docs/es/instalacion.md): Setup guide.
- [**Architecture**](docs/docs/es/arquitectura.md): System design.
- [**Configuration**](docs/docs/es/configuracion.md): Tweak parameters.
- [**Keyboard Shortcuts**](docs/docs/es/atajos_teclado.md): Control reference.
- [**Troubleshooting**](docs/docs/es/troubleshooting.md): Fix common issues.
```bash
# Clone and install (Ubuntu/Debian)
git clone https://github.com/v2m-lab/voice2machine.git
cd voice2machine
./scripts/install.sh
```

---
### Usage

## 🧩 Visual Flows
1. **Start the Daemon**: `python -m v2m.main --daemon`
2. **Toggle Recording**: Run `scripts/v2m-toggle.sh` (Bind this to a key like `Super+V`).

### Voice to Text (Standard)
---

```mermaid
flowchart LR
A[🎤 Record] --> B{Whisper Local}
B --> C[📋 Clipboard]
```
## 🧩 Architecture

### Text to Refined Text (LLM)
Voice2Machine follows a **Hexagonal Architecture** with a strict separation between the Python Backend (Core logic) and the Tauri Frontend (GUI).

```mermaid
flowchart LR
A[📋 Copy Text] --> B{Local LLM}
B --> C[📋 Replace Text]
graph TD
Frontend[Tauri Frontend] <-->|IPC Unix Socket| Daemon[Python Daemon]
Daemon --> Whisper[Local Whisper]
Daemon --> LLM[Local/Cloud LLM]
```

> *Note: Diagrams require a Mermaid-compatible viewer.*

---

## 📄 License

This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](LICENSE) file for more details.
This project is licensed under the **GNU General Public License v3.0**.
13 changes: 13 additions & 0 deletions docs/docs/en/adr/index.md
@@ -0,0 +1,13 @@
# Architecture Decision Records (ADRs)

An Architecture Decision Record (ADR) is a document that captures an important architectural decision, along with its context and consequences.

## Decision Index

*Currently, there are no formal decisions recorded in this format. New decisions will be added here.*

## When to write an ADR?

Write an ADR when you make a significant decision that affects the project's structure, dependencies, interfaces, or technology.

See [ADR Template](template.md) for the format.
26 changes: 26 additions & 0 deletions docs/docs/en/adr/template.md
@@ -0,0 +1,26 @@
# ADR-XXX: Short Title of Decision

## Status
[Proposed | Accepted | Rejected | Deprecated]

## Context
Describe the context and the problem we are solving.
- What is the current limitation?
- What technical or business requirements drive this?

## Decision
Describe the decision made.
- "We will use X technology for Y component..."

## Consequences
What becomes easier or harder due to this change?

### Positive
-

### Negative
-

## Alternatives Considered
- Option A: Why it was rejected.
- Option B: Why it was rejected.
68 changes: 68 additions & 0 deletions docs/docs/en/api_reference.md
@@ -0,0 +1,68 @@
# API Reference (IPC)

This section documents the internal communication protocol between the Frontend (Client) and the Daemon (Server).

!!! info "Architecture Note"
    Voice2Machine uses a Unix socket-based architecture for low-latency local communication. It is not a public REST API.

## Message Protocol

All messages (Requests and Responses) follow this binary format:

1. **Header (4 bytes)**: Big-endian unsigned integer (`>I`) indicating payload size in bytes.
2. **Payload (N bytes)**: JSON object encoded in UTF-8.
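The framing above can be sketched with Python's standard `struct` and `json` modules. This is a minimal illustration, not the daemon's actual code: `encode_message` and `decode_message` are hypothetical helper names, and the sketch handles a single complete, in-memory frame only.

```python
import json
import struct

# 4-byte header: big-endian unsigned 32-bit integer (struct format ">I").
HEADER = struct.Struct(">I")


def encode_message(message: dict) -> bytes:
    """Hypothetical helper: serialize `message` as UTF-8 JSON and prepend its byte length."""
    payload = json.dumps(message, ensure_ascii=False).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decode_message(frame: bytes) -> dict:
    """Hypothetical helper: parse one complete frame (header + payload) back into a dict."""
    (size,) = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError(f"truncated frame: expected {size} bytes, got {len(payload)}")
    return json.loads(payload.decode("utf-8"))
```

The header counts encoded bytes, not characters. With `ensure_ascii=False`, Spanish text such as `"transcripción"` is 13 characters but 14 UTF-8 bytes, so taking `len()` of the `str` instead of the encoded `bytes` would produce a wrong header.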

### Limits

- `MAX_REQUEST_SIZE`: 10 MB
- `MAX_RESPONSE_SIZE`: 10 MB
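Because `recv()` on a stream socket may return fewer bytes than requested, a reader has to loop until the full header and payload have arrived, and it should compare the declared size with the limit before reading the payload. A minimal sketch, assuming "10 MB" means 10 × 1024 × 1024 bytes (the exact byte value the daemon uses is not stated on this page); `recv_exact` and `read_message` are hypothetical names.

```python
import json
import socket
import struct

# Assumption: "10 MB" is taken as 10 * 1024 * 1024 bytes in this sketch.
MAX_RESPONSE_SIZE = 10 * 1024 * 1024


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection mid-message
            raise ConnectionError(f"connection closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def read_message(sock: socket.socket, max_size: int = MAX_RESPONSE_SIZE) -> dict:
    """Read one framed JSON message, rejecting declared sizes above max_size."""
    (size,) = struct.unpack(">I", recv_exact(sock, 4))
    if size > max_size:
        # Checked before reading, so an oversized header cannot force a huge read.
        raise ValueError(f"payload of {size} bytes exceeds limit of {max_size}")
    return json.loads(recv_exact(sock, size).decode("utf-8"))
```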

## Command Structure (Request)

The JSON payload must have the following structure:

```jsonc
{
  "command": "command_name",
  "payload": {
    // command-specific arguments
  }
}
```

### Common Commands

#### `start_recording`
Starts audio recording.
- **Payload**: `{}`

#### `stop_recording`
Stops recording and triggers transcription.
- **Payload**: `{}`

#### `get_config`
Retrieves current configuration.
- **Payload**: `{}`

#### `update_config`
Updates configuration values.
- **Payload**: Partial configuration object (e.g., `{"transcription": {"model": "distil-large-v3"}}`).
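A partial payload like the `update_config` example only works if nested keys are merged instead of replacing whole sections. The sketch below shows one recursive merge with those semantics; whether the daemon merges exactly this way is an assumption, `deep_merge` is a hypothetical name, and the "current" configuration values are invented for illustration.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `base` with `patch` applied recursively.

    Nested dicts are merged key by key; any other value in `patch`
    (str, number, list, None) replaces the value in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical current configuration; only the patch comes from the example above.
current = {"transcription": {"model": "large-v3", "language": "es"}}
updated = deep_merge(current, {"transcription": {"model": "distil-large-v3"}})
```

Here `updated` keeps `"language": "es"` while switching the model, and `current` is left untouched because the merge works on a deep copy.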

## Response Structure (Response)

The response JSON payload always includes a `state` field for synchronization with the Frontend.

```jsonc
{
  "status": "success",          // "success" | "error"
  "data": {
    // requested data, or null
  },
  "error": "optional error message",
  "state": {
    "is_recording": false,      // boolean
    "is_transcribing": false    // boolean
    // ... other system states
  }
}
```
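Since every response carries `status` and `state`, a client can handle errors and decide its next action from a single reply. A minimal sketch, assuming the field names shown above; `DaemonError`, `unwrap`, and `next_toggle_command` are hypothetical, and the toggle rule illustrates how a client could choose between `start_recording` and `stop_recording`, not how `scripts/v2m-toggle.sh` is actually implemented.

```python
from typing import Any, Dict


class DaemonError(RuntimeError):
    """Hypothetical exception for responses with "status": "error"."""


def unwrap(response: Dict[str, Any]) -> Any:
    """Return the `data` field of a successful response, or raise on error."""
    if response.get("status") == "error":
        raise DaemonError(response.get("error") or "unknown daemon error")
    return response.get("data")


def next_toggle_command(response: Dict[str, Any]) -> str:
    """Choose the next command from the `state` snapshot included in any response."""
    state = response.get("state") or {}
    # Hypothetical toggle rule: stop if currently recording, otherwise start.
    return "stop_recording" if state.get("is_recording") else "start_recording"
```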