Refactor: 2026 Rebirth - Rust + Tauri v2 Architecture by cesarszv · Pull Request #105 · zarvent/voice2machine

cesarszv · 2026-01-27T18:20:18Z

Refactoring Report: Voice2Machine 2026 Rebirth

This document describes the major refactor of voice2machine. It reports the reasoning behind the architectural decisions and the current state of the project.

Summary

capture is a speech-to-text utility that runs entirely on the local machine.

Flow: Press the shortcut (Ctrl+Shift+Space) → Speak → The text appears in your clipboard.

Designed for speed, privacy, and simplicity. No servers, no subscriptions, no unnecessary complexity.

Context and Motivation

The goal is simple: remove the friction between thinking something and having it written down.

Design goals

Speed: Response in under 1 second (on suitable hardware).
Independence: Zero dependence on the internet.
Privacy: No tracking or telemetry.
UX: Minimal, almost invisible interface.

Project Evolution

Phase 1: Python Scripts

The first attempt. Scripts calling Whisper from the terminal.

Lesson: Local Whisper is viable.
Problem: Too much friction. Nobody wants to open a terminal to dictate a sentence.

Phase 2: CQRS Architecture

An attempt at "enterprise architecture": separate commands and queries, plus event sourcing.

Problem: Massive over-engineering. The complexity killed development velocity without delivering real value to the end user.

Phase 3: API with FastAPI

Simplification to a REST server.

Improvement: More pragmatic than CQRS.
Problem: Python is not ideal for desktop distribution (GC latency, packaging problems, the GIL).

Phase 4: Refactor (2026-01-26 - Current State)

Full restart ("Rebirth"). A clean slate.

Fundamental change: Solve the specific problem (fast local dictation) instead of building a generic platform.

Architecture Decisions

Guiding Principle: Local-First, not Local-Only

Local-First: Processing happens on your machine by default. Privacy guaranteed.
Extensibility: The architecture allows for future cloud providers (optional), but never as a requirement.

Tech Stack: Rust + Tauri

Alternatives evaluated:

Electron: Too heavy (~200MB+ RAM/disk) for a background utility.
Python: Packaging problems and latency (GIL/GC).
Rust + Tauri: Winner. Lightweight binaries (~10MB), predictable latency, real concurrency, modern UI with React.

AI Engine: Whisper.cpp + VAD

Whisper.cpp (whisper-rs): Optimized local inference. The large-v3-turbo model (~1.6 GB) is downloaded once. No per-minute costs, no network latency, full privacy.
VAD (Voice Activity Detection): Voice detection is separated from transcription.
1. Silero VAD detects speech/silence within milliseconds.
2. Only audio containing speech is sent to Whisper.
- Result: Far less latency and wasted processing.
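The gating idea can be sketched in a few lines. This is only an illustration: it swaps Silero VAD (a neural model) for a plain RMS energy threshold, and the function name `keep_voiced`, the frame length, and the threshold are assumptions, not the project's code.

```rust
/// Stand-in for a real VAD: keeps only the frames whose RMS energy exceeds
/// `threshold`, so that silence never reaches the transcriber.
/// `frame_len` must be greater than zero (`chunks` panics on 0).
fn keep_voiced(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<f32> {
    let mut voiced = Vec::new();
    for frame in samples.chunks(frame_len) {
        // Mean of squared samples, then square root = RMS of the frame.
        let mean_sq = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        if mean_sq.sqrt() > threshold {
            voiced.extend_from_slice(frame);
        }
    }
    voiced
}
```

Only the returned buffer would be handed to the transcription step; frames classified as silence are dropped before any expensive inference runs.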

Hardware Philosophy

Conditional compilation: Native support for accelerators, with graceful fallback to CPU.
Linux: Vulkan takes priority over CUDA for portability (AMD/Intel/Nvidia) without closed proprietary drivers.
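A minimal sketch of what compile-time backend selection with a CPU fallback might look like. The `Backend` enum, the `select_backend` function, and the Cargo feature names `vulkan` and `cuda` are assumptions for illustration; the PR does not show how the crate wires its features.

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    Vulkan,
    Cuda,
    Cpu,
}

/// Resolved at compile time: `cfg!` expands to `true` or `false` depending
/// on which (hypothetical) Cargo features were enabled for the build.
#[allow(unexpected_cfgs)]
fn select_backend() -> Backend {
    if cfg!(feature = "vulkan") {
        // Checked first, mirroring the stated Vulkan-over-CUDA priority.
        Backend::Vulkan
    } else if cfg!(feature = "cuda") {
        Backend::Cuda
    } else {
        // No accelerator feature enabled: fall back to CPU.
        Backend::Cpu
    }
}
```

Built without either feature, `select_backend()` yields `Backend::Cpu`, so a default build still works on machines with no supported GPU.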

Scope and Limitations

What capture IS

A single-purpose utility: Voice → Clipboard.
Global shortcut.
Minimal visual feedback.

What capture IS NOT

Not an assistant (Siri/Alexa).
Not for transcribing long meetings (batch).
Not an audio editor.

Current State (MVP)

✅ Global shortcut (Ctrl+Shift+Space).
✅ Recording with visual feedback.
✅ Integrated VAD.
✅ Local transcription (Whisper large-v3-turbo).
✅ Automatic copy to the clipboard.

Future

Optional support for cloud APIs.
More languages.
Configurable shortcuts.

Lessons Learned

Simplicity: The simplest solution is usually the right one. Premature complexity (CQRS) is expensive.
Tooling: The Rust ecosystem (Tauri, whisper-rs, cpal) is mature enough for high-performance desktop applications.
Iteration: The earlier failures were necessary to understand what not to build.

…e app

Backend (Rust - src-tauri/src/)

| File | Description |
|------|-------------|
| pipeline/mod.rs | Module exports for pipeline |
| pipeline/orchestrator.rs | Main orchestrator connecting audio → VAD → Whisper → clipboard |
| output/mod.rs | Module exports for output |
| output/clipboard.rs | arboard wrapper for clipboard operations |
| tray/mod.rs | Module exports for tray |
| tray/icon.rs | Dynamic tray icon management with state-based updates |
| lib.rs | Main library with Tauri IPC commands and setup |
| main.rs | Tauri application entry point |
| transcription/model.rs | Added ModelDownloader struct |

Frontend (React/TypeScript - src/)

| File | Description |
|------|-------------|
| types/index.ts | TypeScript types for IPC communication |
| hooks/useDownload.ts | Hook for model download with progress |
| hooks/useConfig.ts | Hook for app configuration management |
| hooks/useRecording.ts | Hook for recording state and transcription |
| hooks/index.ts | Hook exports |
| components/SetupWizard.tsx | Model download wizard UI |
| components/SetupWizard.css | Styles for wizard |
| components/SettingsPanel.tsx | Settings/control panel UI |
| components/SettingsPanel.css | Styles for settings |
| App.tsx | Fixed IPC command name |

Icons (src-tauri/icons/)

| File | Description |
|------|-------------|
| icon-idle.png | Gray microphone (idle state) |
| icon-recording.png | Red microphone (recording state) |
| icon-processing.png | Yellow microphone (processing state) |
| icon.png | 512x512 app icon |
| 32x32.png, 128x128.png, 128x128@2x.png | Various size app icons |

Configuration Updates

| File | Change |
|------|--------|
| rust-toolchain.toml | Updated to Rust 1.85 |
| Cargo.toml | Updated rust-version |

IPC Commands Available:

toggle_recording - Start/stop recording
get_state - Get current recording state
list_audio_devices - List available input devices
get_config - Get app configuration
set_config - Update configuration
is_model_downloaded - Check if model exists
get_model_info - Get model information
download_model - Download Whisper model
cancel_download - Cancel download
load_model - Load model into memory
is_model_loaded - Check if model is loaded

…mands

…nings

- Remove empty plugin configs from tauri.conf.json (dialog, shell, etc. require unit type, not empty object)
- Remove unused tray icon ID constants
- Add #[allow(dead_code)] for pre_speech_samples field kept for debugging

…kfile versioning

Summary: Major structural refactor to align with 2026 engineering standards for reproducibility and cleanliness. This commit consolidates assets, enforces lockfile tracking, and centralizes documentation.

Details:

- **Reproducibility**: Un-ignored `Cargo.lock` and `pnpm-lock.yaml` in `.gitignore`. This is critical for deterministic builds across CI/CD and development environments.
- **Asset Consolidation**: Moved all static resources from `src-tauri/{icons,sounds}` to a unified `src-tauri/assets/` directory.
  - Updated `tauri.conf.json` to reference new icon paths.
  - Refactored `icon.rs` and `playback.rs` to load assets from the new location using `include_bytes!` and runtime paths.
- **Documentation**: Established `docs/adr/` for Architecture Decision Records. Moved 'MVP birth plan' to `docs/adr/0001-mvp-birth-plan.md`.
- **Project Hygiene**:
  - Added `clean`, `clean:rust`, and `clean:all` scripts to `package.json`.
  - Updated root `.gitignore` to properly filter build artifacts (`target/`, `dist/`, `node_modules/`) while respecting project-level configs.
  - Enhanced VSCode workspace configuration with advanced file nesting patterns for reduced visual noise.
- **Monorepo Structure**: Migrated root-level configuration files (`Cargo.toml`, `README.md`, etc.) into `apps/capture` scope where appropriate, reflecting the focus on the Capture app.

…mentation

… documentation

- Translated and expanded AGENTS.md to provide a comprehensive overview of the capture app, including architecture, principles, and key modules.
- Updated README.md to clarify the purpose and functionality of the capture app, emphasizing local-first design and user privacy.
- Enhanced docs/README.md with a structured index and detailed explanations of Architecture Decision Records (ADRs).
- Revised src-tauri/README.md to outline the backend architecture and its components, highlighting the use of Rust for performance.
- Improved src/README.md to detail the frontend structure, including components, hooks, and types, while emphasizing the separation of concerns between frontend and backend.
- Added a new top-level README.md for the voice2machine monorepo, summarizing its purpose, philosophy, and technical stack.
- Updated voice2machine.code-workspace to include a new path for v2m-skills.

…remove obsolete web design guidelines.

…ctor button state logic

Copilot

Pull request overview

Major “Rebirth” refactor that retires the legacy Python daemon toolchain and introduces a new Tauri v2 + Rust architecture (with a React/TypeScript UI) for local-first voice dictation.

Changes:

Removed legacy Python backend scripts, configs, prompts, and repo meta docs tied to the old daemon.
Added new apps/capture Tauri v2 application: React UI + Rust backend modules (audio, VAD, transcription, tray, clipboard).
Updated repo-level docs and VS Code tasks to match the new workflow.

Reviewed changes

Copilot reviewed 190 out of 363 changed files in this pull request and generated 14 comments.

File	Description
apps/daemon/backend/scripts/operations/daemon/stop_daemon.sh	Removed legacy daemon stop script
apps/daemon/backend/scripts/operations/daemon/start_daemon.sh	Removed legacy daemon start script
apps/daemon/backend/scripts/operations/daemon/restart_daemon.sh	Removed legacy daemon restart script
apps/daemon/backend/scripts/operations/client/v2m-process.sh	Removed legacy clipboard→LLM script
apps/daemon/backend/scripts/operations/client/v2m-gemini.sh	Removed legacy Gemini script
apps/daemon/backend/scripts/diagnostics/verify_export_backend.py	Removed legacy IPC verification tool
apps/daemon/backend/scripts/diagnostics/check_cuda.py	Removed legacy CUDA diagnostic
apps/daemon/backend/scripts/development/testing/test_whisper_standalone.py	Removed legacy Whisper pipeline test
apps/daemon/backend/scripts/development/testing/test_whisper_gpu.py	Removed legacy GPU load test
apps/daemon/backend/scripts/development/testing/test_clipboard.py	Removed legacy clipboard test
apps/daemon/backend/scripts/development/testing/check_clipboard.py	Removed legacy clipboard diagnostics
apps/daemon/backend/scripts/development/maintenance/repair_libs.sh	Removed legacy NVIDIA libs repair script
apps/daemon/backend/scripts/development/maintenance/cleanup_v2m.sh	Removed legacy cleanup script
apps/daemon/backend/scripts/development/create-pr.sh	Removed legacy PR helper
apps/daemon/backend/scripts/README.md	Removed legacy scripts docs
apps/daemon/backend/resources/prompts/refine_system.txt	Removed legacy LLM prompt
apps/daemon/backend/resources/prompts/README.md	Removed prompts folder docs
apps/daemon/backend/resources/models/.gitkeep	Removed models placeholder
apps/daemon/backend/requirements.txt	Removed Python deps locklist
apps/daemon/backend/requirements-minimal.txt	Removed minimal Python deps
apps/daemon/backend/pyproject.toml	Removed Python project config
apps/daemon/backend/config.toml	Removed legacy backend config
apps/daemon/backend/README.md	Removed legacy backend docs
apps/daemon/backend/.opencode/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.github/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.gemini/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.gemini/settings.json	Removed legacy Gemini settings
apps/daemon/backend/.gemini/.gitignore	Removed legacy Gemini ignore
apps/daemon/backend/.cursor/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.codex/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.agents/skills/backend-commit/SKILL.md	Removed backend commit workflow doc
apps/daemon/backend/.agent/skills/backend-commit	Removed legacy agent skill link
apps/daemon/backend/.agent/automated tasks/update agents and readme in backend/TASK.md	Removed agent automated task
apps/daemon/backend/.agent/automated tasks/README.md	Removed agent automated tasks index
apps/daemon/backend/.agent/automated tasks/04_dependency_management/task.md	Removed dependency maintenance task
apps/daemon/backend/.agent/automated tasks/03_environment_maintenance/task.md	Removed env maintenance task
apps/daemon/backend/.agent/automated tasks/02_test_verification/task.md	Removed test verification task
apps/daemon/backend/.agent/automated tasks/01_code_quality/task.md	Removed code quality task
apps/capture/vite.config.ts	Added Vite config for Tauri dev server
apps/capture/tsconfig.node.json	Added TS config for tooling files
apps/capture/tsconfig.json	Added strict TS config for UI
apps/capture/src/types/index.ts	Added typed IPC/UI contracts
apps/capture/src/main.tsx	Added React entrypoint
apps/capture/src/lib/tauri.ts	Added typed wrappers for Tauri API
apps/capture/src/index.css	Added base UI styles
apps/capture/src/hooks/useRecording.ts	Added recording state/event hook
apps/capture/src/hooks/useLatest.ts	Added “latest ref” hook utility
apps/capture/src/hooks/useDownload.ts	Added model download hook
apps/capture/src/hooks/useConfig.ts	Added config/devices hook
apps/capture/src/hooks/index.ts	Added hooks barrel exports
apps/capture/src/components/SetupWizard.tsx	Added first-run model download wizard
apps/capture/src/components/SetupWizard.css	Added setup wizard styles
apps/capture/src/README.md	Added frontend structure docs
apps/capture/src/App.tsx	Added app boot flow (wizard vs settings)
apps/capture/src/App.css	Added app shared styles
apps/capture/src-tauri/tauri.conf.json	Added Tauri v2 app configuration
apps/capture/src-tauri/src/vad/mod.rs	Added VAD module root
apps/capture/src-tauri/src/tray/mod.rs	Added tray module root
apps/capture/src-tauri/src/transcription/mod.rs	Added transcription module root
apps/capture/src-tauri/src/pipeline/mod.rs	Added pipeline module root
apps/capture/src-tauri/src/output/mod.rs	Added output module root
apps/capture/src-tauri/src/output/clipboard.rs	Added clipboard integration
apps/capture/src-tauri/src/lib.rs	Added Rust app state + setup
apps/capture/src-tauri/src/config/mod.rs	Added config module root
apps/capture/src-tauri/src/audio/playback.rs	Added sound cue playback
apps/capture/src-tauri/src/audio/mod.rs	Added audio module root
apps/capture/src-tauri/src/audio/devices.rs	Added audio device enumeration
apps/capture/src-tauri/capabilities/default.json	Added Tauri capabilities permissions
apps/capture/src-tauri/build.rs	Added Tauri build script
apps/capture/src-tauri/README.md	Added Rust backend docs
apps/capture/src-tauri/Cargo.toml	Added Rust deps/features for Capture
apps/capture/skills/vercel-react-best-practices	Added local skill link
apps/capture/rust-toolchain.toml	Pinned Rust toolchain for Capture
apps/capture/pnpm-workspace.yaml	Added per-app pnpm workspace
apps/capture/package.json	Added Capture JS deps/scripts
apps/capture/index.html	Added Vite HTML entry
apps/capture/docs/README.md	Added Capture docs index
apps/capture/README.md	Added Capture app overview
apps/capture/Cargo.toml	Added Rust workspace for Capture
apps/capture/.opencode/skills/web-design-guidelines	Added local skill link
apps/capture/.opencode/skills/vercel-react-best-practices	Added local skill link
apps/capture/.gitignore	Added app-level ignore rules
apps/capture/.github/skills/web-design-guidelines	Added local skill link
apps/capture/.github/skills/vercel-react-best-practices	Added local skill link
apps/capture/.cursor/skills/web-design-guidelines	Added local skill link
apps/capture/.cursor/skills/vercel-react-best-practices	Added local skill link
apps/capture/.claude/skills/vercel-react-best-practices	Added local skill link
apps/capture/.agents/skills/web-design-guidelines/SKILL.md	Added web UI review skill doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-serialization.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-parallel-fetching.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-dedup-props.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-cache-react.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-cache-lru.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-auth-actions.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-after-nonblocking.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-use-ref-transient-values.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-transitions.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-simple-expression-in-memo.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-move-effect-to-event.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-memo.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-memo-with-default-value.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-lazy-state-init.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-functional-setstate.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state-no-effect.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-dependencies.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-defer-reads.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-usetransition-loading.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-svg-precision.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-suppress-warning.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-no-flicker.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hoist-jsx.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-content-visibility.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-conditional-render.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-animate-svg-wrapper.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-activity.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-tosorted-immutable.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-set-map-lookups.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-min-max-loop.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-length-check-first.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-index-maps.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-hoist-regexp.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-early-exit.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-combine-iterations.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-storage.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-property-access.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-function-results.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-batch-dom-css.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-swr-dedup.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-passive-event-listeners.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-localstorage-schema.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-event-listeners.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-preload.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-dynamic-imports.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-defer-third-party.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-conditional.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-barrel-imports.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-suspense-boundaries.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-parallel.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-dependencies.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-defer-await.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-api-routes.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-use-latest.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-init-once.md	Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-event-handler-refs.md	Added guideline rule doc
apps/capture/.agent/skills/web-design-guidelines	Added local skill link
apps/capture/.agent/skills/vercel-react-best-practices	Added local skill link
README.md	Updated repo overview to new architecture
CHANGELOG.md	Removed legacy changelog
.vscode/tasks.json	Updated tasks for Capture workflow
.github/locales/es/CONTRIBUTING.md	Removed Spanish contributing guide
.github/locales/es/CODE_OF_CONDUCT.md	Removed Spanish CoC
.github/locales/es/CHANGELOG.md	Removed Spanish changelog
.github/instructions/*.instructions.md	Replaced with placeholder
.github/CONTRIBUTING.md	Replaced with placeholder
.github/CODE_OF_CONDUCT.md	Replaced with placeholder
.gitattributes	Removed Git LFS attributes

Files not reviewed (1)

apps/capture/pnpm-lock.yaml: Language not supported

Comments suppressed due to low confidence (1)

.github/CONTRIBUTING.md:1

Replacing CONTRIBUTING content with a placeholder makes the repo’s contribution guidance effectively unusable (similarly for the Code of Conduct placeholders). If this repo is intended to be public/collaborative, consider restoring the actual documents; otherwise consider removing the files entirely to avoid misleading contributors.

Copilot · 2026-01-27T18:25:07Z

apps/capture/src-tauri/tauri.conf.json

+    "frontendDist": "../dist"
+  },
+  "app": {
+    "withGlobalTauri": true,


Disabling CSP ("csp": null) and enabling withGlobalTauri increases the attack surface (especially if any web content ever becomes injectable). Consider setting a restrictive CSP for production (and loosening it only in dev), and turning withGlobalTauri off unless you explicitly need a global __TAURI__ object.

Copilot · 2026-01-27T18:25:07Z

apps/capture/src-tauri/tauri.conf.json

+    "security": {
+      "csp": null
+    }


Disabling CSP ("csp": null) and enabling withGlobalTauri increases the attack surface (especially if any web content ever becomes injectable). Consider setting a restrictive CSP for production (and loosening it only in dev), and turning withGlobalTauri off unless you explicitly need a global __TAURI__ object.

Copilot · 2026-01-27T18:25:07Z

apps/capture/src-tauri/capabilities/default.json

+  "identifier": "default",
+  "description": "Permisos predeterminados para la ventana principal y funcionalidades del sistema",
+  "windows": ["main"],
+  "permissions": [


The default capability grants very broad filesystem + shell permissions. For least-privilege, consider removing unused permissions and constraining fs to an explicit allowlist/scope (e.g., app data + model directory only) and restricting shell:allow-open if you don’t need arbitrary URI launching.

Copilot · 2026-01-27T18:25:08Z

apps/capture/src-tauri/capabilities/default.json

+    "fs:allow-create",
+    "fs:allow-remove",
+    "fs:allow-mkdir"
+  ]


The default capability grants very broad filesystem + shell permissions. For least-privilege, consider removing unused permissions and constraining fs to an explicit allowlist/scope (e.g., app data + model directory only) and restricting shell:allow-open if you don’t need arbitrary URI launching.

Suggested change

-  ]
+  ],
+  "fs": {
+    "scope": [
+      "$APPDATA/**",
+      "$APPLOG/**"
+    ]
+  },
+  "shell": {
+    "scope": [
+      "https://*"
+    ]
+  }

Copilot · 2026-01-27T18:25:08Z

apps/capture/src/components/SetupWizard.tsx

+            <p className="progress-percent">
+              {progress?.percentage.toFixed(1)}%
+            </p>


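progress?.percentage.toFixed(1) does not throw when progress is null/undefined, because optional chaining short-circuits the rest of the chain, but the expression then evaluates to undefined and the UI renders a bare "%". It can still throw if progress is set while percentage is missing. Consider guarding explicitly (e.g., defaulting to 0 or rendering nothing) so this can’t misrender or crash during transient renders.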

Copilot · 2026-01-27T18:25:10Z

apps/capture/src/hooks/useRecording.ts

+import { useState, useEffect, useCallback, useRef } from "react";
+import { invoke, listen } from "../lib/tauri";
+import type { RecordingState } from "../types";
+import { useLatest } from "./useLatest";


useLatest is imported but not used in this file. Removing the unused import (and the related unused-comment block, if no longer needed) will keep the hook focused and avoid lint failures in strict setups.

Copilot · 2026-01-27T18:25:10Z

apps/capture/src-tauri/src/audio/playback.rs

+pub fn play_sound(cue: SoundCue) {
+    let bytes = cue.get_bytes();
+
+    // Reproducción no bloqueante en thread separado
+    thread::spawn(move || {
+        if let Err(e) = play_sound_blocking(bytes) {
+            log::warn!("⚠️ Error reproduciendo sonido {:?}: {}", cue, e);
+        }
+    });
+}


Spawning a brand-new OS thread per sound cue can become expensive over time (and can lead to unbounded thread creation if toggled frequently). Consider using tokio::task::spawn_blocking, or a single long-lived worker thread + channel, and/or reusing a rodio::OutputStream instead of re-opening it every time.
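One of the options mentioned above, a single long-lived worker thread fed by a channel, could look roughly like this. It is a sketch using only the standard library: the `SoundCue` variants, `spawn_sound_worker`, and the `play` callback (standing in for the real playback call) are hypothetical names, not the project's API.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, Copy, PartialEq)]
enum SoundCue {
    Start,
    Stop,
}

/// Spawns exactly one worker thread that plays cues in the order received.
/// `play` is a placeholder for whatever actually produces sound.
fn spawn_sound_worker<F>(mut play: F) -> (Sender<SoundCue>, JoinHandle<()>)
where
    F: FnMut(SoundCue) + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<SoundCue>();
    let handle = thread::spawn(move || {
        // Iteration ends once every Sender has been dropped.
        for cue in rx {
            play(cue);
        }
    });
    (tx, handle)
}
```

Callers keep a clone of the `Sender` and call `send(cue)` instead of spawning a thread per cue, so the thread count stays at one no matter how often recording is toggled.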

Copilot · 2026-01-27T18:25:10Z

apps/capture/src-tauri/src/audio/devices.rs

+    #[test]
+    fn test_list_devices() {
+        // Este test solo verifica que no hay panic
+        // Los dispositivos disponibles dependen del sistema
+        let result = list_input_devices();
+        assert!(result.is_ok());
+    }


This test may fail on CI/headless environments without audio devices or permissions (even though the code is fine). Consider marking it #[ignore] (similar to the clipboard test) or relaxing expectations (e.g., allow a “no host/device” error) so CI reliability doesn’t depend on hardware.
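As a rough sketch of the "relaxed expectations" option: the test could accept either a device list or a well-understood "no audio available" error. The `DeviceError` enum and `acceptable_on_ci` helper below are hypothetical; the real error type returned by `list_input_devices` is not shown in this PR excerpt.

```rust
/// Hypothetical error type; the project's actual error enum may differ.
#[derive(Debug)]
enum DeviceError {
    NoHost,
    NoDevices,
    Other(String),
}

/// Success and the "no audio hardware" errors are both fine on a headless
/// runner; any other error still fails the test.
fn acceptable_on_ci(result: &Result<Vec<String>, DeviceError>) -> bool {
    matches!(
        result,
        Ok(_) | Err(DeviceError::NoHost) | Err(DeviceError::NoDevices)
    )
}
```

The test body would then assert `acceptable_on_ci(&result)` rather than `result.is_ok()`. Marking the test `#[ignore]` remains the simpler alternative when hardware-dependent checks are only run locally.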

Copilot · 2026-01-27T18:25:11Z

apps/capture/Cargo.toml

+version = "0.1.0"
+authors = ["zarvent"]
+edition = "2021"
+rust-version = "1.81"


apps/capture/rust-toolchain.toml pins Rust 1.85, and apps/capture/src-tauri/Cargo.toml also declares rust-version = "1.85", but the workspace sets rust-version = "1.81". Aligning these avoids MSRV confusion and prevents tooling from making incorrect assumptions.

Suggested change

-rust-version = "1.81"
+rust-version = "1.85"

Copilot · 2026-01-27T18:25:11Z

README.md

+### Stack técnico

-```mermaid
-flowchart LR
-A[📋 copy] --> B{LLM}
-B --> C[📋 replace]
 ```
-
-> if you don't see the diagrams, you need a mermaid extension
-
---
-
-## license
-
-This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](LICENSE) file for more details.
+Frontend:  React + TypeScript + Tauri 2.0
+Backend:   Rust + whisper.cpp + Silero VAD
+Audio:     cpal (captura) + rubato (resampling)
+Output:    arboard (clipboard)


The README ends with a closing code fence (```) but the corresponding opening fence is missing, which will break markdown rendering for everything that follows. Add the opening fence before the stack lines (or remove the closing fence) to keep formatting correct.

…auri API for browser development

…found

…tate changes

…tion to direct backend invocation, adding atomic synchronization, and ensuring clipboard persistence on Linux.

cesarszv · 2026-01-27T22:50:44Z

📝 Session Report: Recording Optimization and Stabilization

Project: capture (voice2machine)
Date: 2026-01-27
Status: ✅ Completed and Verified

1. Initial Objective

The recording behavior of the capture application needed to change from "auto-stop on silence" to fully manual control.

Key requirements:

Recording must stop only when the user presses the shortcut (Ctrl+Shift+Space) a second time.
The VAD must be used to segment and optimize the capture, not to end the task.
The whisper-large-v3-turbo model must always be used, running locally.

2. Diagnosis of Detected Problems

A. The Event Race (Race Condition)

We identified a critical race condition in the activation flow:

The backend (Rust) detected the shortcut and emitted a toggle-recording event.
The frontend (React) listened for that event and called back into the backend via IPC (invoke("toggle_recording")).
This produced a double invocation for every physical key press.
During model loading (which takes ~1.8s), the second call arrived while the first had not yet finished, which immediately cancelled the recording that had just started.

B. Premature Termination by VAD

The original code broke out of the capture loop (break) as soon as the VAD reported SpeechEnded. This made it impossible to capture long sentences with natural pauses.
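The effect is easy to reproduce with a toy loop. The sketch below is not the project's `run_audio_capture`; the `VadEvent` enum, the `StopRequested` variant (modelling an explicit user stop), and the `capture` function are assumptions made for illustration. The flag switches between breaking on `SpeechEnded` and treating it as a pause.

```rust
#[derive(Debug, Clone, PartialEq)]
enum VadEvent {
    Speech(Vec<f32>),
    SpeechEnded,
    StopRequested,
}

/// Accumulates speech samples until the events run out or a stop arrives.
/// With `stop_on_speech_end = true`, the first pause ends the capture.
fn capture(events: &[VadEvent], stop_on_speech_end: bool) -> Vec<f32> {
    let mut buffer = Vec::new();
    for event in events {
        match event {
            VadEvent::Speech(chunk) => buffer.extend_from_slice(chunk),
            VadEvent::SpeechEnded if stop_on_speech_end => break,
            VadEvent::SpeechEnded => {} // a pause: keep listening
            VadEvent::StopRequested => break,
        }
    }
    buffer
}
```

Given speech, a pause, and more speech, the first policy returns only the samples before the pause, while the second returns everything up to the explicit stop.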

3. Implementation of the Technical Solution

🛠️ Backend: Simplifying the Activation Flow

The unnecessary round-trip through the frontend was removed. The global shortcut handler now invokes the recording logic directly, inside the backend process.

// src-tauri/src/lib.rs
tauri::async_runtime::spawn(async move {
    let state = app_handle.state::<AppState>();
    let mut pipeline = state.pipeline.lock().await;
    pipeline.toggle_recording().await; // Direct invocation
});

🧠 Orquestador: Sincronización Atómica

To prevent multiple threads from trying to record at the same time while the model is loading, we implemented atomic flags and debouncing:

is_running flag (AtomicBool): Guarantees that only one recording task is active at a time.
300ms debounce: The orchestrator ignores any toggle command that arrives within 300ms of the previous one, filtering out keyboard bounce and duplicate events from the operating system.
Manual stop loop: We modified run_audio_capture to ignore SpeechEnded events. The loop ends only if:
- The cancel_flag is set to true (via a second shortcut press).
- The safety hard_timeout is reached (30 seconds).
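
The three mechanisms above can be sketched together using only the standard library. This is an illustrative sketch under stated assumptions, not the orchestrator's actual code: ToggleGuard, ToggleOutcome, and the method names are hypothetical, and the current time is passed in as a parameter so the 300ms window can be exercised without sleeping.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Debounce window taken from the report (300ms).
pub const DEBOUNCE: Duration = Duration::from_millis(300);

#[derive(Debug, PartialEq, Eq)]
pub enum ToggleOutcome {
    Started,
    StopRequested,
    Ignored,
}

/// Hypothetical guard combining the debounce, `is_running`, and `cancel_flag`.
pub struct ToggleGuard {
    is_running: AtomicBool,
    cancel_flag: AtomicBool,
    last_toggle: Mutex<Option<Instant>>,
}

impl ToggleGuard {
    pub fn new() -> Self {
        Self {
            is_running: AtomicBool::new(false),
            cancel_flag: AtomicBool::new(false),
            last_toggle: Mutex::new(None),
        }
    }

    /// Handles one toggle command observed at time `now`.
    pub fn toggle(&self, now: Instant) -> ToggleOutcome {
        {
            let mut last = self.last_toggle.lock().unwrap();
            if let Some(prev) = *last {
                if now.duration_since(prev) < DEBOUNCE {
                    return ToggleOutcome::Ignored; // bounce or duplicate event
                }
            }
            *last = Some(now);
        }
        // compare_exchange lets exactly one caller flip false -> true.
        if self
            .is_running
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            self.cancel_flag.store(false, Ordering::Release);
            ToggleOutcome::Started
        } else {
            // A recording is already active: ask its loop to stop.
            self.cancel_flag.store(true, Ordering::Release);
            ToggleOutcome::StopRequested
        }
    }

    /// Exit condition for the capture loop: manual cancel or hard timeout.
    pub fn should_stop(&self, started: Instant, now: Instant, hard_timeout: Duration) -> bool {
        self.cancel_flag.load(Ordering::Acquire) || now.duration_since(started) >= hard_timeout
    }

    /// Called by the capture task once its loop has exited.
    pub fn finish(&self) {
        self.is_running.store(false, Ordering::Release);
    }
}
```

In this sketch a capture task would poll should_stop on each iteration with a 30 second hard_timeout and call finish when it exits, so that the next toggle can start a new recording.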

🖥️ Frontend: Passive Listening

The useRecording.ts hook was simplified. It no longer tries to control recording based on keyboard events; it simply subscribes to a state stream (pipeline-event) that the backend emits and uses it to update the interface (icons, logs, model loading).
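
The passive-listener idea can be modeled as a pure function from events to display state. The real hook is TypeScript; the sketch below uses Rust only to stay consistent with the backend snippets. The PipelineEvent variants, Status, and UiState are assumptions made for illustration, since the report does not list the payloads carried by pipeline-event.

```rust
/// Hypothetical events carried by the state stream.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PipelineEvent {
    ModelLoading,
    ModelLoaded,
    RecordingStarted,
    RecordingStopped,
    Transcribed(String),
    Error(String),
}

/// What the interface shows; derived only from events, never from key presses.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Status {
    #[default]
    Idle,
    LoadingModel,
    Recording,
    Transcribing,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct UiState {
    pub status: Status,
    pub log: Vec<String>,
}

/// Applies one backend event to the UI state and returns the updated state.
pub fn apply(mut ui: UiState, event: &PipelineEvent) -> UiState {
    match event {
        PipelineEvent::ModelLoading => ui.status = Status::LoadingModel,
        PipelineEvent::ModelLoaded => ui.log.push("model loaded".to_string()),
        PipelineEvent::RecordingStarted => ui.status = Status::Recording,
        PipelineEvent::RecordingStopped => ui.status = Status::Transcribing,
        PipelineEvent::Transcribed(text) => {
            ui.status = Status::Idle;
            ui.log.push(format!("transcribed: {text}"));
        }
        PipelineEvent::Error(message) => {
            ui.status = Status::Idle;
            ui.log.push(format!("error: {message}"));
        }
    }
    ui
}
```

Because the listener never issues commands, replaying the same event sequence always yields the same interface state, which is what removes the frontend from the race described in the diagnosis.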

4. Performance and Privacy Details

Model: We confirmed the integration of ggml-large-v3-turbo.bin.
Latency: Moving the toggle to the backend removed ~50-100ms of IPC latency at the start of recording.
Privacy: All processing happens in the Tokio runtime in Rust, with no buffers touching the disk or the internet (Local-first).

5. Verification Results

After the final tests in the development environment:

Start: The shortcut starts recording instantly. The log shows "Modelo Whisper cargado" (Whisper model loaded) and "Grabación iniciada" (recording started).
Stability: The model loads only once and stays in memory. Rapid presses of the shortcut no longer break the state, thanks to the debounce.
Continuous recording: Speaking, pausing for 2 seconds, and then continuing to speak works correctly. The VAD detects the segments, but the pipeline stays open.
Finalization: The second shortcut press stops the capture and sends the accumulated buffer to Whisper; the resulting text appears in the system clipboard less than 500ms after stopping.

6. Conclusion

The capture architecture is now robust and predictable. The Pragmatism principle has been upheld by removing the unnecessary complexity of bidirectional event communication and delegating state control to the Rust orchestrator.

cesarszv · 2026-01-27T23:07:30Z

It works!!! The flow works and is stable, but there are a few small things that need polishing.

Frontend:

Make it responsive.
Polish the code.

Backend:

The model is very imprecise; the transcriptions made by the model need to be improved.
Apply a good configuration so that the program takes advantage of my device's GPU.
Ensure that the flow does not break and remains stable.

cesarszv added 17 commits January 27, 2026 03:12

let him cook

fa5ecd8

chore(tasks): remove unused "Restart Daemon" task from tasks.json

348fa5f

feat(docs): add detailed architecture and technology stack for capture app

3715a00

feat: update dependencies and improve audio capture handling

e43ab5c

refactor(capture): improve pipeline reliability and sync frontend commands

55d08d8

feat: refactor Tauri API usage and enhance error handling in components

6d6efb8

feat(docs): add README files for capture app and source structure

b4967ef

feat(docs): update README for capture app and add Tauri back-end documentation

ccd3eec

feat(docs): update README files for capture app and add Tauri backend documentation

45095cf

feat(docs): enhance README for capture app with detailed sections

dc9b19d

feat: add new skill for reviewing UI code against web design guidelines

2b25682

feat: Introduce Vercel React best practices skills across agents and remove obsolete web design guidelines.

fe8739a

feat: Enhance SettingsPanel with state color and text functions; refactor button state logic

57c5e14

Copilot AI review requested due to automatic review settings January 27, 2026 18:20

Copilot AI reviewed Jan 27, 2026

cesarszv added 5 commits January 27, 2026 14:28

feat: Implement mock functionality for invoke and listen methods in Tauri API for browser development

1a641d4

feat: Remove obsolete vercel-react-best-practices skill from capture app

1f6fcc5

feat: Add model existence check and show configuration window if not found

5cf0870

feat: Refactor pipeline event handling and unify event emission for state changes

12d271d

feat: Implement robust manual recording control by refactoring activation to direct backend invocation, adding atomic synchronization, and ensuring clipboard persistence on Linux.

2a0c5ca

-  ]
+  ],
+  "fs": {
+    "scope": [
+      "$APPDATA/**",
+      "$APPLOG/**"
+    ]
+  },
+  "shell": {
+    "scope": [
+      "https://*"
+    ]
+  }
