Refactor: 2026 Rebirth - Rust + Tauri v2 Architecture #105

Open
cesarszv wants to merge 22 commits into main from 2026-rebirth

@cesarszv
Collaborator

Refactoring Report: Voice2Machine 2026 Rebirth

This document describes the major refactor of voice2machine. It explains the reasoning behind the architectural decisions and the current state of the project.


Summary

capture is a speech-to-text utility that runs entirely on the local machine.

Flow: Press the shortcut (Ctrl+Shift+Space) → Speak → The text appears in your clipboard.

Designed for speed, privacy, and simplicity. No servers, no subscriptions, no unnecessary complexity.


Context and Motivation

The goal is simple: remove the friction between thinking something and having it written down.

Design goals

  • Speed: Response in < 1 second (on suitable hardware).
  • Independence: Zero dependence on the internet.
  • Privacy: No tracking or telemetry.
  • UX: Minimal, almost invisible interface.

Project Evolution

Phase 1: Python scripts

The first attempt: scripts calling Whisper from the terminal.

  • Lesson: Local Whisper is viable.
  • Problem: Too much friction. Nobody wants to open a terminal to dictate a sentence.

Phase 2: CQRS architecture

An attempt at "enterprise architecture": separation of commands and queries, event sourcing.

  • Problem: Massive over-engineering. The complexity killed development speed without delivering real value to the end user.

Phase 3: API with FastAPI

Simplification to a REST server.

  • Improvement: More pragmatic than CQRS.
  • Problem: Python is not ideal for desktop distribution (GC latency, packaging problems, GIL).

Phase 4: Refactor (2026-01-26 - current state)

Total restart ("Rebirth"). A clean slate.

  • Fundamental change: Solve the specific problem (fast local dictation) instead of building a generic platform.

Architecture Decisions

Guiding Principle: Local-First, not Local-Only

  • Local-First: Processing happens on your machine by default, so audio stays on the device.
  • Extensibility: The architecture allows for future cloud providers (optional), but never as a requirement.

Tech Stack: Rust + Tauri

Alternatives evaluated:

  • Electron: Too heavy (~200MB+ of RAM/disk) for a background utility.
  • Python: Packaging and latency problems (GIL/GC).
  • Rust + Tauri: The winner. Lightweight binaries (~10MB), predictable latency, real concurrency, modern UI with React.

AI Engine: Whisper.cpp + VAD

  • Whisper.cpp (whisper-rs): Optimized local inference. The large-v3-turbo model (~3GB) is downloaded once. No per-minute costs, no network latency, full privacy.
  • VAD (Voice Activity Detection): Voice detection is decoupled from transcription.
    1. Silero VAD detects speech/silence within milliseconds.
    2. Only audio that contains speech is sent to Whisper.
    • Result: Much lower latency and far less wasted processing, because Whisper never runs on silence.
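
The gating step above can be sketched as follows. This is a minimal illustration, not the project's code: it swaps Silero VAD (a neural model) for a plain RMS energy threshold, and the names `FRAME_LEN`, `is_speech`, and `gate_speech` are assumptions made for the example.

```rust
/// Frame length in samples (30 ms at 16 kHz). Hypothetical value for illustration.
const FRAME_LEN: usize = 480;

/// Root-mean-square energy of one frame of mono PCM samples in [-1.0, 1.0].
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Stand-in for the VAD: a frame counts as speech when its RMS exceeds `threshold`.
/// (Silero VAD is a neural model; this energy check only illustrates the gating.)
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    rms(frame) > threshold
}

/// Keep only the frames flagged as speech, so the transcriber sees less audio.
fn gate_speech(samples: &[f32], threshold: f32) -> Vec<f32> {
    samples
        .chunks(FRAME_LEN)
        .filter(|frame| is_speech(frame, threshold))
        .flatten()
        .copied()
        .collect()
}
```

Feeding one silent frame, one loud frame, and another silent frame through `gate_speech` returns only the loud frame, which is the audio the transcriber would then receive.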

Hardware Philosophy

  • Conditional compilation: Native support for accelerators, with graceful fallback to CPU.
  • Linux: Vulkan takes priority over CUDA for portability (AMD/Intel/Nvidia) without closed proprietary drivers.
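
One way to express "accelerator if compiled in, CPU otherwise" is with Cargo features checked at compile time. The sketch below is only an assumption about how this could look: the feature names `vulkan` and `cuda` and the `Backend` enum are hypothetical and may not match the crate's actual configuration.

```rust
/// Inference backends the build may target. Names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Vulkan,
    Cuda,
    Cpu,
}

/// Pick a backend at compile time. `vulkan` and `cuda` are hypothetical Cargo
/// feature names; with neither enabled, the build falls back to the CPU.
#[allow(unexpected_cfgs)]
fn compiled_backend() -> Backend {
    if cfg!(feature = "vulkan") {
        Backend::Vulkan // checked first, mirroring the Vulkan-over-CUDA priority
    } else if cfg!(feature = "cuda") {
        Backend::Cuda
    } else {
        Backend::Cpu
    }
}
```

Built without either feature, `compiled_backend()` returns `Backend::Cpu`, so a plain `cargo build` still yields a working binary.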

Scope and Limitations

What capture IS

  • A single-purpose utility: Voice → Clipboard.
  • A global shortcut.
  • Minimal visual feedback.

What capture IS NOT

  • Not an assistant (Siri/Alexa).
  • Not for transcribing long meetings (batch).
  • Not an audio editor.

Current State (MVP)

  • ✅ Global shortcut (Ctrl+Shift+Space).
  • ✅ Recording with visual feedback.
  • ✅ Integrated VAD.
  • ✅ Local transcription (Whisper large-v3-turbo).
  • ✅ Automatic copy to the clipboard.

Future

  • Optional support for cloud APIs.
  • More languages.
  • Configurable shortcuts.

Lessons Learned

  1. Simplicity: The simplest solution is usually the right one. Premature complexity (CQRS) is expensive.
  2. Tooling: The Rust ecosystem (Tauri, whisper-rs, cpal) is mature enough for high-performance desktop applications.
  3. Iteration: The earlier failures were necessary to understand what not to build.

Backend (Rust - src-tauri/src/)
| File | Description |
|------|-------------|
| pipeline/mod.rs | Module exports for pipeline |
| pipeline/orchestrator.rs | Main orchestrator connecting audio → VAD → Whisper → clipboard |
| output/mod.rs | Module exports for output |
| output/clipboard.rs | arboard wrapper for clipboard operations |
| tray/mod.rs | Module exports for tray |
| tray/icon.rs | Dynamic tray icon management with state-based updates |
| lib.rs | Main library with Tauri IPC commands and setup |
| main.rs | Tauri application entry point |
| transcription/model.rs | Added ModelDownloader struct |
Frontend (React/TypeScript - src/)
| File | Description |
|------|-------------|
| types/index.ts | TypeScript types for IPC communication |
| hooks/useDownload.ts | Hook for model download with progress |
| hooks/useConfig.ts | Hook for app configuration management |
| hooks/useRecording.ts | Hook for recording state and transcription |
| hooks/index.ts | Hook exports |
| components/SetupWizard.tsx | Model download wizard UI |
| components/SetupWizard.css | Styles for wizard |
| components/SettingsPanel.tsx | Settings/control panel UI |
| components/SettingsPanel.css | Styles for settings |
| App.tsx | Fixed IPC command name |
Icons (src-tauri/icons/)
| File | Description |
|------|-------------|
| icon-idle.png | Gray microphone (idle state) |
| icon-recording.png | Red microphone (recording state) |
| icon-processing.png | Yellow microphone (processing state) |
| icon.png | 512x512 app icon |
| 32x32.png, 128x128.png, 128x128@2x.png | Various size app icons |
Configuration Updates
| File | Change |
|------|--------|
| rust-toolchain.toml | Updated to Rust 1.85 |
| Cargo.toml | Updated rust-version |
IPC Commands Available:
toggle_recording    - Start/stop recording
get_state          - Get current recording state
list_audio_devices - List available input devices
get_config         - Get app configuration
set_config         - Update configuration
is_model_downloaded - Check if model exists
get_model_info     - Get model information
download_model     - Download Whisper model
cancel_download    - Cancel download
load_model         - Load model into memory
is_model_loaded    - Check if model is loaded
…nings

- Remove empty plugin configs from tauri.conf.json (dialog, shell, etc. require unit type, not empty object)
- Remove unused tray icon ID constants
- Add #[allow(dead_code)] for pre_speech_samples field kept for debugging
…kfile versioning

Summary:
Major structural refactor to align with 2026 engineering standards for reproducibility and cleanliness. This commit consolidates assets, enforces lockfile tracking, and centralizes documentation.

Details:
- **Reproducibility**: Un-ignored `Cargo.lock` and `pnpm-lock.yaml` in `.gitignore`. This is critical for deterministic builds across CI/CD and development environments.
- **Asset Consolidation**: Moved all static resources from `src-tauri/{icons,sounds}` to a unified `src-tauri/assets/` directory.
  - Updated `tauri.conf.json` to reference new icon paths.
  - Refactored `icon.rs` and `playback.rs` to load assets from the new location using `include_bytes!` and runtime paths.
- **Documentation**: Established `docs/adr/` for Architecture Decision Records. Moved 'MVP birth plan' to `docs/adr/0001-mvp-birth-plan.md`.
- **Project Hygiene**:
  - Added `clean`, `clean:rust`, and `clean:all` scripts to `package.json`.
  - Updated root `.gitignore` to properly filter build artifacts (`target/`, `dist/`, `node_modules/`) while respecting project-level configs.
  - Enhanced VSCode workspace configuration with advanced file nesting patterns for reduced visual noise.
- **Monorepo Structure**: Migrated root-level configuration files (`Cargo.toml`, `README.md`, etc.) into `apps/capture` scope where appropriate, reflecting the focus on the Capture app.
- Translated and expanded AGENTS.md to provide a comprehensive overview of the capture app, including architecture, principles, and key modules.
- Updated README.md to clarify the purpose and functionality of the capture app, emphasizing local-first design and user privacy.
- Enhanced docs/README.md with a structured index and detailed explanations of Architecture Decision Records (ADRs).
- Revised src-tauri/README.md to outline the backend architecture and its components, highlighting the use of Rust for performance.
- Improved src/README.md to detail the frontend structure, including components, hooks, and types, while emphasizing the separation of concerns between frontend and backend.
- Added a new top-level README.md for the voice2machine monorepo, summarizing its purpose, philosophy, and technical stack.
- Updated voice2machine.code-workspace to include a new path for v2m-skills.
Copilot AI review requested due to automatic review settings January 27, 2026 18:20
Copilot AI left a comment

Pull request overview

Major “Rebirth” refactor that retires the legacy Python daemon toolchain and introduces a new Tauri v2 + Rust architecture (with a React/TypeScript UI) for local-first voice dictation.

Changes:

  • Removed legacy Python backend scripts, configs, prompts, and repo meta docs tied to the old daemon.
  • Added new apps/capture Tauri v2 application: React UI + Rust backend modules (audio, VAD, transcription, tray, clipboard).
  • Updated repo-level docs and VS Code tasks to match the new workflow.

Reviewed changes

Copilot reviewed 190 out of 363 changed files in this pull request and generated 14 comments.

File Description
apps/daemon/backend/scripts/operations/daemon/stop_daemon.sh Removed legacy daemon stop script
apps/daemon/backend/scripts/operations/daemon/start_daemon.sh Removed legacy daemon start script
apps/daemon/backend/scripts/operations/daemon/restart_daemon.sh Removed legacy daemon restart script
apps/daemon/backend/scripts/operations/client/v2m-process.sh Removed legacy clipboard→LLM script
apps/daemon/backend/scripts/operations/client/v2m-gemini.sh Removed legacy Gemini script
apps/daemon/backend/scripts/diagnostics/verify_export_backend.py Removed legacy IPC verification tool
apps/daemon/backend/scripts/diagnostics/check_cuda.py Removed legacy CUDA diagnostic
apps/daemon/backend/scripts/development/testing/test_whisper_standalone.py Removed legacy Whisper pipeline test
apps/daemon/backend/scripts/development/testing/test_whisper_gpu.py Removed legacy GPU load test
apps/daemon/backend/scripts/development/testing/test_clipboard.py Removed legacy clipboard test
apps/daemon/backend/scripts/development/testing/check_clipboard.py Removed legacy clipboard diagnostics
apps/daemon/backend/scripts/development/maintenance/repair_libs.sh Removed legacy NVIDIA libs repair script
apps/daemon/backend/scripts/development/maintenance/cleanup_v2m.sh Removed legacy cleanup script
apps/daemon/backend/scripts/development/create-pr.sh Removed legacy PR helper
apps/daemon/backend/scripts/README.md Removed legacy scripts docs
apps/daemon/backend/resources/prompts/refine_system.txt Removed legacy LLM prompt
apps/daemon/backend/resources/prompts/README.md Removed prompts folder docs
apps/daemon/backend/resources/models/.gitkeep Removed models placeholder
apps/daemon/backend/requirements.txt Removed Python deps locklist
apps/daemon/backend/requirements-minimal.txt Removed minimal Python deps
apps/daemon/backend/pyproject.toml Removed Python project config
apps/daemon/backend/config.toml Removed legacy backend config
apps/daemon/backend/README.md Removed legacy backend docs
apps/daemon/backend/.opencode/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.github/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.gemini/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.gemini/settings.json Removed legacy Gemini settings
apps/daemon/backend/.gemini/.gitignore Removed legacy Gemini ignore
apps/daemon/backend/.cursor/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.codex/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.agents/skills/backend-commit/SKILL.md Removed backend commit workflow doc
apps/daemon/backend/.agent/skills/backend-commit Removed legacy agent skill link
apps/daemon/backend/.agent/automated tasks/update agents and readme in backend/TASK.md Removed agent automated task
apps/daemon/backend/.agent/automated tasks/README.md Removed agent automated tasks index
apps/daemon/backend/.agent/automated tasks/04_dependency_management/task.md Removed dependency maintenance task
apps/daemon/backend/.agent/automated tasks/03_environment_maintenance/task.md Removed env maintenance task
apps/daemon/backend/.agent/automated tasks/02_test_verification/task.md Removed test verification task
apps/daemon/backend/.agent/automated tasks/01_code_quality/task.md Removed code quality task
apps/capture/vite.config.ts Added Vite config for Tauri dev server
apps/capture/tsconfig.node.json Added TS config for tooling files
apps/capture/tsconfig.json Added strict TS config for UI
apps/capture/src/types/index.ts Added typed IPC/UI contracts
apps/capture/src/main.tsx Added React entrypoint
apps/capture/src/lib/tauri.ts Added typed wrappers for Tauri API
apps/capture/src/index.css Added base UI styles
apps/capture/src/hooks/useRecording.ts Added recording state/event hook
apps/capture/src/hooks/useLatest.ts Added “latest ref” hook utility
apps/capture/src/hooks/useDownload.ts Added model download hook
apps/capture/src/hooks/useConfig.ts Added config/devices hook
apps/capture/src/hooks/index.ts Added hooks barrel exports
apps/capture/src/components/SetupWizard.tsx Added first-run model download wizard
apps/capture/src/components/SetupWizard.css Added setup wizard styles
apps/capture/src/README.md Added frontend structure docs
apps/capture/src/App.tsx Added app boot flow (wizard vs settings)
apps/capture/src/App.css Added app shared styles
apps/capture/src-tauri/tauri.conf.json Added Tauri v2 app configuration
apps/capture/src-tauri/src/vad/mod.rs Added VAD module root
apps/capture/src-tauri/src/tray/mod.rs Added tray module root
apps/capture/src-tauri/src/transcription/mod.rs Added transcription module root
apps/capture/src-tauri/src/pipeline/mod.rs Added pipeline module root
apps/capture/src-tauri/src/output/mod.rs Added output module root
apps/capture/src-tauri/src/output/clipboard.rs Added clipboard integration
apps/capture/src-tauri/src/lib.rs Added Rust app state + setup
apps/capture/src-tauri/src/config/mod.rs Added config module root
apps/capture/src-tauri/src/audio/playback.rs Added sound cue playback
apps/capture/src-tauri/src/audio/mod.rs Added audio module root
apps/capture/src-tauri/src/audio/devices.rs Added audio device enumeration
apps/capture/src-tauri/capabilities/default.json Added Tauri capabilities permissions
apps/capture/src-tauri/build.rs Added Tauri build script
apps/capture/src-tauri/README.md Added Rust backend docs
apps/capture/src-tauri/Cargo.toml Added Rust deps/features for Capture
apps/capture/skills/vercel-react-best-practices Added local skill link
apps/capture/rust-toolchain.toml Pinned Rust toolchain for Capture
apps/capture/pnpm-workspace.yaml Added per-app pnpm workspace
apps/capture/package.json Added Capture JS deps/scripts
apps/capture/index.html Added Vite HTML entry
apps/capture/docs/README.md Added Capture docs index
apps/capture/README.md Added Capture app overview
apps/capture/Cargo.toml Added Rust workspace for Capture
apps/capture/.opencode/skills/web-design-guidelines Added local skill link
apps/capture/.opencode/skills/vercel-react-best-practices Added local skill link
apps/capture/.gitignore Added app-level ignore rules
apps/capture/.github/skills/web-design-guidelines Added local skill link
apps/capture/.github/skills/vercel-react-best-practices Added local skill link
apps/capture/.cursor/skills/web-design-guidelines Added local skill link
apps/capture/.cursor/skills/vercel-react-best-practices Added local skill link
apps/capture/.claude/skills/vercel-react-best-practices Added local skill link
apps/capture/.agents/skills/web-design-guidelines/SKILL.md Added web UI review skill doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-serialization.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-parallel-fetching.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-dedup-props.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-cache-react.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-cache-lru.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-auth-actions.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/server-after-nonblocking.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-use-ref-transient-values.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-transitions.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-simple-expression-in-memo.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-move-effect-to-event.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-memo.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-memo-with-default-value.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-lazy-state-init.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-functional-setstate.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state-no-effect.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-dependencies.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rerender-defer-reads.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-usetransition-loading.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-svg-precision.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-suppress-warning.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-no-flicker.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-hoist-jsx.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-content-visibility.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-conditional-render.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-animate-svg-wrapper.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/rendering-activity.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-tosorted-immutable.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-set-map-lookups.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-min-max-loop.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-length-check-first.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-index-maps.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-hoist-regexp.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-early-exit.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-combine-iterations.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-storage.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-property-access.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-cache-function-results.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/js-batch-dom-css.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-swr-dedup.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-passive-event-listeners.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-localstorage-schema.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/client-event-listeners.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-preload.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-dynamic-imports.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-defer-third-party.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-conditional.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/bundle-barrel-imports.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-suspense-boundaries.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-parallel.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-dependencies.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-defer-await.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/async-api-routes.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-use-latest.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-init-once.md Added guideline rule doc
apps/capture/.agents/skills/vercel-react-best-practices/rules/advanced-event-handler-refs.md Added guideline rule doc
apps/capture/.agent/skills/web-design-guidelines Added local skill link
apps/capture/.agent/skills/vercel-react-best-practices Added local skill link
README.md Updated repo overview to new architecture
CHANGELOG.md Removed legacy changelog
.vscode/tasks.json Updated tasks for Capture workflow
.github/locales/es/CONTRIBUTING.md Removed Spanish contributing guide
.github/locales/es/CODE_OF_CONDUCT.md Removed Spanish CoC
.github/locales/es/CHANGELOG.md Removed Spanish changelog
.github/instructions/*.instructions.md Replaced with placeholder
.github/CONTRIBUTING.md Replaced with placeholder
.github/CODE_OF_CONDUCT.md Replaced with placeholder
.gitattributes Removed Git LFS attributes
Files not reviewed (1)
  • apps/capture/pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)

.github/CONTRIBUTING.md:1

  • Replacing CONTRIBUTING content with a placeholder makes the repo’s contribution guidance effectively unusable (similarly for the Code of Conduct placeholders). If this repo is intended to be public/collaborative, consider restoring the actual documents; otherwise consider removing the files entirely to avoid misleading contributors.

"frontendDist": "../dist"
},
"app": {
"withGlobalTauri": true,
Copilot AI Jan 27, 2026

Disabling CSP ("csp": null) and enabling withGlobalTauri increases the attack surface (especially if any web content ever becomes injectable). Consider setting a restrictive CSP for production (and loosening it only in dev), and turning withGlobalTauri off unless you explicitly need a global __TAURI__ object.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +32
"security": {
"csp": null
}
Copilot AI Jan 27, 2026

Disabling CSP ("csp": null) and enabling withGlobalTauri increases the attack surface (especially if any web content ever becomes injectable). Consider setting a restrictive CSP for production (and loosening it only in dev), and turning withGlobalTauri off unless you explicitly need a global __TAURI__ object.

"identifier": "default",
"description": "Permisos predeterminados para la ventana principal y funcionalidades del sistema",
"windows": ["main"],
"permissions": [
Copilot AI Jan 27, 2026

The default capability grants very broad filesystem + shell permissions. For least-privilege, consider removing unused permissions and constraining fs to an explicit allowlist/scope (e.g., app data + model directory only) and restricting shell:allow-open if you don’t need arbitrary URI launching.

"fs:allow-create",
"fs:allow-remove",
"fs:allow-mkdir"
]
Copilot AI Jan 27, 2026

The default capability grants very broad filesystem + shell permissions. For least-privilege, consider removing unused permissions and constraining fs to an explicit allowlist/scope (e.g., app data + model directory only) and restricting shell:allow-open if you don’t need arbitrary URI launching.

Suggested change
]
],
"fs": {
"scope": [
"$APPDATA/**",
"$APPLOG/**"
]
},
"shell": {
"scope": [
"https://*"
]
}

Comment on lines +82 to +84
<p className="progress-percent">
{progress?.percentage.toFixed(1)}%
</p>
Copilot AI Jan 27, 2026

progress?.percentage.toFixed(1) short-circuits to undefined when progress is null/undefined, so it does not throw in that case, but the element then renders a bare "%". It can still throw if progress is set while percentage is missing. Consider guarding explicitly (e.g., defaulting to 0 or rendering nothing) so transient renders show a sensible value.

Comment on lines +1 to +4
import { useState, useEffect, useCallback, useRef } from "react";
import { invoke, listen } from "../lib/tauri";
import type { RecordingState } from "../types";
import { useLatest } from "./useLatest";
Copilot AI Jan 27, 2026

useLatest is imported but not used in this file. Removing the unused import (and the related unused-comment block, if no longer needed) will keep the hook focused and avoid lint failures in strict setups.

Comment on lines +47 to +56
pub fn play_sound(cue: SoundCue) {
let bytes = cue.get_bytes();

// Non-blocking playback on a separate thread
thread::spawn(move || {
if let Err(e) = play_sound_blocking(bytes) {
log::warn!("⚠️ Error reproduciendo sonido {:?}: {}", cue, e);
}
});
}
Copilot AI Jan 27, 2026

Spawning a brand-new OS thread per sound cue can become expensive over time (and can lead to unbounded thread creation if toggled frequently). Consider using tokio::task::spawn_blocking, or a single long-lived worker thread + channel, and/or reusing a rodio::OutputStream instead of re-opening it every time.
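
A minimal sketch of the "single long-lived worker thread + channel" option, using only the standard library. `SoundWorker` and the `SoundCue` variants are hypothetical names for illustration; the closure passed to `spawn` stands in for the app's blocking playback routine, and reusing an audio output stream inside it is left out.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Sound cues the app can play. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SoundCue {
    Start,
    Stop,
}

/// Handle to one long-lived playback thread fed through a channel.
struct SoundWorker {
    tx: Sender<SoundCue>,
    handle: JoinHandle<usize>,
}

impl SoundWorker {
    /// Spawn the worker once; `play` is whatever blocking playback routine
    /// the app uses (a stand-in for `play_sound_blocking`).
    fn spawn<F>(mut play: F) -> Self
    where
        F: FnMut(SoundCue) + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<SoundCue>();
        let handle = thread::spawn(move || {
            let mut played = 0;
            // The loop ends when every Sender has been dropped.
            for cue in rx {
                play(cue);
                played += 1;
            }
            played
        });
        Self { tx, handle }
    }

    /// Queue a cue without blocking the caller.
    fn play(&self, cue: SoundCue) {
        // A send error only means the worker has exited; ignore it here.
        let _ = self.tx.send(cue);
    }

    /// Close the channel and wait for queued cues to finish.
    /// Returns how many cues were played.
    fn shutdown(self) -> usize {
        drop(self.tx);
        self.handle.join().unwrap_or(0)
    }
}
```

With this shape, toggling the shortcut rapidly only enqueues messages; the number of OS threads stays at one regardless of how many cues are requested.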

Comment on lines +81 to +87
#[test]
fn test_list_devices() {
// This test only verifies that nothing panics
// The available devices depend on the system
let result = list_input_devices();
assert!(result.is_ok());
}
Copilot AI Jan 27, 2026

This test may fail on CI/headless environments without audio devices or permissions (even though the code is fine). Consider marking it #[ignore] (similar to the clipboard test) or relaxing expectations (e.g., allow a “no host/device” error) so CI reliability doesn’t depend on hardware.
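
The "relaxed expectation" variant could look roughly like this. The `DeviceError` type is hypothetical, since the real error type returned by `list_input_devices` is not shown in this thread; the point is only that the assertion accepts a "no device" outcome while still rejecting other failures.

```rust
/// Errors a device listing may report. Hypothetical type for illustration;
/// the real `list_input_devices` may use a different error type.
#[derive(Debug, PartialEq, Eq)]
enum DeviceError {
    /// No audio host or input device is present (typical on headless CI).
    NoDevice,
    /// Any other failure, which should still fail the test.
    Other(String),
}

/// Accept success or a "no device" error; reject everything else.
fn is_acceptable_on_ci(result: &Result<Vec<String>, DeviceError>) -> bool {
    matches!(result, Ok(_) | Err(DeviceError::NoDevice))
}
```

The test body would then assert `is_acceptable_on_ci(&list_input_devices())` instead of `result.is_ok()`. Marking the test `#[ignore]` is the simpler alternative when hardware coverage on CI is not needed.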

version = "0.1.0"
authors = ["zarvent"]
edition = "2021"
rust-version = "1.81"
Copilot AI Jan 27, 2026

apps/capture/rust-toolchain.toml pins Rust 1.85, and apps/capture/src-tauri/Cargo.toml also declares rust-version = "1.85", but the workspace sets rust-version = "1.81". Aligning these avoids MSRV confusion and prevents tooling from making incorrect assumptions.

Suggested change
rust-version = "1.81"
rust-version = "1.85"

Comment on lines +51 to +57
### Stack técnico

```mermaid
flowchart LR
A[📋 copy] --> B{LLM}
B --> C[📋 replace]
```

> if you don't see the diagrams, you need a mermaid extension

---

## license

This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](LICENSE) file for more details.
Frontend: React + TypeScript + Tauri 2.0
Backend: Rust + whisper.cpp + Silero VAD
Audio: cpal (captura) + rubato (resampling)
Output: arboard (clipboard)
Copilot AI Jan 27, 2026

The README ends with a closing code fence (```) but the corresponding opening fence is missing, which will break markdown rendering for everything that follows. Add the opening fence before the stack lines (or remove the closing fence) to keep formatting correct.

@cesarszv
Collaborator Author

📝 Session Report: Recording Optimization and Stabilization

Project: capture (voice2machine)
Date: 2026-01-27
Status: ✅ Completed and verified


1. Initial Objective

The recording behavior of the capture application needed to change from an "auto-stop on silence" system to one under full manual control.

Key requirements:

  • Recording must stop only when the user presses the shortcut (Ctrl+Shift+Space) a second time.
  • The VAD must be used to segment and optimize capture, not to end the task.
  • Forced use of the whisper-large-v3-turbo model, running locally.

2. Diagnosis of Detected Problems

A. The Event Race (Race Condition)

We identified a critical race condition in the activation flow:

  1. The backend (Rust) detected the shortcut and emitted a toggle-recording event.
  2. The frontend (React) listened for that event and called back into the backend over IPC (invoke("toggle_recording")).
  3. This produced a double invocation for every physical key press.
  4. During model loading (which takes ~1.8s), the second call arrived before the first had finished, which immediately cancelled the recording that had just started.

B. Premature Termination by VAD

The original code broke out of the capture loop (break) as soon as the VAD emitted SpeechEnded. This made it impossible to capture long sentences with natural pauses.
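
To make the difference concrete, here is a small sketch of a capture loop that treats SpeechEnded as a segment boundary rather than a stop signal. Only `SpeechEnded` is taken from the report; the other `VadEvent` variants, the function name, and the manual stop flag (called `cancel_flag` in this sketch) are assumptions for illustration, and the real loop reads from an audio stream rather than an iterator.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Events a VAD might emit. `SpeechEnded` comes from the report; the other
/// variants are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum VadEvent {
    SpeechStarted,
    Speech(Vec<f32>),
    SpeechEnded,
}

/// Accumulate speech until the stream ends or `cancel_flag` is set.
/// `SpeechEnded` marks a segment boundary but never stops the loop.
fn capture_until_cancelled<I>(events: I, cancel_flag: &AtomicBool) -> (Vec<f32>, usize)
where
    I: IntoIterator<Item = VadEvent>,
{
    let mut buffer = Vec::new();
    let mut segments = 0;
    for event in events {
        if cancel_flag.load(Ordering::SeqCst) {
            break; // the only early exit: a manual stop was requested
        }
        match event {
            VadEvent::SpeechStarted => {}
            VadEvent::Speech(samples) => buffer.extend(samples),
            // The original behaviour was `break` here, which cut off pauses.
            VadEvent::SpeechEnded => segments += 1,
        }
    }
    (buffer, segments)
}
```

Two speech segments separated by a pause end up in the same buffer, and the segment count records how many pauses the VAD saw along the way.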


3. Implementation of the Technical Solution

🛠️ Backend: Simplified Activation Flow

The unnecessary round-trip through the frontend was removed. The global shortcut handler now invokes the recording logic directly, inside the backend process.

// src-tauri/src/lib.rs
tauri::async_runtime::spawn(async move {
    let state = app_handle.state::<AppState>();
    let mut pipeline = state.pipeline.lock().await;
    pipeline.toggle_recording().await; // Direct invocation, no IPC round-trip
});

🧠 Orchestrator: Atomic Synchronization

To prevent multiple threads from trying to record simultaneously while the model loads, we implemented atomic flags and debouncing:

  • is_running flag (AtomicBool): Guarantees that only one recording task is active at a time.
  • 300ms debounce: The orchestrator ignores any toggle command that arrives less than 300ms after the previous one, filtering out keyboard "bounces" and duplicate events from the operating system.
  • Manual stop loop: We modified run_audio_capture to ignore SpeechEnded events. The loop ends only if:
    • cancel_flag is set to true (via the second shortcut press).
    • The safety hard_timeout is reached (30 seconds).
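
The flag-plus-debounce logic described above can be sketched with the standard library alone. This is an illustration under assumptions, not the orchestrator's code: the field names follow the report, but the `ToggleGate` struct, its methods, and passing `now` explicitly (to keep the example deterministic) are choices made for this sketch. The hard timeout is omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// What a toggle request resulted in.
#[derive(Debug, PartialEq, Eq)]
enum ToggleOutcome {
    Started,
    StopRequested,
    Debounced,
}

/// Guards the recording task. Field names follow the report; the struct
/// itself and its methods are assumptions made for this sketch.
struct ToggleGate {
    is_running: AtomicBool,
    cancel_flag: AtomicBool,
    last_toggle: Mutex<Option<Instant>>,
    debounce: Duration,
}

impl ToggleGate {
    fn new(debounce: Duration) -> Self {
        Self {
            is_running: AtomicBool::new(false),
            cancel_flag: AtomicBool::new(false),
            last_toggle: Mutex::new(None),
            debounce,
        }
    }

    /// Handle one shortcut press observed at time `now`.
    fn toggle(&self, now: Instant) -> ToggleOutcome {
        let mut last = self.last_toggle.lock().unwrap();
        if let Some(prev) = *last {
            if now.saturating_duration_since(prev) < self.debounce {
                return ToggleOutcome::Debounced; // bounce or duplicate event
            }
        }
        *last = Some(now);

        // Only one caller can flip false -> true, so only one task starts.
        if self
            .is_running
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            self.cancel_flag.store(false, Ordering::SeqCst);
            ToggleOutcome::Started
        } else {
            // Already recording: ask the capture loop to stop.
            self.cancel_flag.store(true, Ordering::SeqCst);
            ToggleOutcome::StopRequested
        }
    }

    /// Called by the recording task once it has fully finished.
    fn finish(&self) {
        self.is_running.store(false, Ordering::SeqCst);
    }
}
```

With a 300ms window, a press at t0 starts recording, a duplicate at t0 + 50ms is dropped, and a press at t0 + 400ms requests a stop. The `compare_exchange` is what prevents two tasks from starting even if two toggles slip past the debounce.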

🖥️ Frontend: Passive Listening

The useRecording.ts hook was simplified. It no longer tries to control recording based on keyboard events; it now simply subscribes to a state stream (pipeline-event) emitted by the backend to update the interface (icons, logs, model loading).


4. Performance and Privacy Details

  • Model: We confirmed the integration of ggml-large-v3-turbo.bin.
  • Latency: Moving the toggle to the backend removed ~50-100ms of IPC latency at the start of recording.
  • Privacy: All processing happens in Rust's Tokio runtime, with no buffers touching disk or the internet (local-first).

5. Verification Results

After the final tests in the development environment:

  1. Start: The shortcut starts recording instantly. The log shows "Modelo Whisper cargado" ("Whisper model loaded") and "Grabación iniciada" ("Recording started").
  2. Stability: The model loads only once and stays in memory. Rapid presses of the shortcut no longer "break" the state, thanks to the debounce.
  3. Continuous recording: Speaking, pausing for 2 seconds, and resuming works correctly. The VAD detects the segments but the pipeline stays open.
  4. Finish: The second shortcut press stops capture, sends the accumulated buffer to Whisper, and the resulting text appears in the system clipboard less than 500ms after stopping.

6. Conclusion

The capture architecture is now robust and predictable. The Pragmatism principle was honored by removing the unnecessary complexity of bidirectional event communication and delegating state control to the Rust orchestrator.

@cesarszv
Collaborator Author

It works!!! The flow works and is stable, but there are a few small things that need polishing.

Frontend:

  • Make it responsive.
  • Polish the code.

Backend:

  • Transcription accuracy is poor and needs to be improved.
  • Apply a good configuration so that the program takes advantage of my device's GPU.
  • Ensure that the flow does not break and stays stable.
