VOXTERM

Local real-time voice transcription TUI with speaker diarization and P2P collaborative transcription. Runs entirely offline — no cloud APIs, no audio stored.

Privacy & Storage Policy

VoxTerm is local first and private by default. Everything runs on your machine. Nothing is ever sent to a server.

No audio is stored. Microphone input is processed in real-time and discarded. Only text transcripts are saved.
Voice profiles are encrypted at rest. Speaker embeddings (biometric data used to recognize voices across sessions) are encrypted with AES-256-CBC. The key lives in your macOS Keychain — zero config.
Transcripts are yours. Auto-saved as markdown to ~/Documents/voxterm-transcripts/. Never uploaded anywhere.
P2P stays on your LAN. Party mode shares transcripts over your local network only. No relay servers.
Delete everything anytime. Press P → delete to permanently wipe all voice data from disk.

Install

One command:

curl -fsSL https://github.com/dmarzzz/VoxTerm/releases/latest/download/install.sh | bash

Then run:

voxterm

Requires macOS and Python 3.12+. Apple Silicon Macs use MLX models; Intel Macs use the CPU-compatible faster-whisper backend. Models download automatically on first use.
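
As a rough illustration of the backend split described above, the choice could be keyed off the CPU architecture that Python reports. This is a sketch under stated assumptions, not VoxTerm's actual detection code: `pick_backend` and the labels it returns are hypothetical.

```python
import platform


def pick_backend(system: str, machine: str) -> str:
    """Return a backend label for an OS name and CPU architecture.

    Hypothetical helper: the labels "mlx" and "faster-whisper" are
    illustrative and may not match VoxTerm's internal names.
    """
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    # Everything else falls back to the CPU-compatible backend.
    return "faster-whisper"


if __name__ == "__main__":
    print(pick_backend(platform.system(), platform.machine()))
```

A Python interpreter running under Rosetta 2 reports `x86_64` even on Apple Silicon, so a real check may need more than this.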

Manual setup (for developers)

git clone https://github.com/dmarzzz/VoxTerm.git
cd VoxTerm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 -m tui.app

Controls

| Key | Action |
| --- | --- |
| `R` | Start/pause recording |
| `N` | Party mode: join or leave P2P sessions |
| `T` | Tag/name speakers |
| `P` | Speaker profiles |
| `M` | Switch transcription model |
| `L` | Switch language |
| `S` | Save/export transcript |
| `C` | Clear transcript |
| `D` | Toggle debug mode |
| `?` | Help |
| `Q` | Quit |

Party Mode (P2P)

Multiple people in the same room can share transcripts over the local network. Each laptop captures its closest speaker best — the combined result is better than any single mic.

Press N to join the party. Press N again to leave. No codes, no configuration.

Auto-discovers nearby VoxTerm peers via mDNS
Auto-joins the nearest party, or hosts one if none found
Each party gets a unique color — all peers see the same color
Encrypted transcript sharing (AES-256-GCM)
Everyone sees who joins and leaves — no silent surveillance

See docs/party-mode-design.md for the full design.
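
One way every peer could arrive at the same party color without coordinating is to derive it from a shared party identifier. The sketch below assumes such an identifier exists and hashes it to a hue. `party_color` is a hypothetical helper, not the scheme from the design doc, and hashing alone does not guarantee that two parties never collide.

```python
import colorsys
import hashlib


def party_color(party_id: str) -> str:
    """Map a party identifier to a stable hex color (illustrative only)."""
    digest = hashlib.sha256(party_id.encode("utf-8")).digest()
    # Use the first two bytes of the hash as a hue in [0, 1).
    hue = int.from_bytes(digest[:2], "big") / 65536
    # Fixed saturation and brightness keep colors readable in a terminal.
    r, g, b = colorsys.hsv_to_rgb(hue, 0.65, 0.95)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )
```

Because the color depends only on the identifier, every peer that knows the party ID computes the same value locally.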

Hivemind mode (transcript streaming to swf-node)

VoxTerm can stream transcript batches to a "convent box" running swf-node --full --hivemind-sink. The sink wraps each batch in a signed bundle and propagates it on the LAN.

By default, VoxTerm auto-discovers sinks via mDNS. To point it at a specific sink instead:

voxterm --hivemind-sink-url=http://convent.local:7777

Or disable entirely:

voxterm --hivemind=off

Mode flag: --hivemind=auto|on|off

auto (default): discover a sink via mDNS; if none is found, fall back to local logging only.
on: require a sink (fails to start if discovery times out after 5s).
off: never POST; everything stays local.
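
The three modes can be sketched with `argparse`. This is a minimal illustration of the semantics listed above, not VoxTerm's actual CLI code: `build_parser` and `resolve_sink` are hypothetical, and treating an explicit sink URL as taking priority over discovery is an assumption.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="voxterm")
    parser.add_argument("--hivemind", choices=["auto", "on", "off"], default="auto")
    parser.add_argument("--hivemind-sink-url", default=None)
    return parser


def resolve_sink(
    mode: str, override_url: Optional[str], discovered_url: Optional[str]
) -> Optional[str]:
    """Return the sink URL to POST to, or None to stay local."""
    if mode == "off":
        return None  # never POST, even if a URL was supplied
    # Assumption: an explicit URL wins over mDNS discovery.
    url = override_url or discovered_url
    if url is None and mode == "on":
        # "on" requires a sink, so a missing one is a startup error.
        raise RuntimeError("--hivemind=on but no sink was found")
    return url  # in "auto", None means local logging only
```

For example, `build_parser().parse_args(["--hivemind=off"])` yields `hivemind == "off"`, and `resolve_sink` then returns `None` regardless of what discovery found.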

Each VoxTerm device gets a persistent UUID, stored at ~/Library/Application Support/voxterm/device_id (macOS) or ~/.config/voxterm/device_id (Linux); it appears in every batch as origin_device. It is a stable identifier, not a cryptographic identity: the convent sink is what signs the bundle.
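
A persistent device ID of this kind is typically created once and then re-read on every start. The sketch below shows that pattern using the paths given above. The function names and the one-UUID-per-file layout are assumptions, not VoxTerm's actual implementation.

```python
import sys
import uuid
from pathlib import Path


def default_device_id_path() -> Path:
    """Return the per-platform location described above."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "voxterm" / "device_id"
    return Path.home() / ".config" / "voxterm" / "device_id"


def load_or_create_device_id(path: Path) -> str:
    """Read the stored UUID, or generate and persist one on first run."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    device_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumed file format: the UUID as a single line of text.
    path.write_text(device_id + "\n", encoding="utf-8")
    return device_id
```

Calling `load_or_create_device_id(default_device_id_path())` twice returns the same value, which is what makes the ID stable across sessions.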

Cadence: VoxTerm flushes a batch roughly every 60 seconds, every 30 segments, or on exit, whichever comes first. Batches are POSTed to <sink>/hivemind/transcripts per spec §4.3.
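
The flush cadence can be sketched as a small buffer with a size limit and an age limit. This is an illustration only: `BatchBuffer` is hypothetical, the `send` callback stands in for the actual POST (the payload format is defined by the spec and not reproduced here), and starting the 60-second timer at the first segment of each batch is an assumption.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Collect segments and hand them to `send` in batches."""

    def __init__(
        self,
        send: Callable[[List[str]], None],
        max_segments: int = 30,
        max_age_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_segments = max_segments
        self._max_age_s = max_age_s
        self._clock = clock
        self._segments: List[str] = []
        self._started_at: Optional[float] = None

    def add(self, segment: str) -> None:
        if not self._segments:
            self._started_at = self._clock()  # age is measured from here
        self._segments.append(segment)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or old; call periodically from a timer."""
        if not self._segments or self._started_at is None:
            return
        too_many = len(self._segments) >= self._max_segments
        too_old = self._clock() - self._started_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; also the call to make on exit."""
        if self._segments:
            batch, self._segments = self._segments, []
            self._started_at = None
            self._send(batch)
```

The age check only runs when `maybe_flush()` is called, so a real application would also invoke it from a periodic timer and call `flush()` from its shutdown path.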

See SHAPE-ROTATOR-OS-SPEC.md §3.5 and §4.3 in shape-rotator-wrld-knwldge-viz for the wire format.

Voice Tagging

VoxTerm learns and remembers speaker voices across sessions:

Record a conversation — speakers are detected as "Speaker 1", "Speaker 2", etc.
Press T to name them — type a name, press Enter
Next session, VoxTerm auto-recognizes returning speakers
The more you tag, the less you need to — the system learns over time

Press P to manage your speaker profile library (rename, delete, wipe all data).
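
Recognizing a returning speaker usually comes down to comparing a new voice embedding against stored profiles. The sketch below uses cosine similarity with a fixed threshold. This is a common technique and an assumption here, not a description of VoxTerm's actual matcher: `match_speaker`, the 0.7 threshold, and the two-dimensional vectors are all illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical cut-off
) -> Optional[str]:
    """Return the best-matching profile name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


profiles = {"Ada": [1.0, 0.0], "Lin": [0.0, 1.0]}
print(match_speaker([0.9, 0.1], profiles))
```

An embedding that falls below the threshold for every profile returns `None`, which corresponds to a new, untagged speaker.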

Models

qwen3-0.6b — fast, good for most use
qwen3-1.7b — more accurate, larger
fw-small (Intel Mac default) — CPU-compatible faster-whisper backend
Whisper variants (tiny through large-v3) available via M menu

Models download automatically on first use.

Project Structure

audio/              Capture, VAD, transcription, diarization, speaker profiles
network/            P2P: discovery, sessions, party mode
tui/                App, widgets, theme
tests/              Test suite
docs/               Design docs and specs
config.py           Constants, paths, settings
