Supertonic TTS — Web App + CLI

A clean, beginner-friendly text-to-speech project built on Supertonic 3. Two ways to use it:

Web app: three ready-made UI languages (English, Korean, Japanese), six preset voices with one-tap preview, paste-or-upload input (.txt / .docx), and instant WAV download. Runs entirely in your browser via WebGPU/WebAssembly.
CLI: run supertonic-tts "hello" from any terminal on macOS, Windows, or Linux. Install it globally with npm; it uses the native ONNX runtime and needs no GPU.

No accounts, no API keys, no cloud round-trips. Synthesis runs on your device; the model assets are downloaded from Hugging Face.

Features

3 UI languages: English, Korean, Japanese
32 TTS language tags available in the underlying Supertonic text processor
6 voice styles with click-to-preview
Paste or upload: drop in .txt or .docx
Sample text presets per language
One-tap "Speak" with autoplay + transcript view
WAV download of any generated audio
WebGPU acceleration with automatic WASM fallback
Fully local: text never leaves the browser

Supported TTS options

The app has two language layers:

Current UI choices: English (en), Korean (ko), Japanese (ja). These are the languages with ready-made sample text, preview text, and UI tabs in app/main.js.
Underlying Supertonic language tags: en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi, na. These are accepted by the text processor in app/helper.js.

To expose another language in the UI, add an entry to LANGS in app/main.js with preview and preset text, then add or render the matching language tab.
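The exact shape of LANGS is not shown in this README. The sketch below assumes an object keyed by language tag whose entries carry a label, a preview sentence, and preset texts; the field names and the addLanguage helper are hypothetical, so adapt them to what app/main.js actually uses.

```javascript
// Hypothetical shape: the real LANGS object in app/main.js may use other field names.
const LANGS = {
  en: {
    label: 'English',
    preview: 'Hello, this is a voice preview.',
    presets: ['The quick brown fox jumps over the lazy dog.'],
  },
};

// Language tags accepted by the text processor, copied from the list above.
const SUPPORTED_TAGS = new Set([
  'en', 'ko', 'ja', 'ar', 'bg', 'cs', 'da', 'de', 'el', 'es', 'et', 'fi', 'fr',
  'hi', 'hr', 'hu', 'id', 'it', 'lt', 'lv', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk',
  'sl', 'sv', 'tr', 'uk', 'vi', 'na',
]);

// Return a copy of `langs` with one more UI language, rejecting unknown tags
// and entries that lack preview or preset text.
function addLanguage(langs, tag, entry) {
  if (!SUPPORTED_TAGS.has(tag)) {
    throw new Error(`Unsupported language tag: ${tag}`);
  }
  if (!entry.preview || !Array.isArray(entry.presets) || entry.presets.length === 0) {
    throw new Error(`Language ${tag} needs preview text and at least one preset`);
  }
  return { ...langs, [tag]: entry };
}

const withGerman = addLanguage(LANGS, 'de', {
  label: 'Deutsch',
  preview: 'Hallo, dies ist eine Stimmvorschau.',
  presets: ['Guten Morgen! Wie geht es Ihnen heute?'],
});
console.log(Object.keys(withGerman));
```

Validating the tag up front keeps a typo from surfacing later as a synthesis error. The language tab itself still has to be added or rendered separately.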

Voice styles

Every voice style can be used with every supported TTS language tag:

ID	Display name	Type	Style file
`F1`	Mina	Female	`voice_styles/F1.json`
`F2`	Sora	Female	`voice_styles/F2.json`
`F3`	Yuna	Female	`voice_styles/F3.json`
`M1`	Aiden	Male	`voice_styles/M1.json`
`M2`	Hiro	Male	`voice_styles/M2.json`
`M3`	Leo	Male	`voice_styles/M3.json`

F1 / Mina is the default voice. Voice style files come from Supertone/supertonic-3 and are loaded on demand: from the local assets/voice_styles/ directory in development, or from the Hugging Face CDN in production.

Model/runtime options

TTS model family: Supertonic 3 from Supertone/supertonic-3.
ONNX model files: duration_predictor.onnx, text_encoder.onnx, vector_estimator.onnx, vocoder.onnx.
Runtime: WebGPU first, then WebAssembly fallback.
Generation controls: quality steps from 4 to 16, and speed from 0.7 to 1.8.
Output: mono 44.1 kHz, 16-bit PCM WAV generated locally in the browser.
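To make the output format concrete, here is a minimal sketch of packing float samples into a mono 44.1 kHz, 16-bit PCM WAV file. It follows the standard 44-byte RIFF/WAVE header layout; encodeWav is an illustrative name, not necessarily the function the app uses.

```javascript
// Encode Float32 samples in [-1, 1] as a mono 16-bit PCM WAV file.
function encodeWav(samples, sampleRate = 44100) {
  const bytesPerSample = 2; // 16-bit
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                 // file size minus 8
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                           // fmt chunk size
  view.setUint16(20, 1, true);                            // format 1 = PCM
  view.setUint16(22, 1, true);                            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return buffer;
}

// In a browser, the result can back a download link:
//   const blob = new Blob([encodeWav(samples)], { type: 'audio/wav' });
//   link.href = URL.createObjectURL(blob);
```

All multi-byte fields are little-endian, which is why every DataView setter passes true as its last argument.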

CLI

A standalone Node CLI ships in this package. Install once and run from any directory. Two equivalent commands are exposed: short (supertts) and full (supertonic-tts).

# global install — Windows, macOS, Linux
npm install -g supertonic-tts

# simplest form — positional text, auto-detects KO/JA/EN
supertts "Hello from Supertonic!"
supertts "안녕하세요"
supertts "こんにちは" --voice M1

# explicit flags
supertts -t "Hi there" -o hi.wav --voice F2
supertts -f input.txt --lang ko -o out.wav
echo "piped text" | supertts -o piped.wav
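The rule behind the KO/JA/EN auto-detection is not spelled out here. One plausible approach, shown as a sketch only, is to look for script-specific Unicode ranges; detectLang is a hypothetical name and the CLI's real logic may differ.

```javascript
// Guess a language tag from the scripts present in the text.
function detectLang(text) {
  // Hiragana (U+3040-309F) or Katakana (U+30A0-30FF) signals Japanese.
  if (/[\u3040-\u30ff]/.test(text)) return 'ja';
  // Hangul syllables, Jamo, or compatibility Jamo signal Korean.
  if (/[\uac00-\ud7af\u1100-\u11ff\u3130-\u318f]/.test(text)) return 'ko';
  // Everything else falls back to English.
  return 'en';
}

console.log(detectLang('Hello from Supertonic!'));
console.log(detectLang('안녕하세요'));
console.log(detectLang('こんにちは'));
```

A check like this cannot tell kanji-only Japanese from other text, so passing --lang explicitly is the safer choice for short or mixed-script input.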

On the first synthesis, model assets (~380 MB) are auto-downloaded from Hugging Face into a platform-appropriate user cache:

Platform	Default assets directory
Windows	`%LOCALAPPDATA%\supertonic-tts\assets`
macOS	`~/Library/Caches/supertonic-tts/assets`
Linux	`$XDG_CACHE_HOME/supertonic-tts/assets` (or `~/.cache/...`)

Override with --assets <dir> or the SUPERTONIC_ASSETS env var. Pre-fetch without synthesizing via supertonic-tts --download.
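The lookup order can be pictured with the sketch below. It assumes the --assets flag takes precedence over SUPERTONIC_ASSETS, and that the Linux fallback mirrors the XDG layout under ~/.cache; both are assumptions, as is the resolveAssetsDir name. Inputs are passed in explicitly so the function stays free of imports.

```javascript
// Resolve the assets directory from a flag, the environment, and the platform.
// Assumed precedence: --assets flag, then SUPERTONIC_ASSETS, then platform default.
function resolveAssetsDir({ flag, env, platform, home }) {
  if (flag) return flag;
  if (env.SUPERTONIC_ASSETS) return env.SUPERTONIC_ASSETS;

  if (platform === 'win32') {
    // Fall back to the usual location when LOCALAPPDATA is unset.
    const base = env.LOCALAPPDATA || `${home}\\AppData\\Local`;
    return `${base}\\supertonic-tts\\assets`;
  }
  if (platform === 'darwin') {
    return `${home}/Library/Caches/supertonic-tts/assets`;
  }
  // Linux and other Unix-likes. The ~/.cache fallback path is an assumption.
  const base = env.XDG_CACHE_HOME || `${home}/.cache`;
  return `${base}/supertonic-tts/assets`;
}

// In Node, real values would come from the process and the os module:
//   resolveAssetsDir({ env: process.env, platform: process.platform, home: os.homedir() })
console.log(resolveAssetsDir({ env: {}, platform: 'darwin', home: '/Users/ada' }));
```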

CLI flags

Flag	Default	Description
`-t, --text <s>`	—	inline text
`-f, --file <p>`	—	read text from a `.txt` file
`-o, --out <p>`	`./out-<timestamp>.wav`	output WAV path
`-l, --lang <c>`	auto	language tag (auto-detects ko/ja/en; see `--list-langs`)
`-v, --voice <id>`	`F1`	voice id: `F1`–`F3`, `M1`–`M3`
`-s, --speed <n>`	`1.05`	0.7 – 1.8
`--steps <n>`	`8`	quality steps 4 – 16
`--silence <s>`	`0.3`	inter-chunk pause (sec)
`--assets <dir>`	auto	override assets directory
`--download`	—	only fetch / verify assets
`--no-play`	—	don't auto-play the generated WAV
`--list-voices`	—	print voice catalog
`--list-langs`	—	print supported language tags
`-q, --quiet`	—	suppress progress logs
`-h, --help`	—	show help

By default the generated WAV plays back immediately (macOS afplay, Windows Media.SoundPlayer, Linux paplay/aplay/play/ffplay). Playback is blocking: the command returns once the audio has finished. Pass --no-play for batch or scripted usage.
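How the CLI invokes these players is not documented here. As a rough sketch, per-platform selection could look like the following; the playerCandidates name, the PowerShell wrapper, and the ffplay flags are assumptions about one workable setup rather than the CLI's actual code.

```javascript
// List [command, args] pairs to try, in order, for blocking WAV playback.
function playerCandidates(platform, wavPath) {
  if (platform === 'darwin') {
    return [['afplay', [wavPath]]];
  }
  if (platform === 'win32') {
    // Escape single quotes for a PowerShell single-quoted string.
    const quoted = wavPath.replace(/'/g, "''");
    const script = `(New-Object Media.SoundPlayer '${quoted}').PlaySync()`;
    return [['powershell', ['-NoProfile', '-Command', script]]];
  }
  // Linux: try PulseAudio, then ALSA, then SoX, then FFmpeg's player.
  return [
    ['paplay', [wavPath]],
    ['aplay', [wavPath]],
    ['play', [wavPath]],
    ['ffplay', ['-nodisp', '-autoexit', wavPath]],
  ];
}

// A caller could run each candidate with child_process.spawnSync and stop at
// the first one that exits with status 0; spawnSync blocks until playback ends.
console.log(playerCandidates('linux', 'out.wav').map(([cmd]) => cmd).join(', '));
```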

The CLI prints the output path on stdout (one line, easy to pipe). All progress / status messages go to stderr.

# capture the output path without playback
OUT=$(supertts "audio test" --quiet --no-play)
echo "wrote $OUT"

Web app quick start

Requires only Node.js 18 or later. Model assets (~380 MB) are streamed directly from Hugging Face, so git-lfs is not needed.

# Install + auto-download the model assets
npm install

# Start the dev server (opens http://localhost:3000)
npm run dev

If the asset download was interrupted, just re-run it; existing files are skipped automatically:

npm run assets

Production build

npm run build     # outputs to ./dist
npm start         # serves ./dist on http://localhost:3000

In production builds, the app fetches model weights directly from the Hugging Face CDN at runtime (huggingface.co/Supertone/supertonic-3), so deployments don't have to ship the ~380 MB of .onnx files. The CDN sets proper CORS headers and long cache lifetimes.

Deploying

GitHub Pages (zero-config)

A workflow at .github/workflows/deploy.yml builds and publishes on every push to main.

Push the repo to GitHub
In repo settings → Pages → Build and deployment → Source: GitHub Actions
Push to main (or trigger the workflow manually)
App is live at https://<user>.github.io/<repo>/

The workflow sets VITE_BASE=/<repo>/ so all relative URLs resolve under the subpath. No model files are uploaded to Pages.

Vercel

vercel --prod

vercel.json is already configured with:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless (enables faster multi-threaded WASM where supported)
Long-cache headers for /assets/*
.vercelignore excludes the local assets/ directory from upload

Self-hosting

npm run build emits a fully static ./dist directory — serve it with any static host (nginx, Caddy, Cloudflare Pages, S3 + CloudFront, etc.). If you also want multi-threaded WASM acceleration, send these response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless
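If the static host is a small Node server, the two headers can be attached with a wrapper around any request handler. This is a sketch under that assumption; withIsolationHeaders and serveDist are hypothetical names, and nginx or Caddy users would set the same headers in their server config instead.

```javascript
// Headers that enable cross-origin isolation, as listed above.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'credentialless',
};

// Wrap a Node-style (req, res) handler so every response carries the headers.
function withIsolationHeaders(handler) {
  return (req, res) => {
    for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
      res.setHeader(name, value);
    }
    return handler(req, res);
  };
}

// Usage with the built-in http module (serveDist is a hypothetical handler
// that serves files from ./dist):
//   http.createServer(withIsolationHeaders(serveDist)).listen(3000);
```

Once deployed, evaluating self.crossOriginIsolated in the browser console should return true when the headers are taking effect.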

Project layout

.
├── app/                  # Vite project root (the web app)
│   ├── index.html
│   ├── main.js           # UI + synthesis orchestration
│   ├── helper.js         # Supertonic ONNX runtime helpers
│   └── style.css
├── assets/               # Model weights & voice styles (downloaded)
│   ├── onnx/*.onnx
│   ├── onnx/tts.json
│   ├── onnx/unicode_indexer.json
│   └── voice_styles/*.json
├── scripts/
│   └── download-assets.mjs
├── vite.config.js
└── package.json

How it works

The browser loads four ONNX models (duration predictor, text encoder, vector estimator, vocoder) and a voice style tensor.
Your text is preprocessed (NFKD-normalised, emoji-stripped, wrapped with the language tag) and converted to token IDs.
A short diffusion loop denoises a latent audio representation.
The vocoder synthesises 44.1 kHz, 16-bit PCM. The WAV file is built client-side and offered for playback / download.

Every step runs locally — your text and the generated audio never leave the device.
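The preprocessing step described above can be sketched as follows. The tag-wrapping format (an opening and closing tag around the text) and the preprocess name are assumptions; the authoritative version lives in app/helper.js, and the mapping to token IDs via unicode_indexer.json is left out.

```javascript
// Normalise text, strip emoji, and wrap it with the language tag.
function preprocess(text, lang) {
  // NFKD splits compatibility characters (ligatures, full-width forms) apart.
  const normalized = text.normalize('NFKD');
  // Remove pictographic emoji plus the variation selector and zero-width
  // joiner that would otherwise be left behind.
  const withoutEmoji = normalized.replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, '');
  // Collapse the whitespace left where emoji used to be.
  const collapsed = withoutEmoji.replace(/\s+/g, ' ').trim();
  // Assumed wrapping format: <lang>text</lang>.
  return `<${lang}>${collapsed}</${lang}>`;
}

console.log(preprocess('Hello 👋 world', 'en'));
```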

Troubleshooting

"Loading model" stays forever: open DevTools → Network. If the model files (.onnx) 404, run npm run assets again.
WebGPU disabled: only modern Chrome / Edge / Safari Tech Preview support WebGPU. The app silently falls back to WebAssembly — slower but works everywhere.
DOCX upload fails: complex DOCX files with embedded objects may not parse cleanly. Save as plain .txt as a fallback.
Korean / Japanese sound rushed: drop "Speed" in Advanced options to ~0.95.
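To check which backend a given browser is likely to end up on, a snippet like the one below can be pasted into the DevTools console. It relies on the standard navigator.gpu.requestAdapter() call; detectBackend is an illustrative helper, and the app's own fallback logic may test more conditions.

```javascript
// Report 'webgpu' when a GPU adapter is available, otherwise 'wasm'.
async function detectBackend(nav) {
  if (!nav || !('gpu' in nav)) return 'wasm';
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}

detectBackend(globalThis.navigator).then((backend) => console.log(backend));
```

A result of wasm on a browser that should support WebGPU usually points to a disabled flag, a blocklisted GPU driver, or a non-secure (http) origin.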

License

App code: MIT. Supertonic model weights are subject to Supertone's license.
