ucs-detect

This package provides two command-line tools for testing and inspecting Unicode support in terminal emulators.

Installation

To install or upgrade:

$ pip install -U ucs-detect

Problem

Many East Asian scripts use Wide (W) or Fullwidth (F) characters that occupy 2 cells. Many languages use zero-width or "combining" characters that modify adjacent characters with complex advancing rules. Emoji sequences use Zero Width Joiners to join multiple emoji, may be adjusted by Fitzpatrick modifiers, and emoji flags are represented by regional indicators that spell a country code. The components of these sequences may also be displayed uncombined, in a "standalone" representation.

Terminal applications must determine the display width of these characters, but the Unicode Standard provides no definitions specific to terminals. Many terminals have no grapheme support at all, conforming instead to pre-emoji era POSIX definitions and grapheme-unaware wcwidth(3) system libraries.

Further, even well-meaning terminals that report support for "Graphemes" (DEC Private Mode 2027) interpret the Unicode Standard in varying ways.
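The character classes described above can be inspected with Python's standard unicodedata module. The following is a minimal sketch for illustration only: the describe helper and the samples table are hypothetical names, and the properties shown are raw Unicode data, not the width rules of the wcwidth Specification.

```python
import unicodedata


def describe(ch):
    """Return (codepoint, East Asian Width, General Category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.east_asian_width(ch),
        unicodedata.category(ch),
    )


# Hypothetical sample set, one character per class named in the text above.
samples = {
    "wide CJK ideograph": "\u4e2d",
    "combining acute accent": "\u0301",
    "zero width joiner": "\u200d",
    "Fitzpatrick modifier": "\U0001f3fd",
    "regional indicator U": "\U0001f1fa",
}

for label, ch in samples.items():
    print(label, *describe(ch))
```

A terminal has to turn properties like these into a cell count for every sequence it draws, which is where implementations diverge.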

Solution

ucs-detect measures terminal compliance with the Specification of the Python wcwidth library, for the latest Unicode versions, across WIDE, ZERO, ZWJ, VS-16, and VS-15 Unicode sequences and the grapheme widths of over 500 languages.

ucs-browser is an interactive terminal program for browsing each of these categories. It may also write to a non-tty, which is how the example test files at https://github.com/jquast/ucs-detect/tree/master/docs/ucs_example_files are published.

How it works

ucs-detect uses the Query Cursor Position terminal sequence to ask "where is the cursor?" after printing test characters. By comparing the reported cursor position against the wcwidth expected width, compliance is measured.

This technique is inspired by resize(1), which determines terminal dimensions over transports like serial lines by moving to (999, 999) and querying cursor position.
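The measurement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code ucs-detect uses: parse_cpr and measure_width are hypothetical names, and measure_width assumes an interactive POSIX terminal on stdin and stdout.

```python
import re
import sys

CPR_QUERY = "\x1b[6n"  # ask the terminal "where is the cursor?"
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")  # reply: ESC [ row ; col R


def parse_cpr(reply):
    """Extract the 1-based (row, column) from a cursor position report."""
    match = CPR_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measure_width(text):
    """Print text from column 1 and return how many cells the cursor advanced.

    Hypothetical helper: requires an interactive POSIX terminal.
    """
    import termios
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # no echo, no line buffering while reading the reply
        sys.stdout.write("\r" + text + CPR_QUERY)
        sys.stdout.flush()
        reply = ""
        while not reply.endswith("R"):
            reply += sys.stdin.read(1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        sys.stdout.write("\r\x1b[K")  # return to column 1 and erase the test text
        sys.stdout.flush()
    _, column = parse_cpr(reply)
    return column - 1  # columns are 1-based
```

Calling measure_width("\u4e2d") from an interactive terminal and comparing the result with the expected width gives a single pass or fail data point of the kind the tool aggregates.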

ucs-detect

ucs-detect is the primary testing tool. It tests a terminal emulator's Unicode support for Wide characters, Emoji Zero Width Joiner (ZWJ) sequences, Regional Indicators and flags, Variation Selector-16 (VS-16) and VS-15 sequences, and zero-width combining characters across hundreds of languages.

Terminal features that may be automatically detected are also reported: Bracketed Paste, Synchronized Output, Mouse SGR, Grapheme Clustering, Kitty Keyboard protocol, Sixel, ReGIS, Kitty or iTerm2 image protocol, and XTGETTCAP support.
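As an illustration of how a feature such as Grapheme Clustering can be detected automatically, a DEC private mode can be queried with the DECRQM sequence and the reply decoded. The sketch below only builds the query and parses a reply string; decrqm_query and parse_decrpm are hypothetical names, and whether ucs-detect detects the feature in exactly this way is an assumption.

```python
import re

# DECRPM reply: ESC [ ? mode ; state $ y
DECRPM_REPLY = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Meaning of the state parameter in a DECRPM reply.
STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}


def decrqm_query(mode):
    """Build the DECRQM request for a DEC private mode number."""
    return f"\x1b[?{mode}$p"


def parse_decrpm(reply):
    """Return (mode, state description) from a DECRPM reply string."""
    match = DECRPM_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a DECRPM reply: {reply!r}")
    mode, state = int(match.group(1)), int(match.group(2))
    return mode, STATES.get(state, "unknown")


print(repr(decrqm_query(2027)))
print(parse_decrpm("\x1b[?2027;2$y"))
```

A terminal that does not answer at all has to be handled with a timeout, which this sketch leaves out.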

Run a default test:

$ ucs-detect

Run a detailed test and save a YAML report:

$ ucs-detect --save-yaml=data/my-terminal.yaml

Notable CLI options:

--rerun <yaml-file>
Re-test a terminal using parameters from a previous YAML report.
--test-only <category>
Test a single category: wide, zwj, vs16, vs15, lang, unicode, terminal, sri, sfz, ri, or all (default).
--limit-category-time <seconds>
Time budget per test category, auto-adjusts sampling (0=unlimited).
--stop-at-error <pattern>
Pause on errors matching pattern for interactive investigation. Values: all, zwj, wide, sri, sfz, ri, vs16, vs16n, vs15, lang, or a specific language name (e.g., Hindi).
--probe-silently
Minimal output, modifying only a single line.
--save-json <path>
Save results as a JSON report.
--no-terminal-test
Skip terminal feature detection.
--no-languages-test
Skip language support testing.

ucs-browser

ucs-browser is an interactive terminal browser for visually inspecting Unicode character width rendering. It displays characters with pipe (|) alignment markers that should align correctly in any terminal with proper Unicode support.

$ ucs-browser
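The alignment-marker idea can be sketched as follows: each character is padded according to its expected width, so the trailing pipe lands in the same column only when the terminal agrees with that width. This is a simplified illustration, not ucs-browser's implementation; approx_width and aligned_row are hypothetical names, and the width rule is a rough stand-in for the wcwidth Specification.

```python
import unicodedata


def approx_width(ch):
    """Rough cell width, for illustration only.

    0 for nonspacing and enclosing marks, 2 for Wide or Fullwidth, else 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def aligned_row(ch, column=2):
    """Pad ch out to `column` cells, then append the pipe marker."""
    padding = " " * (column - approx_width(ch))
    return f"{ch}{padding}|"


# Narrow letter, wide ideograph, fullwidth letter.
for ch in "A\u4e2d\uff21":
    print(aligned_row(ch))
```

If the pipes printed by this loop do not line up vertically, the terminal disagrees with the expected width of at least one of the characters.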

Modes are toggled with keyboard shortcuts:

  • 0: Reset to default (wide characters)
  • 1 / 2: Narrow (1-cell) or Wide (2-cell) characters
  • c: Combining characters
  • g: Grapheme clusters ([ / ] to adjust width)
  • z: Emoji ZWJ sequences
  • 5: VS-15 (text style)
  • 6: VS-16 space kludge
  • 7: VS-16 (emoji style)
  • w: Toggle with/without variation selector
  • U: Toggle uncommon CJK extensions
  • v: Select Unicode version
  • - / +: Adjust name column width

Modes may also be entered directly by CLI options (see ucs-browser --help).

Navigation follows less(1) conventions: j/k for lines, f/b for pages, q to quit.

Example files are created using ucs-browser and are published in the source repository at https://github.com/jquast/ucs-detect/tree/master/docs/ucs_example_files

Test Results

Results for over 30 terminals on Linux, Mac, and Windows are published at https://ucs-detect.readthedocs.io/results.html

Individual YAML reports are in the data folder: https://github.com/jquast/ucs-detect/tree/master/data

Updating Results

Results are shared with terminal emulator projects and may become outdated as they improve Unicode support. Submit a pull request to update YAML data files.

Re-test an existing terminal:

$ ucs-detect --rerun data/contour.yaml

This re-executes with the same parameters, overwriting the existing YAML file.

Submit results for a new terminal:

$ ucs-detect --save-yaml=data/jeffs-own-terminal.yaml --limit-category-time=900

The --limit-category-time argument automatically reduces the test size so that each category can complete within the given time budget. It does so by adjusting the --limit-codepoints-wide-pct parameter, to as low as 1%.
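The idea of trading coverage for time can be sketched as a small calculation. This is a hypothetical model, not the algorithm ucs-detect implements: the sampling_pct function, its parameters, and the assumption of a constant per-item cost are all illustrative. Only the 1% floor and the "0 means unlimited" convention come from the text above.

```python
def sampling_pct(total_items, seconds_per_item, budget_seconds, floor_pct=1.0):
    """Percentage of items to test so a category fits within its time budget.

    Hypothetical model: assumes every item takes the same time to test.
    """
    if budget_seconds <= 0:  # 0 means unlimited: test everything
        return 100.0
    full_run = total_items * seconds_per_item
    if full_run <= budget_seconds:  # everything already fits
        return 100.0
    # Scale down proportionally, but never below the floor.
    return max(floor_pct, 100.0 * budget_seconds / full_run)


# 4000 items at 0.5 s each would take 2000 s; a 900 s budget allows a partial sample.
print(sampling_pct(4000, 0.5, 900))
```

A slower terminal raises the measured per-item cost, which under this model lowers the sampled percentage until it reaches the floor.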

To preview documentation changes, create a draft pull request. A readthedocs.org build status will appear -- click "Details" for an HTML preview.

Batch Testing

The general workflow to gather results and create documentation is, in combined serial and parallel order:

tox -e docker_build,docker_verify,docker_run_series,docker_screenshots &
tox -e system_verify,system_run_series,system_screenshots
wait
tox -e docs

For reproducible, isolated runs, the project provides a Docker image with Xvfb and all Linux terminal emulators pre-installed. All Docker operations are managed through tox targets:

# one-time buildx builder setup
tox -e docker_buildx_setup

# build the image (with cache)
tox -e docker_build

# verify all terminals installed (group --version check)
tox -e docker_verify

# run ucs-detect on all terminals
tox -e docker_run_series

# this target accepts extra 'run-series.py' arguments
tox -e docker_run_series -- --timeout 600 --run-only "foot,kitty"

# generate screenshots
tox -e docker_screenshots

Unfortunately, many terminals have to be excluded from Docker, for any of these reasons:

  • Not Linux or not X11 compatible
  • GPU-accelerated and not compatible with Xvfb
  • Massive number of build dependencies
  • JS/Electron based (Chromium?), failing for some reason
  • Cannot reliably set geometry
  • Tests fine with ucs-detect, but cannot be screenshotted for whatever reason

This requires installing those terminals on the developer's host system.

Use the 'system' targets to run these:

# verify all terminals installed (group --version check)
tox -e system_verify

# run ucs-detect on all terminals
tox -e system_run_series

# generate screenshots
tox -e system_screenshots

The script run-series.py is an X11 automation script for testing all Linux terminals. When a terminal does not support -e program [arguments] or similar, keystrokes are injected into the target application to launch ucs-detect, according to its configuration.

Problem Analysis

Use --stop-at-error to investigate discrepancies interactively:

$ ucs-detect --stop-at-error 'Hindi'

Example output:

Failure in language 'Hindi' (Hindi-2-01):
+---+-----------+--------+----------+---------+-------------------------+
| # | Codepoint | Python | Category | wcwidth |           Name          |
+---+-----------+--------+----------+---------+-------------------------+
| 1 |   U+0915  | \u0915 |    Lo    |    1    |   DEVANAGARI LETTER KA  |
| 2 |   U+094D  | \u094d |    Mn    |    0    |  DEVANAGARI SIGN VIRAMA |
| 3 |   U+0928  | \u0928 |    Lo    |    1    |   DEVANAGARI LETTER NA  |
| 4 |   U+093F  | \u093f |    Mc    |    0    | DEVANAGARI VOWEL SIGN I |
+---+-----------+--------+----------+---------+-------------------------+
+----+
| क्नि |
+----+

measured by terminal: 3
measured by wcwidth:  2

Shell
-----
printf '\xe0\xa4\x95\xe0\xa5\x8d\xe0\xa4\xa8\xe0\xa4\xbf\n'

Python
------
python -c "print('\u0915\u094d\u0928\u093f')"

press return for next error, or n for non-stop:
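The per-codepoint breakdown and the shell reproduction line in a report like the one above can be rebuilt with the standard library alone. This is a minimal sketch, not ucs-detect's reporting code: describe_sequence and shell_escape are hypothetical helper names, and the wcwidth column is omitted because it requires the wcwidth library.

```python
import unicodedata


def describe_sequence(text):
    """Return (codepoint, category, name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "(unnamed)"))
        for ch in text
    ]


def shell_escape(text):
    """Render text as \\xNN escapes of its UTF-8 bytes, suitable for printf."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


failing = "\u0915\u094d\u0928\u093f"  # the Hindi sequence from the report above

for row in describe_sequence(failing):
    print(*row, sep="  ")
print(f"printf '{shell_escape(failing)}\\n'")
```

Pasting the printed printf line into a shell in the terminal under test reproduces the sequence for visual inspection.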

UDHR Data

Language testing uses the Universal Declaration of Human Rights (UDHR) dataset, translated into 500+ languages, as a test corpus for zero-width characters (Mn — Nonspacing Mark), combining characters (Mc — Spacing Mark), and language-specific scripts.

Source data: https://github.com/eric-muller/udhr/

The UDHR provides practical coverage of common complex grapheme clusters across the world's languages, serving as an indicator of a terminal's support for combining marks across diverse scripts.
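To illustrate why such a corpus is useful, the sketch below picks out the words of a text that contain at least one Mn or Mc character, which are the words most likely to expose width disagreements. The words_with_marks function and the sample string are hypothetical; how ucs-detect actually selects test words from the UDHR data is not described here.

```python
import unicodedata

MARK_CATEGORIES = {"Mn", "Mc"}  # Nonspacing Mark, Spacing Mark


def words_with_marks(text):
    """Return the unique words, in order, that contain an Mn or Mc character."""
    found = []
    for word in text.split():
        has_mark = any(unicodedata.category(ch) in MARK_CATEGORIES for ch in word)
        if has_mark and word not in found:
            found.append(word)
    return found


# Hypothetical sample: two Devanagari words mixed with plain ASCII.
sample = "\u0928\u092e\u0938\u094d\u0924\u0947 world \u0915\u094d\u0928\u093f world"
for word in words_with_marks(sample):
    print(word, [f"U+{ord(ch):04X}" for ch in word])
```

Running the same filter over a translation in each language yields a per-language list of candidate test words.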

History

  • 2.2.0 (2026-05-29): Enrich the source repository with more tools such as make-screenshots.py, expand XTGETTCAP detection and the results table, and record XTVERSION and TERM_PROGRAM directly. Record CPU and memory resource usage, and introduce run-series.py, used with or without Docker for batch testing with parallel execution (Linux only).
  • 2.1.0 (2026-04-17): Add more testing for standalone and flags (RI), kitty text sizing protocol, make ucs-browser compatible with older python versions, and some changes to allow ucs-detect to integrate as a network service (see telnet modem.xyz)
  • 2.0.2 (2026-02-28): Fix timing bugs that caused features to sometimes report "No" support, and provide a major speed enhancement to the terminal test by integrating the latest blessed release.
  • 2.0.1 (2026-02-05): Add --probe-silently mode, --save-json, time measurements with RTT and ping-like statistics, and telnetlib3 shell support for testing over telnet. Bugfix iTerm2 image feature detection.
  • 2.0.0 (2026-02-01): More correct results with an up-to-date wcwidth, many new CLI options such as --rerun and --limit-category-time, and removal of the CLI arguments --unicode-version, --shell, --quick, and --no-emit-osc1337. The wcwidth-browser program has been migrated from wcwidth, and setup.py was migrated to pyproject.toml. Requires Python 3.8.
  • 1.0.8 (2025-11-02): Added detection of DEC Private Modes, testing of Variation Selector 15, Sixel graphics and pixel size, and automatic software version (XTVERSION and ^E answerback).
  • 1.0.7 (2024-01-06): Add python 3.10 compatibility for yaml file save and update wcwidth requirement to 0.2.13.
  • 1.0.6 (2023-12-15): Distribution fix for UDHR data and bugfix for Python 3.8 through 3.11. ucs-detect welcomes @GalaxySnail as a new project contributor.
  • 1.0.5 (2023-11-13): Set minimum wcwidth release version requirement.
  • 1.0.4 (2023-11-13): Add support for Emoji with VS-16 and more complete testing. Published test results.
  • 1.0.3 (2023-10-28): Drop Python 2 support. Add more advanced testing. Changes default behavior when called without arguments; use ucs-detect --quick --shell to match the behavior of previous releases.
  • 0.0.4 (2020-06-20): Initial releases and bugfixes
