uhexdump

A UTF-8 aware hex dump utility for modern terminals.

uhexdump extends traditional hex dump tools with Unicode awareness, visual whitespace rendering, and multiple display modes designed for debugging text/binary streams.

It is especially useful when inspecting:

  • UTF-8 encoded data
  • serial protocols
  • mixed binary/text logs
  • whitespace-sensitive formats
  • Python indentation
  • corrupted data streams


Features

  • UTF-8 aware decoding
  • Unicode Control Pictures for control characters
  • whitespace visualization
  • Python indentation detection
  • UTF-8 highlighting
  • grouped hex output
  • multiple display modes
  • ANSI colored output
  • pipe-friendly CLI tool

Output Modes

Classic Mode

Traditional hex dump layout.

offset  hex bytes   text column

Example:

./uhexdump.py utf8_test.txt

[Screenshot: classic mode]

Features visible:

  • UTF-8 characters rendered correctly
  • invalid sequences marked
  • control characters visualized
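
The offset / hex bytes / text column layout can be illustrated with a minimal sketch. This is not uhexdump's actual code: the function names `classic_row` and `classic_dump` are hypothetical, the 8-digit hex offset is an assumption, and the text column here is plain ASCII only, with none of the UTF-8 decoding or control pictures the tool provides.

```python
def classic_row(offset: int, chunk: bytes, width: int = 16) -> str:
    """Format one row as: offset, hex bytes, text column."""
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    # Pad a short final row so the text column stays aligned.
    hex_part = hex_part.ljust(width * 3 - 1)
    # ASCII-only text column: anything non-printable becomes a dot.
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    return f"{offset:08x}  {hex_part}  {text}"


def classic_dump(data: bytes, width: int = 16) -> list[str]:
    """Split data into rows of `width` bytes and format each one."""
    return [classic_row(i, data[i:i + width], width)
            for i in range(0, len(data), width)]


if __name__ == "__main__":
    print("\n".join(classic_dump(b"Hello, hex dump!\nSecond row.\n")))
```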

Dual Mode

Two aligned rows per block.

hex bytes
aligned characters

Example:

lsb_release -a | ./uhexdump.py --mode dual --show-space

[Screenshot: dual mode]

Advantages:

  • exact byte alignment
  • easy to see UTF-8 continuation bytes
  • whitespace clearly visible

Stacked Mode

Compact stacked representation.

hex row
text row

Example:

date | ./uhexdump.py -m stacked -w 42

Useful for quick stream inspection.


UTF-8 Visualization

Each multi-byte UTF-8 sequence is decoded and displayed as a single character, aligned with its start byte.

The continuation bytes that follow are shown as filler symbols.

Example:

e2 90 8a

Displayed as:

␊ · ·

This allows easy detection of:

  • UTF-8 start bytes
  • continuation bytes
  • broken sequences
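
The per-byte alignment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: the names `expected_length`, `utf8_cells` and `dual_rows` are hypothetical, the `?` marker for invalid bytes is an assumption (the README only says invalid sequences are marked), and wide characters would need width handling that is left out here.

```python
def expected_length(first: int) -> int:
    """Sequence length implied by a UTF-8 start byte, or 0 if it is not one."""
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF4:
        return 4
    return 0  # continuation byte or byte that never starts a sequence


def utf8_cells(data: bytes, filler: str = "\u00b7", invalid: str = "?") -> list[str]:
    """Return one display cell per input byte."""
    cells = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        try:
            if n == 0:
                raise ValueError("not a valid start byte")
            ch = data[i:i + n].decode("utf-8")  # raises on broken sequences
        except ValueError:
            cells.append(invalid)
            i += 1
            continue
        if n == 1 and data[i] < 0x20:
            ch = chr(0x2400 + data[i])  # Unicode Control Picture for C0 controls
        cells.append(ch)
        cells.extend([filler] * (n - 1))  # one filler per continuation byte
        i += n
    return cells


def dual_rows(data: bytes) -> tuple[str, str]:
    """Hex row plus a text row whose cells line up under each byte."""
    hex_row = " ".join(f"{b:02x}" for b in data)
    # Pad every cell to two columns, then drop the final pad character.
    text_row = " ".join(cell.ljust(2) for cell in utf8_cells(data))[:-1]
    return hex_row, text_row


if __name__ == "__main__":
    for row in dual_rows(bytes.fromhex("e2908a")):
        print(row)
```

Because every byte gets exactly one cell, a truncated or stray byte shows up as an invalid marker in the position where the damage occurred.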

Visible Whitespace

Option:

--show-space

Spaces appear as:

␠

Tabs are rendered with their own visible symbol.

This makes whitespace problems such as trailing spaces or tabs mixed with spaces easy to spot.
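
As a rough illustration of the idea, whitespace can be made visible with a simple character substitution. The ␠ symbol (U+2420) for spaces comes from the options table; the tab symbol ␉ (U+2409) is an assumption, as is the function name `show_whitespace`.

```python
def show_whitespace(text: str, tab_symbol: str = "\u2409") -> str:
    """Replace spaces and tabs with visible symbols."""
    # U+2420 is SYMBOL FOR SPACE; the tab symbol is an assumed default.
    table = str.maketrans({" ": "\u2420", "\t": tab_symbol})
    return text.translate(table)


if __name__ == "__main__":
    print(show_whitespace("if x:\t  return 1 "))
```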


Python Indentation Mode

--indent-mode python

Highlights indentation characters.

Mixed indentation (tabs + spaces) is flagged with a warning marker.

Example:

! 00000010 ...
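
A check for mixed indentation might look like the sketch below. It works per source line for simplicity, whereas the tool flags dump rows; the function names are hypothetical and the tool's actual rules may differ.

```python
def has_mixed_indent(line: bytes) -> bool:
    """True if the leading whitespace contains both a tab and a space."""
    indent = line[:len(line) - len(line.lstrip(b" \t"))]
    return b" " in indent and b"\t" in indent


def mixed_indent_lines(source: bytes) -> list[int]:
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if has_mixed_indent(line)]


if __name__ == "__main__":
    sample = b"def f():\n    a = 1\n\t  b = 2\n"
    for number in mixed_indent_lines(sample):
        print(f"! line {number}")
```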

Installation

Clone repository:

git clone https://github.com/kmiikki/uhexdump.git
cd uhexdump

Make executable:

chmod +x uhexdump.py

Optional dependency for correct character width handling:

pip install wcwidth
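
For context, an optional dependency like this is commonly handled with an import fallback. The sketch below is one way to do it, not necessarily how uhexdump does; `char_width` is a hypothetical helper, and the fallback based on `unicodedata` is only an approximation of what `wcwidth` computes.

```python
import unicodedata

try:
    from wcwidth import wcwidth as _wcwidth  # optional third-party package
except ImportError:
    _wcwidth = None


def char_width(ch: str) -> int:
    """Number of terminal columns a single character occupies."""
    if _wcwidth is not None:
        width = _wcwidth(ch)
        # wcwidth returns -1 for non-printable control characters.
        return width if width >= 0 else 1
    if unicodedata.combining(ch):
        return 0  # combining marks attach to the previous character
    # East Asian Wide and Fullwidth characters take two columns.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    for ch in "a\u4e2d\u0301":
        print(f"U+{ord(ch):04X} width {char_width(ch)}")
```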

Usage

uhexdump.py [options] [file]

Input sources:

  • file
  • - (stdin)
  • pipe

Examples:

cat file.bin | ./uhexdump.py
./uhexdump.py file.bin
./uhexdump.py - < file.bin
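
The three input sources reduce to two cases: a named file, or standard input (either piped or requested with `-`). A minimal sketch of that dispatch, with a hypothetical `read_input` helper and an injectable `stdin` parameter added only to make it testable:

```python
import sys
from typing import BinaryIO, Optional


def read_input(path: Optional[str], stdin: Optional[BinaryIO] = None) -> bytes:
    """Read all bytes from a file, or from stdin when path is None or '-'."""
    if path is None or path == "-":
        # sys.stdin.buffer is the binary view of stdin, so bytes arrive undecoded.
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    argument = sys.argv[1] if len(sys.argv) > 1 else None
    print(f"{len(read_input(argument))} bytes read")
```

Reading in binary mode matters here: decoding the input as text would hide exactly the invalid bytes a hex dump is meant to reveal.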

Command Line Options

Option                  Description
-m, --mode              Output format: classic, dual, stacked
-w, --width             Bytes per row
--show-space            Show spaces as ␠
--indent-mode python    Visualize Python indentation
--start-offset          Start dumping from byte offset
--length                Limit number of bytes
--color                 ANSI color mode (auto, always, never)
--no-text               Hide text column
--group                 Group hex bytes into blocks
--highlight utf8        Highlight UTF-8 sequences
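
To make the effect of `--start-offset`, `--length` and `--group` concrete, here is a small sketch of what such options typically do to the data before and during formatting. The helper names are hypothetical, and the interpretation of `--group` as "bytes per block, with a wider gap between blocks" is an assumption rather than documented behavior.

```python
from typing import Optional


def select_range(data: bytes, start_offset: int = 0,
                 length: Optional[int] = None) -> bytes:
    """Keep `length` bytes beginning at `start_offset` (all remaining if None)."""
    end = None if length is None else start_offset + length
    return data[start_offset:end]


def grouped_hex(chunk: bytes, group: int = 4) -> str:
    """Hex bytes separated by one space, with two spaces between groups."""
    blocks = [chunk[i:i + group].hex(" ") for i in range(0, len(chunk), group)]
    return "  ".join(blocks)


if __name__ == "__main__":
    payload = bytes(range(16))
    print(grouped_hex(select_range(payload, start_offset=4, length=8)))
```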

Examples

Dump file:

./uhexdump.py file.bin

Pipe input:

cat log.txt | ./uhexdump.py

Highlight UTF-8 sequences:

./uhexdump.py --highlight utf8 file.txt

Wider rows (32 bytes per row):

./uhexdump.py -w 32 file.bin

Whitespace visualization:

./uhexdump.py --show-space script.py

Python indentation debugging:

./uhexdump.py --indent-mode python script.py

Comparison

Feature                     hexdump   xxd   uhexdump
UTF-8 decoding              no        no    yes
control pictures            no        no    yes
whitespace visualization    no        no    yes
indentation detection       no        no    yes
UTF-8 highlighting          no        no    yes

Repository Structure

uhexdump/
├─ uhexdump.py
├─ README.md
├─ LICENSE
└─ images/
   ├─ classic.png
   ├─ dual.png
   └─ combo.png

License

MIT License


Author

Kim Miikki


Future Ideas

Possible future improvements:

  • automatic terminal width detection
  • binary diff mode
  • protocol decoding helpers
  • pip-installable package
  • man page
