Releases: michaelbeijer/WordCounter
v0.5.0 — About dialog with version info, author details, and repo link
What's new
- About dialog — accessible via the new "About" button in the toolbar. Shows version info, clickable link to the author's website (michaelbeijer.co.uk), and a link to the GitHub repository.
- Version in title bar — the window title now displays the version number (e.g. "WordCounter v0.5.0, by Michael Beijer").
Download
Download WordCounter-v0.5.0-win64.zip, extract anywhere, and run WordCounter.exe. No Python, Java, or other dependencies needed — everything is bundled.
Full changelog: https://github.com/michaelbeijer/WordCounter/blob/main/CHANGELOG.md
v0.4.0 — Accurate word counts for SDLXLIFF, XLIFF, TMX, and PO files
What's new in v0.4.0
Dedicated parsers for translation file formats — SDLXLIFF, XLIFF, memoQ XLIFF, TMX, and PO/POT files now use specialized XML parsers instead of Tika's raw text extraction.
The problem this solves
Previously, Tika extracted all content from SDLXLIFF files — including source text, target text, base64 hashes, timestamps, usernames, AI engine metadata, and segment IDs. This inflated word counts by 3–4x (e.g., 26,157 words instead of the actual ~7,200).
How it works now
- SDLXLIFF/XLIFF: Parses the XML structure and extracts only <seg-source>/<source> segments (or <target> when toggled)
- TMX: Extracts source-language <seg> elements, auto-detecting the source language from the header
- PO/POT: Extracts msgid entries (source strings)
- Inline XLIFF tags (<bpt>, <ept>, <ph>, <x/>) are stripped; text inside formatting tags (<g>) is preserved
- Source/target toggle in the UI: "XLIFF/SDLXLIFF: count target segments"; defaults to source
- Language auto-detection: The Note column shows e.g. "SDLXLIFF source [nl-BE]" or "SDLXLIFF target [en-US]"
- Translation formats now work without Tika — uses Python's built-in XML parser
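The XLIFF behaviour described above can be sketched with Python's built-in xml.etree.ElementTree. This is a minimal illustration, not WordCounter's actual parser: the function names, the sample document, and the whitespace-based word count are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Inline XLIFF elements whose content is markup rather than translatable text.
SKIPPED_INLINE = {"bpt", "ept", "ph", "x"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def visible_text(elem: ET.Element) -> str:
    """Collect text, skipping inline tag content but keeping text inside <g>."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) not in SKIPPED_INLINE:
            parts.append(visible_text(child))
        # Text following a child element belongs to the parent segment.
        parts.append(child.tail or "")
    return "".join(parts)


def count_words(xliff: str, use_target: bool = False) -> int:
    """Count whitespace-separated words in source (or target) segments.

    Hypothetical helper: splitting on whitespace is an assumption, not
    necessarily how WordCounter counts words.
    """
    wanted = ["target"] if use_target else ["seg-source", "source"]
    total = 0
    for unit in ET.fromstring(xliff).iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        children = {local_name(child.tag): child for child in unit}
        for name in wanted:
            if name in children:
                total += len(visible_text(children[name]).split())
                break  # prefer <seg-source> over <source> when both exist
    return total


# Made-up sample document for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="nl-BE" target-language="en-US" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1">
        <source>Dit is <g id="1">een test</g><x id="2"/> <bpt id="3">&lt;b&gt;</bpt>hier<ept id="3">&lt;/b&gt;</ept></source>
        <target>This is <g id="1">a short test</g> here</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

source_words = count_words(SAMPLE)
target_words = count_words(SAMPLE, use_target=True)
print(source_words, target_words)
```

The content of <bpt> and <ept> is dropped because it holds escaped formatting codes, while the text after each inline element (its tail) is kept because it is part of the sentence.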
Test results (real-world SDLXLIFF)
| Method | Words |
|---|---|
| v0.3.0 (Tika raw) | 26,157 |
| v0.4.0 source (nl-BE) | 6,652 |
| v0.4.0 target (en-US) | 7,276 |
| Reference Word doc | 7,203 |
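As a quick check of the figures in the table, the sketch below expresses each count as a ratio of the reference Word document count. The variable names are illustrative only; the numbers are taken directly from the table.

```python
# Word counts from the test results table.
counts = {
    "v0.3.0 (Tika raw)": 26157,
    "v0.4.0 source (nl-BE)": 6652,
    "v0.4.0 target (en-US)": 7276,
}
reference = 7203  # Reference Word doc

# Ratio of each count to the reference count.
ratios = {method: words / reference for method, words in counts.items()}

for method, ratio in ratios.items():
    print(f"{method}: {ratio:.2f}x the reference count")
```

The Tika-based count comes out at roughly 3.6 times the reference, while the v0.4.0 target count is within about 1% of it.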
Download
- Windows 64-bit: WordCounter-v0.4.0-win64.zip (extract anywhere, run WordCounter.exe)
- No dependencies required (JRE + Tika bundled)
v0.3.0 — Bundled Tika + JRE (50+ file formats, zero dependencies)
What's new in v0.3.0
Tika + JRE bundled in the EXE — WordCounter now ships with a built-in Java Runtime and Apache Tika server, giving you 50+ file format support out of the box. No need to install Java or any other dependencies.
Supported formats (with bundled Tika)
In addition to the 4 core formats (.docx, .pptx, .xlsx, .pdf), WordCounter now handles:
- Legacy Office: .doc, .xls, .ppt
- OpenDocument: .odt, .ods, .odp
- Rich text & markup: .rtf, .html, .xml, .txt, .csv, .tsv
- Translation formats: .xliff, .xlf, .tmx, .po, .sdlxliff
- E-books & subtitles: .epub, .srt
- And many more: .mhtml, .msg, .eml, .json, .yaml, .tex, .log, .ini, .properties, images (with OCR if Tesseract is installed), and others
Distribution
- Windows 64-bit: Download WordCounter-v0.3.0-win64.zip, extract anywhere, and run WordCounter.exe
- Total size: ~218 MB uncompressed (~152 MB zipped)
- No Java, Python, or other dependencies required
Other changes since v0.2.0
- Build switched from single-file to directory mode for faster startup (no multi-second extraction delay)
- Added .gitignore for a clean repository
- Bundled JRE created with jlink --strip-debug --compress=2 (79 MB stripped from full JDK 17)
Build from source
Requires Python 3.10+, PyInstaller, and a JDK 17+ for creating the JRE bundle:
pip install python-docx python-pptx openpyxl pdfminer.six tika pyinstaller
python -m PyInstaller WordCounter.spec --noconfirm
WordCounter v0.1.1
Fixed
- Resolved Windows EXE startup crash (name 'APP_NAME' is not defined).
Changed
- Added MIT license.
- Bumped project/app version to 0.1.1.
- Embedded Windows EXE version metadata (File/Product version: 0.1.1.0).
Assets
- WordCounter.exe (Windows)