Releases · michaelbeijer/WordCounter

06 Mar 15:12

michaelbeijer

v0.5.0

d7e0b1e

v0.5.0 — About dialog with version info, author details, and repo link Latest

What's new

About dialog — accessible via the new "About" button in the toolbar. Shows version info, clickable link to the author's website (michaelbeijer.co.uk), and a link to the GitHub repository.
Version in title bar — the window title now displays the version number (e.g. "WordCounter v0.5.0, by Michael Beijer").

Download

Download WordCounter-v0.5.0-win64.zip, extract anywhere, and run WordCounter.exe. No Python, Java, or other dependencies needed — everything is bundled.

Full changelog: https://github.com/michaelbeijer/WordCounter/blob/main/CHANGELOG.md

05 Mar 20:30

michaelbeijer

v0.4.0

22649d2

v0.4.0 — Accurate word counts for SDLXLIFF, XLIFF, TMX, and PO files

What's new in v0.4.0

Dedicated parsers for translation file formats — SDLXLIFF, XLIFF, memoQ XLIFF, TMX, and PO/POT files now use specialized XML parsers instead of Tika's raw text extraction.

The problem this solves

Previously, Tika extracted all content from SDLXLIFF files — including source text, target text, base64 hashes, timestamps, usernames, AI engine metadata, and segment IDs. This inflated word counts by 3–4x (e.g., 26,157 words instead of the actual ~7,200).

How it works now

SDLXLIFF/XLIFF: Parses the XML structure and extracts only <seg-source> / <source> segments (or <target> when toggled)
TMX: Extracts source-language <seg> elements, auto-detecting the source language from the header
PO/POT: Extracts msgid entries (source strings)
Inline XLIFF tags (<bpt>, <ept>, <ph>, <x/>) are stripped; text inside formatting tags (<g>) is preserved
Source/target toggle in the UI: "XLIFF/SDLXLIFF: count target segments" — defaults to source
Language auto-detection: The Note column shows e.g. "SDLXLIFF source [nl-BE]" or "SDLXLIFF target [en-US]"
Translation formats now work without Tika — uses Python's built-in XML parser
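The extraction rules above can be sketched with Python's built-in XML parser. This is only an illustration, not WordCounter's actual code: the names count_xliff_words, visible_text, and STRIP_TAGS are hypothetical, iterating over <trans-unit> elements and preferring <seg-source> over <source> are assumptions, and splitting on whitespace stands in for whatever word-counting rule the app really uses.

```python
import xml.etree.ElementTree as ET

# Inline tags whose content is markup, not translatable text (per the list above).
STRIP_TAGS = {"bpt", "ept", "ph", "x"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def visible_text(elem):
    """Collect the text of elem, skipping stripped inline tags but keeping
    the content of other children such as <g> or <mrk>."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) not in STRIP_TAGS:
            parts.append(visible_text(child))
        parts.append(child.tail or "")  # text after the child always counts
    return "".join(parts)


def count_xliff_words(xml_text, use_target=False):
    """Count whitespace-separated words in source (or target) segments."""
    root = ET.fromstring(xml_text)
    total = 0
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        children = {local_name(c.tag): c for c in unit}
        if use_target:
            seg = children.get("target")
        else:
            # Assumption: prefer <seg-source> when present, else <source>.
            seg = children.get("seg-source")
            if seg is None:
                seg = children.get("source")
        if seg is not None:
            total += len(visible_text(seg).split())
    return total


# Made-up sample document, for illustration only.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file source-language="nl-BE" target-language="en-US" datatype="plaintext" original="demo">
    <body>
      <trans-unit id="1">
        <source>Dit is <g id="1">een test</g><x id="2"/> hier<bpt id="3">&lt;b&gt;</bpt>!</source>
        <target>This is a <g id="1">small test</g> here</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

print(count_xliff_words(SAMPLE))                   # 5
print(count_xliff_words(SAMPLE, use_target=True))  # 6
```

In the sample, the content of <bpt> and the empty <x/> placeholder are dropped, while the text inside <g> still counts toward the total.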

Test results (real-world SDLXLIFF)

Method                  Words
v0.3.0 (Tika raw)       26,157
v0.4.0 source (nl-BE)    6,652
v0.4.0 target (en-US)    7,276
Reference Word doc       7,203
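As a quick check of the figures in this table, the arithmetic below computes how far the raw Tika count overshoots the reference, and how close the v0.4.0 target count comes to it. The dictionary keys are illustrative labels; the numbers are taken from the table.

```python
# Word counts copied from the test results table.
counts = {"tika_raw": 26157, "source": 6652, "target": 7276, "reference": 7203}

# How many times larger the raw Tika count is than the reference document.
inflation = counts["tika_raw"] / counts["reference"]

# Relative difference between the v0.4.0 target count and the reference.
target_gap = (counts["target"] - counts["reference"]) / counts["reference"]

print(f"{inflation:.2f}x")   # 3.63x
print(f"{target_gap:+.1%}")  # +1.0%
```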

Download

Windows 64-bit: WordCounter-v0.4.0-win64.zip — extract anywhere, run WordCounter.exe
No dependencies required (JRE + Tika bundled)

05 Mar 17:38

michaelbeijer

v0.3.0

3c01aa0

v0.3.0 — Bundled Tika + JRE (50+ file formats, zero dependencies)

What's new in v0.3.0

Tika + JRE bundled in the EXE — WordCounter now ships with a built-in Java Runtime and Apache Tika server, giving you 50+ file format support out of the box. No need to install Java or any other dependencies.

Supported formats (with bundled Tika)

In addition to the 4 core formats (.docx, .pptx, .xlsx, .pdf), WordCounter now handles:

Legacy Office: .doc, .xls, .ppt
OpenDocument: .odt, .ods, .odp
Rich text & markup: .rtf, .html, .xml, .txt, .csv, .tsv
Translation formats: .xliff, .xlf, .tmx, .po, .sdlxliff
E-books & subtitles: .epub, .srt
And many more: .mhtml, .msg, .eml, .json, .yaml, .tex, .log, .ini, .properties, images (with OCR if Tesseract is installed), and others
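For readers curious how Tika-based extraction can feed a word count, here is a minimal sketch using the tika Python package that appears in the build dependencies. It is not WordCounter's actual code: count_words and count_file_words are hypothetical names, and splitting on whitespace is an assumed counting rule. Calling count_file_words requires the tika package plus a Java runtime, or a running Tika server.

```python
def count_words(text):
    """Count whitespace-separated tokens; None or empty text counts as zero."""
    return len((text or "").split())


def count_file_words(path):
    """Extract plain text from a file via Apache Tika and count its words."""
    # Imported lazily so count_words stays usable without Tika installed.
    from tika import parser

    parsed = parser.from_file(path)  # dict with "metadata" and "content" keys
    return count_words(parsed.get("content"))  # content may be None


print(count_words("Hello  brave\nnew world"))  # 4
```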

Distribution

Windows 64-bit: Download WordCounter-v0.3.0-win64.zip, extract anywhere, and run WordCounter.exe
Total size: ~218 MB uncompressed (~152 MB zipped)
No Java, Python, or other dependencies required

Other changes since v0.2.0

Build switched from single-file to directory mode for faster startup (no multi-second extraction delay)
Added .gitignore for clean repository
Bundled JRE created with jlink --strip-debug --compress=2 (79 MB stripped from full JDK 17)

Build from source

Requires Python 3.10+, PyInstaller, and a JDK 17+ for creating the JRE bundle:

pip install python-docx python-pptx openpyxl pdfminer.six tika pyinstaller
python -m PyInstaller WordCounter.spec --noconfirm

05 Mar 13:43

michaelbeijer

v0.1.1

63c9d85

WordCounter v0.1.1

Fixed

Resolved Windows EXE startup crash (name 'APP_NAME' is not defined).

Changed

Added MIT license.
Bumped project/app version to 0.1.1.
Embedded Windows EXE version metadata (File/Product version: 0.1.1.0).

Assets

WordCounter.exe (Windows)
