
Releases: michaelbeijer/WordCounter

v0.5.0 — About dialog with version info, author details, and repo link

06 Mar 15:12


What's new

  • About dialog — accessible via the new "About" button in the toolbar. Shows version info, a clickable link to the author's website (michaelbeijer.co.uk), and a link to the GitHub repository.
  • Version in title bar — the window title now displays the version number (e.g. "WordCounter v0.5.0, by Michael Beijer").
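A minimal sketch of how a title string like the one above can be assembled from a single version constant, so the title bar and the About dialog always show the same number. The names `APP_NAME`, `__version__`, `AUTHOR` and `window_title` are hypothetical; these notes do not show WordCounter's actual code.

```python
# Hypothetical module-level constants; define them before any code that uses them.
APP_NAME = "WordCounter"
__version__ = "0.5.0"
AUTHOR = "Michael Beijer"


def window_title(app_name: str = APP_NAME,
                 version: str = __version__,
                 author: str = AUTHOR) -> str:
    """Build the title-bar text from the shared constants."""
    return f"{app_name} v{version}, by {author}"


if __name__ == "__main__":
    print(window_title())  # WordCounter v0.5.0, by Michael Beijer
```

Keeping the version in one place means a release only has to bump a single string.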

Download

Download WordCounter-v0.5.0-win64.zip, extract anywhere, and run WordCounter.exe. No Python, Java, or other dependencies needed — everything is bundled.

Full changelog: https://github.com/michaelbeijer/WordCounter/blob/main/CHANGELOG.md

v0.4.0 — Accurate word counts for SDLXLIFF, XLIFF, TMX, and PO files

05 Mar 20:30


What's new in v0.4.0

Dedicated parsers for translation file formats — SDLXLIFF, XLIFF, memoQ XLIFF, TMX, and PO/POT files now use specialized parsers instead of Tika's raw text extraction.

The problem this solves

Previously, Tika extracted all content from SDLXLIFF files — including source text, target text, base64 hashes, timestamps, usernames, AI engine metadata, and segment IDs. This inflated word counts by 3–4x (e.g., 26,157 words instead of the actual ~7,200).

How it works now

  • SDLXLIFF/XLIFF: Parses the XML structure and extracts only <seg-source> / <source> segments (or <target> when toggled)
  • TMX: Extracts source-language <seg> elements, auto-detecting the source language from the header
  • PO/POT: Extracts msgid entries (source strings)
  • Inline XLIFF tags (<bpt>, <ept>, <ph>, <x/>) are stripped; text inside formatting tags (<g>) is preserved
  • Source/target toggle in the UI: "XLIFF/SDLXLIFF: count target segments" — defaults to source
  • Language auto-detection: The Note column shows e.g. "SDLXLIFF source [nl-BE]" or "SDLXLIFF target [en-US]"
  • Translation formats now work without Tika: the XML-based formats (the XLIFF variants and TMX) are read with Python's built-in XML parser
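The XLIFF rules listed above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not WordCounter's implementation: the function names are hypothetical, each `trans-unit` contributes its `<seg-source>` if present (SDLXLIFF keeps both a full `<source>` and a segmented `<seg-source>`, so counting both would double the total), and words are counted by splitting on whitespace, which may differ from WordCounter's real tokenization.

```python
import xml.etree.ElementTree as ET

# Inline tags whose *content* is markup or native code, not translatable text.
# <x/> is empty, so only the text after it (its tail) matters.
SKIP_CONTENT = {"bpt", "ept", "ph"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def inline_text(elem: ET.Element) -> str:
    """Collect text, keeping <g>/<mrk> content but dropping <bpt>/<ept>/<ph> content."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) not in SKIP_CONTENT:
            parts.append(inline_text(child))  # e.g. <g>, <mrk>: keep wrapped text
        parts.append(child.tail or "")        # text following any child is real text
    return "".join(parts)


def extract_segments(xml_text: str, use_target: bool = False) -> list[str]:
    """Return one normalized string per trans-unit (source side or target side)."""
    root = ET.fromstring(xml_text)
    segments = []
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        children = {local_name(c.tag): c for c in unit}
        if use_target:
            node = children.get("target")
        else:
            # Prefer the segmented source when the file has one (SDLXLIFF).
            node = children.get("seg-source", children.get("source"))
        if node is not None:
            segments.append(" ".join(inline_text(node).split()))
    return segments


def count_words(segments: list[str]) -> int:
    """Naive whitespace word count over the extracted segments."""
    return sum(len(s.split()) for s in segments)


SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="nl-BE" target-language="en-US" datatype="plaintext" original="demo">
    <body>
      <trans-unit id="1">
        <source>Klik op <g id="1">Opslaan</g> om <ph id="2">{0}</ph> te bewaren.</source>
        <target>Click <g id="1">Save</g> to keep <ph id="2">{0}</ph> safe.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    print(extract_segments(SAMPLE))                               # source side
    print(count_words(extract_segments(SAMPLE, use_target=True)))  # target side
```

The `<ph>` placeholder `{0}` is dropped while the word inside `<g>` is kept, which is what stops tag payloads and metadata from inflating the count. For files on disk, `ET.parse(path).getroot()` avoids the error `fromstring` raises on a `str` that carries an XML encoding declaration.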

Test results (real-world SDLXLIFF)

| Method                 | Words  |
|------------------------|--------|
| v0.3.0 (Tika raw)      | 26,157 |
| v0.4.0 source (nl-BE)  | 6,652  |
| v0.4.0 target (en-US)  | 7,276  |
| Reference Word doc     | 7,203  |
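As a quick check on these figures, dividing each count by the Word reference (7,203) gives the inflation factor directly: the raw Tika count comes out at roughly 3.6 times the reference, the v0.4.0 target count lands within about 1% of it, and the Dutch source count is about 8% lower.

```python
# Word counts copied from the test results above.
counts = {
    "v0.3.0 (Tika raw)": 26_157,
    "v0.4.0 source (nl-BE)": 6_652,
    "v0.4.0 target (en-US)": 7_276,
}
REFERENCE = 7_203  # Reference Word doc


def ratio(words: int, reference: int = REFERENCE) -> float:
    """Count relative to the reference, rounded to two decimals."""
    return round(words / reference, 2)


if __name__ == "__main__":
    for method, words in counts.items():
        print(f"{method}: {ratio(words):.2f}x reference")
```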

Download

  • Windows 64-bit: WordCounter-v0.4.0-win64.zip — extract anywhere, run WordCounter.exe
  • No dependencies required (JRE + Tika bundled)

v0.3.0 — Bundled Tika + JRE (50+ file formats, zero dependencies)

05 Mar 17:38


What's new in v0.3.0

Tika + JRE bundled in the EXE — WordCounter now ships with a built-in Java Runtime and Apache Tika server, giving you 50+ file format support out of the box. No need to install Java or any other dependencies.

Supported formats (with bundled Tika)

In addition to the 4 core formats (.docx, .pptx, .xlsx, .pdf), WordCounter now handles:

  • Legacy Office: .doc, .xls, .ppt
  • OpenDocument: .odt, .ods, .odp
  • Rich text & markup: .rtf, .html, .xml, .txt, .csv, .tsv
  • Translation formats: .xliff, .xlf, .tmx, .po, .sdlxliff
  • E-books & subtitles: .epub, .srt
  • And many more: .mhtml, .msg, .eml, .json, .yaml, .tex, .log, .ini, .properties, images (with OCR if Tesseract is installed), and others
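One plausible way to split work between the four core formats and the bundled Tika server is to route each file by its extension. The sketch below assumes the core formats go to the Python libraries named in the build instructions (python-docx, python-pptx, openpyxl, pdfminer.six) and everything else falls through to Tika; the routing table and `pick_backend` are hypothetical, since these notes do not describe WordCounter's dispatch logic.

```python
from pathlib import Path

# Hypothetical routing table: extension -> extractor assumed to handle it.
CORE_BACKENDS = {
    ".docx": "python-docx",
    ".pptx": "python-pptx",
    ".xlsx": "openpyxl",
    ".pdf": "pdfminer.six",
}


def pick_backend(path: str) -> str:
    """Return the assumed extractor for a file; anything non-core goes to Tika."""
    suffix = Path(path).suffix.lower()  # case-insensitive: 'Report.DOCX' -> '.docx'
    return CORE_BACKENDS.get(suffix, "tika")


if __name__ == "__main__":
    for name in ["Report.DOCX", "slides.pptx", "legacy.doc", "notes.odt"]:
        print(name, "->", pick_backend(name))
```

A fallback default like this is what lets a single bundled server cover the long tail of formats without a dedicated code path for each one.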

Distribution

  • Windows 64-bit: Download WordCounter-v0.3.0-win64.zip, extract anywhere, and run WordCounter.exe
  • Total size: ~218 MB uncompressed (~152 MB zipped)
  • No Java, Python, or other dependencies required

Other changes since v0.2.0

  • Build switched from single-file to directory mode for faster startup (no multi-second extraction delay)
  • Added a .gitignore to keep the repository clean
  • Bundled JRE created with jlink --strip-debug --compress=2 (79 MB stripped from full JDK 17)

Build from source

Requires Python 3.10+, PyInstaller, and a JDK 17+ for creating the JRE bundle:

pip install python-docx python-pptx openpyxl pdfminer.six tika pyinstaller
python -m PyInstaller WordCounter.spec --noconfirm

WordCounter v0.1.1

05 Mar 13:43



Fixed

  • Resolved Windows EXE startup crash (name 'APP_NAME' is not defined).

Changed

  • Added MIT license.
  • Bumped project/app version to 0.1.1.
  • Embedded Windows EXE version metadata (File/Product version: 0.1.1.0).
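Windows stores an executable's file and product version as four 16-bit numbers, which is why version 0.1.1 shows up as 0.1.1.0. A small helper like the hypothetical `windows_version_tuple` below pads a dotted version string to that shape; in a PyInstaller version-info file, a tuple of this form is what goes into the `filevers` and `prodvers` fields. This is an assumption about how the metadata could be generated, not WordCounter's confirmed build step.

```python
def windows_version_tuple(version: str) -> tuple[int, int, int, int]:
    """Pad a dotted version such as '0.1.1' to the four fields Windows expects."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 numeric fields, got {version!r}")
    if any(not 0 <= p <= 0xFFFF for p in parts):
        raise ValueError("each field must fit in 16 bits")
    return tuple(parts + [0] * (4 - len(parts)))


if __name__ == "__main__":
    filevers = windows_version_tuple("0.1.1")
    print(filevers)                      # (0, 1, 1, 0)
    print(".".join(map(str, filevers)))  # 0.1.1.0
```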

Assets

  • WordCounter.exe (Windows)