From 3d3dd967177e5fdf751f88119f683cf74ce71d89 Mon Sep 17 00:00:00 2001
From: Matthew J Mucklo
Date: Tue, 14 Apr 2026 02:26:53 -0700
Subject: [PATCH] Roadmap: add "Beyond v3.x" section with four strategic directions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures the design space past the Path B package split so future decisions sit against an explicit menu rather than getting invented ad hoc:

- §9 CLDR plural categories — lifts the English-binary assumption, rides on ext-intl / Unicode CLDR. One new method; locales delegate category resolution to MessageFormatter.
- §10 Morphology expansion — verb conjugation, indefinite articles, ordinals, case/gender. Scope creep; would change the product's identity.
- §11 Locale data quality — test corpora (Wiktionary/UniMorph) with CI accuracy metrics; optional ML fallback via ONNX/FFI.
- §12 Ecosystem — Symfony/Laravel bridges, composer-plugin locale discovery, benchmark-as-identity against Doctrine/Symfony.

Headline recommendation: §9 if we pick one — scoped, ext-intl-based, doesn't change the library's identity but makes the current product genuinely multilingual.

Explicitly framed as "not commitments — captured so the decision space is explicit when we get there."

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ROADMAP.md | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/ROADMAP.md b/ROADMAP.md
index b93e3be..4951505 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -229,3 +229,62 @@ Deferred (separate PRs):
 - **v2.1** — items 5 + 5a landed together (the extension API lives on `Locale`; splitting them would be churn). Adds the instance API, `Locale` abstract class, `En` as first-party locale, and proxy extension methods on `Inflect`. Additive, no breaking changes.
 - **v2.2** — at least one non-English locale (candidate: `Es`, `Fr`, or `De`) as proof the `Locale` contract holds for non-trivial morphology. Possibly `infection` + `phpbench` tooling if not done earlier.
 - **v3.x (conditional)** — Path B: extract `Inflect\Locale\*` into `mmucklo/inflections` as a sibling package. Triggers are listed in §5a.
+
+## Beyond v3.x — strategic directions
+
+Four themes for where the library could go once v3.x stabilizes. Not commitments — captured here so the decision space is explicit when we get there. Ranked roughly by how much each would change the library's identity.
+
+### 9. Cross the binary-plural ceiling (CLDR plural categories)
+
+The current API assumes **singular / plural is a binary**. In many of the world's languages it isn't:
+
+- Russian uses three countable forms (`1 книга`, `2 книги`, `5 книг`).
+- Welsh and Arabic each use six. Polish, Romanian, and Lithuanian each have their own category rules.
+- Unicode's [Common Locale Data Repository](https://cldr.unicode.org/) defines six categories (`zero`, `one`, `two`, `few`, `many`, `other`). PHP's `ext-intl` extension (part of the PHP source tree, though not enabled in every build) wraps ICU, which ships the CLDR plural rules.
+
+**Concrete API direction:**
+
+```php
+// Today
+Inflect::pluralizeIf(5, 'book'); // '5 books' — English-only assumption
+
+// Future
+$inflect->pluralForm(5, [
+ 'one' => 'book',
+ 'few' => 'books',
+ 'many' => 'books',
+ 'other' => 'books',
+]); // returns 'books'; locale-aware category lookup
+```
+
+Locales delegate category resolution to `ext-intl` (which uses ICU's bundled CLDR data) rather than to hand-maintained regex tables. This is the single move that turns the library from "English inflector with locale hooks" into a genuinely multilingual tool, while converging with a maintained external standard.
+
+**Scope:** one new method, one category enum, and locale implementations that resolve the category by passing an ICU `plural` pattern to `MessageFormatter::formatMessage` (`NumberFormatter` exposes no plural-category lookup).
+
+### 10. Expand from nouns to morphology
+
+Today the library handles noun singular ↔ plural (plus `pluralizeIf` cosmetic prefixing). A full morphological toolkit would add:
+
+- **Verb conjugation** — `conjugate('run', tense: 'past') === 'ran'`. Irregular-verb tables run to roughly 200 words per language.
+- **Indefinite articles** — `indefiniteArticle('apple') === 'an'`. Locale-specific, and driven by pronunciation rather than spelling even in English (`an hour`, `a unicorn`); French and German choose the article by grammatical gender (`un`/`une`, `ein`/`eine`).
+- **Ordinals** — `ordinalize(3) === '3rd'`, the same surface Rails' ActiveSupport offers. Bounded, per-locale.
+- **Case / gender agreement** — required for real German / Slavic support. The API would need `$gender` and `$case` parameters. That is a big cognitive-load bump, and may not be worth it if the target audience is Rails refugees rather than NLP users.
+
+Where this theme stops determines whether the library stays a "small useful utility" or becomes a "morphological toolkit." Both are legitimate products; they attract different users.
+
+### 11. Locale data quality
+
+Regex-rule inflectors fail on words their rules don't anticipate: loanwords, coinages, compounds. There are two ways to push the accuracy ceiling:
+
+- **Test corpora per locale.** Ship `(lemma, form, features)` triples from a known-good source ([Wiktionary](https://en.wiktionary.org/) dumps, [UniMorph](https://unimorph.github.io/)) and run the inflector against them in CI with an accuracy metric. Rule additions become measurable: a PR could state "this regex lifts English noun accuracy from 92.3% → 94.1% on UniMorph v1.2" (illustrative figures). That turns inflection from folklore into engineering.
+- **Offline ML fallback.** When regex rules don't match, fall back to a small byte-level seq2seq model run through ONNX Runtime via PHP's FFI extension. Heavy dependency story; probably a separate opt-in `mmucklo/inflect-neural` package. It should raise the accuracy ceiling, at the cost of shipping a binary model artifact.
+
+### 12. Ecosystem moves (zero new features, large adoption impact)
+
+- **Symfony / Laravel bridges** — first-party integration packages (`mmucklo/inflect-bundle`, `mmucklo/inflect-laravel`) that register the inflector in each framework's service container with one `composer require`. Biggest adoption lift per hour of work: both frameworks already ship an inflector (Symfony's String component; Laravel via `doctrine/inflector`), so without a bridge, switching to Inflect means wiring it in by hand.
+- **Composer-plugin locale discovery** — third-party locale packages (`someone/inflect-pl`, `acme/inflect-fr-quebec`) auto-register on install through a [composer-plugin](https://getcomposer.org/doc/articles/plugins.md). Adding a locale becomes a one-liner for consumers.
+- **Benchmark-as-identity** — this library's pitch is "*memoizing* inflector." Publish concrete numbers (via `phpbench`, roadmap §7) against Doctrine Inflector and Symfony String in the README, and commit to never regressing them. That makes the performance claim verifiable instead of rhetorical.
+
+### Headline recommendation
+
+If we pick only one of these four: **§9 (CLDR plural categories).** It is scoped, rides on a maintained external standard (Unicode CLDR), and doesn't change the library's identity, but it lifts the library's ceiling from "English-ish" to "genuinely multilingual." The other themes turn Inflect into a different product; §9 makes the current product complete.
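The §9 recommendation depends on turning a number into a CLDR category without hand-maintained tables. Below is a minimal sketch of that lookup. It assumes PHP 8.0+ with `ext-intl` loaded. `pluralCategory()` and `pluralForm()` are hypothetical helpers for illustration, not Inflect's API. The trick is an ICU `plural` pattern in which every branch prints its own keyword, so `MessageFormatter::formatMessage` returns the category ICU selected for the locale.

```php
<?php

/**
 * Hypothetical helper (not Inflect's API): returns the CLDR plural category
 * ("zero", "one", "two", "few", "many" or "other") that ICU selects for $n.
 */
function pluralCategory(string $locale, int $n): string
{
    // Each branch echoes its own keyword, so the formatted result is the category.
    $pattern = '{0, plural, zero{zero} one{one} two{two} few{few} many{many} other{other}}';

    $category = MessageFormatter::formatMessage($locale, $pattern, [$n]);
    if ($category === false) {
        throw new RuntimeException('ICU formatting failed: ' . intl_get_error_message());
    }

    return $category;
}

/**
 * Hypothetical pluralForm(): picks the caller's form for the locale's category,
 * falling back to 'other' (the one category every CLDR locale defines).
 *
 * @param array<string, string> $forms forms keyed by CLDR category
 */
function pluralForm(string $locale, int $n, array $forms): string
{
    $category = pluralCategory($locale, $n);

    return $forms[$category]
        ?? $forms['other']
        ?? throw new InvalidArgumentException("No form for '{$category}' and no 'other' fallback");
}

// Russian "book" (CLDR categories one / few / many).
$ruBook = ['one' => 'книга', 'few' => 'книги', 'many' => 'книг', 'other' => 'книги'];

foreach ([1, 2, 5, 11, 21, 22] as $n) {
    echo $n, ' ', pluralForm('ru', $n, $ruBook), PHP_EOL;
}
// Prints one per line: 1 книга, 2 книги, 5 книг, 11 книг, 21 книга, 22 книги
```

ICU evaluates the plural rules, so no plural table is maintained in PHP. Keywords a locale never selects (here `zero` and `two` for Russian) are simply never matched.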