Proj/language selection#749
Conversation
… language-clause module Scaffolding for the language-preservation feature (issue #616). Adds `BrvConfig.language` and a shared clause module; no surface is wired yet. Commits 02–05 do the actual injection (tool-mode prompts, tokenizer, validation, CLI). - `BrvConfig.language?: { mode: 'auto' | 'fixed'; code? }` round-trips through toJson/fromJson and every with* method. `mode: 'fixed'` without `code` is rejected at load so silent fallback to English is structurally impossible. - `language-clause.ts` exports `buildLanguageClause(language?)` and an inline `LANGUAGE_NAMES` ISO-639-1 → English map (~24 entries, no iso-639-1 dependency). Auto / fixed-known / fixed-unknown all return a clause that carves out tag names, attribute names, enum values, and `path` from translation so Zod validation can't fire on a localized schema key. Tests: 13 new BrvConfig tests + 14 new language-clause tests. All existing tests pass; clause module has zero src/ consumers (commit 02).
…tion [ENG-2687] Language selection 1/5 — foundation (BrvConfig.language + language-clause module)
…rection prompts + MCP tool description
Wires the language-preservation clause from ENG-2687 into the three
tool-mode prompt surfaces a calling agent's LLM sees during curate. After
this commit a tool-mode user under `language: { mode: 'fixed', code: 'ru' }`
gets `<bv-*>` body text authored in Russian; schema (tag names, attribute
names, enum values, `path`) stays English because the clause carves it out.
Closes #616 for Cyrillic / Vietnamese / European Latin users. CJK users
need commit 03 (tokenizer fix) before queries find matches.
Injection points:
- `buildGeneratePrompt(options)` and `buildCorrectionPrompt(options)` —
add `language?: BrvConfigLanguage` to options; emit a `# Language`
section. In the kickoff: between Path format and Element vocabulary
(part of the byterover-controlled framing that commits BEFORE the
user-intent block). In correction: between Output contract and
Errors-to-fix (reasserts the contract on every retry).
- `brv-curate-tool.ts:TOOL_DESCRIPTION` — append the auto clause
unconditionally. The MCP tool description is built once at server-boot
and cannot read live config; per-call fixed-mode is honored via the
oclif kickoff prompt (which IS dynamic).
Threading:
- `kickoffSession` / `continueSession` in curate-session.ts now accept
`language?` and pass it to the builders.
- The oclif `brv curate` command loads `BrvConfig.language` via
`ProjectConfigStore.read()` (with a try/catch that degrades a corrupt
config to `undefined` → auto, never blocking curate) and threads it
into both kickoff and continuation.
- `language` is re-read on every continuation so a mid-session
`brv config set language.code <iso>` is honored on the next retry.
Tests (10 new):
- 4 cases for buildGeneratePrompt (section ordering, auto-default,
auto-mode, fixed-RU-emits-Russian-and-not-auto).
- 3 cases for buildCorrectionPrompt (section ordering, auto-default,
fixed-RU).
- 2 integration cases on kickoffSession (auto-default thread, fixed-RU
thread) confirming end-to-end orchestrator → builder.
- 1 case on TOOL_DESCRIPTION self-containment asserting the auto clause
+ schema-key carve-out are present.
175/175 tests green across the affected surfaces. Typecheck + lint clean;
the pre-existing `max-params` warning on `registerBrvCurateTool` is
unchanged (signature was already 5 params).
…tion [ENG-2688] Language selection 2/5 — tool-mode injection (kickoff + correction + MCP tool description)
…e / Korean queries
Fixes a confirmed CJK-search bug in `search-knowledge-service.ts`'s
BM25 index. MiniSearch 7.2.0's default tokenizer splits on `\p{Z}\p{P}`
(whitespace + punctuation). CJK scripts have no whitespace, so
`'认证系统使用JWT令牌'` tokenizes as a single token and a query for
`'认证'` returns zero matches against indexed content.
Empirical confirmation pre-fix:
const ms = new MiniSearch({fields: ['t'], idField: 'id'})
ms.addAll([{id: 1, t: '认证系统使用JWT令牌'}])
ms.search('认证') // → [] — broken
ms.search('Привет мир') // → matches as expected
Without this fix, language preservation from ENG-2687 / ENG-2688 is
invisible to CJK users: their curate output is correctly authored in
Chinese / Japanese / Korean but their queries return zero matches.
Approach:
- New `cjk-tokenizer.ts` module. Algorithm:
1. Split on Unicode whitespace + punctuation (same as MiniSearch default).
2. For each token, split at CJK ↔ non-CJK script boundaries.
3. Non-CJK segments emit as-is (Latin / Cyrillic / Vietnamese / European
behave byte-identical to the default).
4. CJK segments emit overlapping bigrams; single-char fallback to unigram.
- CJK ranges: U+4E00–9FFF (Unified Ideographs), U+3040–309F (Hiragana),
U+30A0–30FF (Katakana), U+AC00–D7AF (Hangul Syllables).
- Bigrams are the standard CJK IR compromise — unigrams are too noisy
(common chars like `的` dominate scoring), trigrams too sparse.
- Wired via the top-level `tokenize` option on `MINISEARCH_OPTIONS`. Per
the MiniSearch source (line 1564-1566), that single option applies at
both index and query time when `searchOptions.tokenize` is unset.
- No new runtime deps — `Intl.Segmenter` is available in Node 16+ but its
CJK quality varies by ICU version; inline bigrams are deterministic
across Node versions / platforms.
INDEX_SCHEMA_VERSION bumped 6 → 7 so cached indexes built with the
default tokenizer invalidate and rebuild on first daemon start.
Tests (18 new):
- 4 cases for non-CJK scripts (English, Russian, Vietnamese, punctuation)
asserting byte-identical behavior to the MiniSearch default.
- 5 cases for CJK scripts (Chinese 4-char bigrams, Chinese 2-char single
bigram, Japanese kanji+kana, Korean Hangul, single-char unigram fallback).
- 3 cases for mixed Latin+CJK tokens (whitespace-separated, no-whitespace
split, multiple alternating runs).
- 6 MiniSearch integration cases — the actual production wiring contract:
Chinese / Japanese / Korean queries return matches; Russian regression;
English regression; English query does NOT match CJK content
(cross-script isolation).
251/251 across affected surfaces (18 new tokenizer + 58 existing
search-knowledge + 175 prior PRs from ENG-2687/2688). Typecheck + lint
clean; 9 pre-existing warnings on unrelated functions unchanged.
[ENG-2689] Language selection 3/5 — CJK-aware MiniSearch tokenizer
… integration Validates the language-selection feature (ENG-2687 → ENG-2689) end-to-end across the four target non-English scripts (Russian / Vietnamese / Chinese / Japanese) plus the auto-mode default. Adds `test/integration/scenarios/language-roundtrip.test.ts` (8 cases) — walks the full pipeline a real user hits: `.brv/config.json` on disk → `ProjectConfigStore.read()` → `BrvConfig.language` → `kickoffSession()` → the kickoff prompt envelope the calling agent's LLM consumes. Proves no layer in the threading regresses for any target language. Coverage: - Auto mode (default, no language field) → auto clause present - Auto mode (explicit `mode: auto`) → auto clause present - Fixed-RU → "in Russian" in the prompt; auto clause absent - Fixed-VI → "in Vietnamese" - Fixed-ZH → "in Chinese" - Fixed-JA → "in Japanese" - Schema rejection at load: fixed mode without code throws via fromJson - Forward-compat: unmapped code (`xx`) → "in \"xx\"" fallback Out of scope and documented in the validation report (research repo, features/language-selection/validation/04-validation.md): LLM-honoring of the clause requires a real consumer (Claude Code, Cursor) and is verified manually pre-release. The auto-test harness in local-auto-test also exercises curate + query roundtrip across all four scripts via real `brv` CLI invocations — 13/13 cases green; covered in the report. Test counts after this commit: 326 total green across the feature (54 BrvConfig + 175 prompt/tool + 18 tokenizer + 58 search-knowledge regression + 8 new language-roundtrip + 13 auto-test harness).
[ENG-2690] Language selection 4/5 — validation (fixtures + cross-language E2E + English regression)
Final commit of the language-selection initiative (#616). Ships the user-facing CLI for language preference and the release notes that close the issue end-to-end. New `brv config set <key> <value>` and `brv config get <key>` commands — generic project-config infrastructure (not a one-off `brv language` command). Today's keys: `language.mode` and `language.code`; future project-config keys plug into the `SETTERS` / `GETTERS` dispatch with no new oclif surface area. CLI behavior: - `brv config set language.mode auto | fixed` — reject `fixed` when no `language.code` is set, with a redirect message pointing at `brv config set language.code <iso>`. Prevents writing a config that `isBrvConfigJson` would refuse on next load. - `brv config set language.code <iso>` — validates against the `LANGUAGE_NAMES` map (24 entries, no `iso-639-1` dependency). Unknown codes rejected with a sorted supported-list error. - `brv config get language.<key>` — symmetric reader; returns "(not set)" for absent fields, or the value (in text or JSON mode). Other: - New `BrvConfig.withLanguage(language?)` method following the existing `with*` pattern. Used by the set command's setters; previously a future caller would have to spread BrvConfig fields by hand. - Pure dispatcher functions (`applyConfigSet`, `applyConfigGet`) separated from the oclif Command class so the validation contract is testable in isolation. Release notes under [Unreleased] in CHANGELOG.md crediting Dmitriy K and including the **restoration recipe** for users who prefer the prior implicit-English behavior: brv config set language.code en brv config set language.mode fixed Tests (22 new): - 12 cases on `applyConfigSet` covering: auto / fixed / mode transitions, code update preserves mode, English restoration recipe, rejection paths (fixed-without-code, unknown mode value, unknown ISO code, unknown config key, totally unrelated key). - 5 cases on `applyConfigGet`: unset → undefined, both modes, both keys, unknown-key rejection. - 5 cases on `withLanguage`: replace, set-when-unset, clear via undefined, no mutation of original, all-other-fields-preserved. Ship gate: - Typecheck + lint clean (pre-existing complexity warning unchanged). - 242 mocha tests green across the affected surfaces (BrvConfig + config CLI + prompt builders + clause module + curate-session + brv-curate-tool + CJK tokenizer + language roundtrip integration + validate-brv-config init hook). - Auto-test harness 13/13 green after rebuild — including the 4 cross-language curate→query roundtrips and the 7 original English cases (zero structural drift). - Manual CLI smoke test: full restoration recipe roundtrip + unknown- code rejection. Post-merge action: post on #616 with feature summary, link to release notes, and pointer to backlog.md so Dmitriy can comment on what to prioritize next (per-curate --lang flag, TUI panel, per-domain overrides). Don't auto-close — let Dmitriy close after confirming the feature works for him.
[ENG-2691] Language selection 5/5 — brv config set/get + release notes + ship gate
… in SETTINGS_REGISTRY Adds EnumSettingDescriptor to the settings type system and registers the two language settings (mode, code) under category 'language'. The wire layer (CLI parser, transport DTO, validator, store, handler) now accepts and round-trips string-valued enum settings end-to-end; TUI and WebUI renderers will pick them up in follow-up commits. - SettingDescriptor union grows with EnumSettingDescriptor (options field) - SettingItem.current / default widen to boolean | number | string - SettingsItemDTO exposes 'options' on enum-typed items - brv settings set <enum-key> rejects invalid values with the allowed list - SettingsValidator.validate routes enum values through validateEnum - SETTINGS_KEYS gains LANGUAGE_MODE + LANGUAGE_CODE - registry test: enum narrowing, options shape, language category - handler test: LIST exposes options, SET rejects non-string to enum keys
Surfaces language.mode / language.code via the existing settings UIs. - shared/language/language-names.ts: extracted single source of truth for the ISO-639-1 → English-name map (was inline in language-clause.ts); reused by settings registry, WebUI display labels, and language-clause - shared SettingsRow / format-settings: adds enum row builder with options + display formatting "[ auto ]" - TUI settings-page: Left/Right cycles options in edit mode; LANGUAGE group appears below UPDATES per CATEGORY_ORDER - WebUI enum-settings-row.tsx: select dropdown driven by descriptor options; language.code rows render "ja — Japanese" via LANGUAGE_NAMES - WebUI LanguagePanel wires into Configuration > General between TaskHistoryPanel and UpdatesPanel - tests: enum row builder + parseRowInput + category ordering
…onfig set language.*
Switches the curate kickoff/continuation language read site from project
config to the daemon settings store, and points users at the new
brv settings set surface for any future change. Project config remains
a fallback for users who still have a per-project override from
ENG-2691 so no one loses their setting on upgrade.
- curate/index.ts: resolveLanguagePreference now calls FileSettingsStore
first; falls back to ProjectConfigStore for backward compat
- config/set.ts: brv config set language.{mode,code} now fails with a
clear "moved to brv settings set" message; pure dispatcher unchanged
so the validation path still covers other project-config keys
…ange + integer parsed.value Pre-push tsc rejected: - enum-settings-row: Select.onValueChange signature is (string | null) under the component lib's types; the null branch is unreachable for a non- cleared single-select but the type system needs an explicit guard. - settings-row IntegerSettingsRow: parsed.value widened to (number | string) after enum support landed; narrow to number before passing to toastValue/setMutation so the integer branch keeps compiling under strict.
The oclif settings command kept a duplicate local CATEGORY_ORDER + HEADERS that wasn't extended when 'language' was added to the canonical shared registry, so language rows silently dropped from text output even though the JSON form exposed them. Promote CATEGORY_HEADERS plus a CATEGORY_ORDER-derived toRowCategory helper into the shared types module so the oclif and shared-utils formatters consume a single canonical source. Add a regression test that iterates every category in CATEGORY_ORDER, gating the broader "new category added but normalizer not updated" bug class.
`oclif/curate/index.ts` was importing FileSettingsStore + SETTINGS_KEYS directly from server/ to resolve the language preference, violating CLAUDE.md's "oclif and tui never import from server" boundary. Route through SettingsEvents.LIST via withDaemonRetry — same pattern used by `brv settings get/list`. Add an intent comment on resolveLanguagePreference explaining the daemon-vs-project-config precedence semantics for the ENG-2691 → ENG-2974 migration window so a future reader doesn't try to "fix" the deliberate fallback. Distinguishing "user explicitly set mode=auto" from "user never touched settings" requires raw-overrides access the transport doesn't expose today, and the case can't manifest in production until project-config language ships.
The round-3 transport refactor inlined `'language.mode'` and `'language.code'` as local string-literal consts in `oclif/commands/curate/index.ts` to avoid the oclif → server boundary slip. That workaround reproduced the brittleness the original `SETTINGS_KEYS` constant was designed to prevent — a rename in the registry would no longer be a typecheck error at the curate call site (and the same trap existed in two WebUI files). Move `SETTINGS_KEYS` to `src/shared/types/settings-keys.ts` so every consumer (server registry / validator / bootstrap, agent, oclif curate, WebUI labels + enum-settings-row) imports one canonical map. Rename safety is now machine checked. WebUI consumers that previously inlined the wire keys also switch to the constant. Also export `readLanguageFromSettings` with an optional `DaemonClientOptions` parameter and add three unit tests covering: daemon returns fixed → returns the language; daemon returns auto → undefined; daemon throws → undefined. The function was the surface that just changed and silently swallows daemon errors, so direct coverage is warranted.
…ings PR Five inline-comment items on PR #724's auto-review against feat/ENG-2974: 1. Tight retry budget on readLanguageFromSettings — kickoff was inheriting withDaemonRetry's MAX_RETRIES=10 × 1s default, making every `brv curate` block ~9s when the daemon was unreachable before falling back to project config. Now passes {maxRetries:1, retryDelayMs:0, ...options} so a missing daemon trips the fallback immediately. Test asserts call count stays at most 2 (was 10 with the old default; test would have run for ~9s, now completes in 6ms). 2. Surface enum options on `brv settings get` — text mode now prints an `allowed: a, b, c` line for enum keys; JSON mode includes the `options` field. Mirrors how integer keys surface `min`/`max`/`range`. Scripted consumers can discover the legal value set without grepping source. 3. Clean stale `brv config set` surface — help text, args description, and examples no longer advertise the now-deprecated `language.mode` / `language.code` (the command-level interceptor catches those upstream with a deprecation-redirect). Examples point users at `brv settings set` instead. SETTERS dispatcher emptied with a comment that future project-config keys wire in by adding one entry; unreachable setLanguageMode / setLanguageCode functions removed; their dispatcher tests collapsed to skeleton coverage. 4. Collapse language-clause re-export — replaced the `export {LANGUAGE_NAMES} from ...` + `import {... as LANGUAGE_NAMES_LOCAL}` double-binding with a single `import {LANGUAGE_NAMES}` directly from the canonical shared module. `oclif/commands/config/set.ts` and the language-clause test repoint to the same canonical path. 5. file-settings-store — source `restartRequired` from the registry descriptor instead of hardcoding `true`. The hardcoded value was dead in the transport-routed path (the handler already overrides via the descriptor) but misrepresented boolean/enum keys whose descriptors carry `restartRequired: false` (update.checkForUpdates, language.mode, language.code) for any in-process caller reading the store directly. Also refresh the `SettingsFile.values` doc that still said "set writes only validated numeric values" — now mentions booleans and enums. Verification: 8871 npm test passing; ESLint + typecheck + build clean; 3 sub-agents PASS on settings-get / curate-pipeline / config-deprecation surfaces. Codex round-5 review APPROVED.
Feat/eng 2974
The onboarding tour now asks which written language to save the user's memory in, surfacing the ENG-2974 language settings during the guided intro instead of leaving users to discover them later. - Reuse the existing global settings (`language.code` / `language.mode`) via `brv settings set` — no new infrastructure. - Ask it as Message 1's first question, on its own turn (one question per turn), before the persona interview. Set the language before the first `brv curate` so the saved artifact renders in the chosen language. - Default path stays zero-setup: English / "match me" / blank leaves `auto` mode and runs nothing (auto already matches the user's input). - Set `language.code` first, then `language.mode fixed` — a rejected or mis-mapped code then leaves mode at its safe `auto` default rather than forcing English. Unsupported language or `invalid_value` degrades to auto; never dumps the supported list or surfaces raw error JSON. - Amend onboarding guardrails: carve the narrow config exception, extend the no-form/menu rule to the language line, and forbid asking the language and persona questions in the same message. - Add a content test asserting onboarding.md documents the language step, the code-before-mode ordering, and the trust-opener-first sequence. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat: [ENG-3002] add language selection to onboarding tour
Conflict resolution:
Two files had 3-stage index entries. The full conflicted file list and
the resolution decision per file:
1. src/agent/infra/tools/implementations/search-knowledge-service.ts
— UNION resolution.
* From main (ENG-3021): `parsed.imageContent` added to the
indexedContent array so BM25 sees <img alt + src> tokens.
* From HEAD (ENG-2689): `tokenizeWithCjk` import + wired into
MINISEARCH_OPTIONS.tokenize for CJK bigram support.
* Both sides bumped INDEX_SCHEMA_VERSION to 7 — coincidence, both
changes needed an index invalidation. No behavioural overlap;
both contributions preserved verbatim.
2. src/oclif/commands/curate/index.ts — manual resolution, two regions.
Region 1 (imports) — disjoint UNION:
* HEAD added BrvConfigLanguage, ProjectConfigStore, SettingsEvents,
SETTINGS_KEYS for the language-preference resolution path; took
a narrow {continueSession, kickoffSession, resolveProjectRoot}
from curate-session.
* main added CurateSessionEnvelope and the full curate-session
import set including deleteCurateResponseFile,
InvalidResponseFileError, InvalidResponseFormatError,
loadCurateResponseFile, parseCurateResponse, peekCurateSession,
unknownSessionEnvelope.
* Resolution: merged both import-type lines + combined named-
import list.
Region 2 (handleContinuation body) — STRUCTURAL choice + graft.
The two sides described different shapes of the same function:
* HEAD: old shape — inline withDaemonRetry(continueSession(...))
with `language` passed in. No file IO, no envelope validation,
no peek-session.
* main: new shape — mutex checks, response resolution from
inline OR file via loadResponseFile, local pre-dispatch
validation (empty/format/session-peek), dispatchContinuation
helper, post-success --delete-response-file cleanup with
append-to-errors[] semantics on cleanup failure.
Decision: kept main's full structure (strictly subsumes HEAD's
old shape), then grafted language threading on top:
* resolveLanguagePreference(projectRoot) called once after the
projectRoot is resolved
* Passed language into dispatchContinuation(...)
* Widened dispatchContinuation args to accept
language?: BrvConfigLanguage
* Threaded language into the inner continueSession({...}) call
This preserves every behaviour from both sides: file IO +
validation + cleanup from main, language threading from HEAD.
handleKickoff already called kickoffSession({content, language,
projectRoot}) outside the conflict markers and merged cleanly — no
action needed there.
Lib-layer signatures (curate-session.ts) were already a superset on
main: both ContinueOptions and KickoffOptions accept
`language?: BrvConfigLanguage`, so no lib changes were needed.
|
Claude finished @danhdoan's task in 4m 49s —— View job Claude Code Review
SummaryScore: 4.5 / 5 — solid, well-tested, conservative merge. The merge-resolution is clean and additive; the language-clause threading composes on top of Strengths
Findings (all non-blocking — see inline comments)
Risk profile: low. Merge-resolution is byte-equivalent for Recommendation: ship after addressing finding #1's UX surfacing (the "I clicked auto but it's still Russian" failure mode is the most likely user-visible footgun during the migration window). Other findings can fold into follow-ups. |
| private async resolveLanguagePreference(projectRoot: string): Promise<BrvConfigLanguage | undefined> { | ||
| const fromSettings = await readLanguageFromSettings() | ||
| if (fromSettings !== undefined) return fromSettings | ||
|
|
||
| try { | ||
| const config = await new ProjectConfigStore().read(projectRoot) | ||
| return config?.language | ||
| } catch { | ||
| return undefined | ||
| } | ||
| } |
There was a problem hiding this comment.
issue (non-blocking): Precedence bug: an explicit daemon mode: 'auto' cannot override a stale per-project {mode: 'fixed', code: …} because readLanguageFromSettings returns undefined for any non-fixed daemon mode and resolveLanguagePreference then falls through to project config. This is documented in the docstring as intentional during the migration window, but the comment buries the lede — the function name suggests "no opinion" while the behavior is "no opinion only when daemon says fixed."
Worth at least logging the override edge-case for forensics (the docstring already says "revisit once project-config language is fully sunset"), and tracking the project-config sunset as a follow-up in the PR's "Out of scope" list (already noted). Confirm the WebUI Settings → Language → "auto" toggle messaging makes it clear that a project-config override may still apply mid-migration, or users will report "I clicked auto but it's still writing in Russian" until the sunset lands.
| if (descriptor.type === 'enum' && got !== 'string') { | ||
| return { | ||
| code: 'invalid_value_type', | ||
| expected: 'enum', | ||
| got, | ||
| key, | ||
| message: `expected string for '${key}', got ${got}`, | ||
| value, | ||
| } | ||
| } |
There was a problem hiding this comment.
issue (minor): Enum value membership is enforced at the store layer (validateEnum in settings-validator.ts), which surfaces as code: 'invalid_value' — not 'invalid_value_type'. That's inconsistent with integer/boolean where a runtime type mismatch yields 'invalid_value_type'. For an enum with options: ['auto', 'fixed'], calling set('language.mode', 'bogus') returns invalid_value (range-style) rather than something more semantically aligned with "not in the enum set."
Not a behavior bug — both surface in error.message — but consumers switching on code lose the ability to distinguish "string but not in options" from "actual range/coupling violations." If you ever want to enable richer UI states (e.g. "please pick from the dropdown" vs "value out of allowed range"), this is the place. Consider extending SettingsErrorDTO.code with 'invalid_enum_option' (or just route the descriptor-known-options case through the type-check helper here so it emits invalid_value_type with expected: 'enum').
…anguage surface ENG-2974 left `brv config set/get language.*` as a deprecation gate pointing at `brv settings set/get`. With the feature stack settled and no other user-settable project-config keys planned, the dispatcher itself is dead weight: `SETTERS` was empty for writes, `GETTERS` only leaked the legacy fallback (the surface asymmetry the auto-reviewer flagged on PR #749). Removed: - src/oclif/commands/config/{set,get}.ts + their tests + empty dir - BrvConfig.withLanguage() — added in ENG-2691 specifically to back applyConfigSet's language setter; no other caller after this change - describe('withLanguage', ...) test block in brv-config.test.ts (5 cases) Retained: - BrvConfig.language field + serialization. resolveLanguagePreference still reads it as the second-tier fallback for upgrade-compat with users who configured language before ENG-2974. The harness's per-project fixed-X kickoff cases still exercise this tier via setLanguageConfigDirect. - No per-project language surface added. Auto-mode (the default) already matches the user's input language each curate, so multilingual users get correct behavior without a knob. After this commit, `brv settings set/get language.mode|code` is the ONLY canonical write/read surface. `brv config` no longer exists. Verification: - lint: 0 errors (280 pre-existing warnings, none on changed files) - unit tests: 8994 passing / 0 failing - curate-tool-mode-e2e: 20 / 20 - language-selection-e2e: 20 / 20 (was 22 — deprecation-gate and config-get-unknown cases gone with the command; harness migrated separately in this branch's prior work)
|
Claude finished @danhdoan's task in 5m 21s —— View job Claude finished @danhdoan's task —— View job Claude Code Review
SummaryScore: 3.5 / 5 — solid feature work and a clean merge resolution, but one ship-blocking documentation defect and a state-trap created by the Strengths
Findings (inline comments on each)
Out-of-scope items the PR correctly defers
Risk profile: medium, driven mostly by finding #1 (user-facing docs lie about the surface). Code-level correctness is high; merge resolution is byte-equivalent for Recommendation: address finding #1 before merge — it's a 4-line CHANGELOG edit and prevents every reader from running broken commands. Finding #3 is the next-most-important since it's the "I clicked auto but it's still Russian" footgun for migrating users and the PR's own "Out of scope" doesn't fully cover it. The others fold cleanly into follow-ups. |
| - **ByteRover preserves your input language by default.** When you curate context in Russian, Chinese, Japanese, Vietnamese, or any other language, the calling agent's LLM is now instructed to author body text in the same language (the schema — tag names, attribute names, enum values, paths — stays English so tooling is unaffected). Configure with the new `brv config set` command: | ||
| - `brv config set language.mode auto` — match the user's input language (default). | ||
| - `brv config set language.mode fixed` + `brv config set language.code <iso>` — force a specific language. ISO 639-1 codes accepted: `ar`, `de`, `el`, `en`, `es`, `fi`, `fr`, `he`, `hi`, `id`, `it`, `ja`, `ko`, `nl`, `no`, `pl`, `pt`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`, `zh`. | ||
| - `brv config get language.mode` / `brv config get language.code` — read back the current setting. | ||
|
|
||
| CJK queries (Chinese, Japanese, Korean) are now searchable in BM25 — the tokenizer was previously whitespace-only and treated entire CJK sentences as one token. **Restoration recipe** for users who prefer the prior implicit-English behavior: `brv config set language.code en` then `brv config set language.mode fixed`. Reported by Dmitriy K — thanks for the thorough reproduction in [#616](https://github.com/campfirein/byterover-cli/issues/616). |
There was a problem hiding this comment.
issue (blocking — user-facing): The release notes describe brv config set/get language.* as the public surface, but commit d7f3bf6 ("remove brv config CLI — settings is the only language surface") deleted the entire brv config command in this same PR. A user who reads these notes and runs brv config set language.mode auto will hit Command not found — and so will the restoration recipe at the bottom (brv config set language.code en then brv config set language.mode fixed), which is the headline "don't lose your prior behavior" path for the people most likely to care.
The canonical surface is now brv settings set language.mode|code (and brv settings get … to read back). The supported-codes list and the restoration recipe also need updating.
Suggested edit:
| - **ByteRover preserves your input language by default.** When you curate context in Russian, Chinese, Japanese, Vietnamese, or any other language, the calling agent's LLM is now instructed to author body text in the same language (the schema — tag names, attribute names, enum values, paths — stays English so tooling is unaffected). Configure with the new `brv config set` command: | |
| - `brv config set language.mode auto` — match the user's input language (default). | |
| - `brv config set language.mode fixed` + `brv config set language.code <iso>` — force a specific language. ISO 639-1 codes accepted: `ar`, `de`, `el`, `en`, `es`, `fi`, `fr`, `he`, `hi`, `id`, `it`, `ja`, `ko`, `nl`, `no`, `pl`, `pt`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`, `zh`. | |
| - `brv config get language.mode` / `brv config get language.code` — read back the current setting. | |
| CJK queries (Chinese, Japanese, Korean) are now searchable in BM25 — the tokenizer was previously whitespace-only and treated entire CJK sentences as one token. **Restoration recipe** for users who prefer the prior implicit-English behavior: `brv config set language.code en` then `brv config set language.mode fixed`. Reported by Dmitriy K — thanks for the thorough reproduction in [#616](https://github.com/campfirein/byterover-cli/issues/616). | |
| - **ByteRover preserves your input language by default.** When you curate context in Russian, Chinese, Japanese, Vietnamese, or any other language, the calling agent's LLM is now instructed to author body text in the same language (the schema — tag names, attribute names, enum values, paths — stays English so tooling is unaffected). Configure with `brv settings`: | |
| - `brv settings set language.mode auto` — match the user's input language (default). | |
| - `brv settings set language.mode fixed` + `brv settings set language.code <iso>` — force a specific language. ISO 639-1 codes accepted: `ar`, `de`, `el`, `en`, `es`, `fi`, `fr`, `he`, `hi`, `id`, `it`, `ja`, `ko`, `nl`, `no`, `pl`, `pt`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`, `zh`. | |
| - `brv settings get language.mode` / `brv settings get language.code` — read back the current setting. | |
| CJK queries (Chinese, Japanese, Korean) are now searchable in BM25 — the tokenizer was previously whitespace-only and treated entire CJK sentences as one token. **Restoration recipe** for users who prefer the prior implicit-English behavior: `brv settings set language.code en` then `brv settings set language.mode fixed`. Reported by Dmitriy K — thanks for the thorough reproduction in [#616](https://github.com/campfirein/byterover-cli/issues/616). |
Also note that the PR body still references the deprecation-gate cases (Case 6) as part of the harness migration; per the d7f3bf6 commit message those were also removed (language-selection-e2e went 22 → 20). Worth a quick sweep on the PR description so future archaeology lines up with the final commit.
| <ConcurrencyPanel /> | ||
| <LlmPanel /> | ||
| <TaskHistoryPanel /> | ||
| <LanguagePanel /> |
There was a problem hiding this comment.
issue (consistency): This places LanguagePanel before UpdatesPanel, but the canonical CATEGORY_ORDER in src/shared/types/settings-row.ts:37-44 is ['concurrency', 'llm', 'task-history', 'updates', 'language', 'other']. The TUI (tui/features/settings/utils/format-settings.ts) and brv settings text output (oclif/commands/settings/index.ts:71) both sort by that constant, so this is the only surface where Language renders before Updates.
That's the exact split-source drift CATEGORY_ORDER's docstring warns against:
Adding a new category here … is the only edit needed for both consumers to render it —
oclif/commands/settings/index.tsandtui/features/settings/utils/format-settings.tsboth import from this module, so the surfaces never drift.
The WebUI is now a third surface and it doesn't subscribe. Cheapest fix is to swap these two lines:
| <LanguagePanel /> | |
| <UpdatesPanel /> | |
| <LanguagePanel /> |
Lower-priority follow-up: drive the panel list from CATEGORY_ORDER so future additions only need an entry in the shared constant.
| // Auto clause unconditional: the MCP tool description is built once at | ||
| // server-boot, so it cannot read live config. Per-call fixed-mode is | ||
| // honored via the oclif `brv curate` kickoff prompt (which IS dynamic). | ||
| // MCP-only consumers under `language: { mode: 'fixed' }` see the auto | ||
| // clause here; their input language still gets preserved because auto | ||
| // says "match the input language". | ||
| '# Language', | ||
| '', | ||
| buildLanguageClause(), |
There was a problem hiding this comment.
issue (UX gap, non-blocking): The MCP tool description is materialized once at server-boot, so buildLanguageClause() (no argument) always emits the auto clause regardless of the user's language.mode / language.code setting. The comment correctly documents this and reasons that auto says "match the input language" so multilingual users get correct behavior — and that's true.
But here's the user-visible gap: someone who explicitly chose mode: 'fixed', code: 'ru' because their input is English but they want output in Russian (e.g. a non-native English speaker writing prompts in English but wanting saved memory in their native language) will see the MCP-driven curate save in English. The oclif brv curate kickoff prompt does honor fixed mode (via resolveLanguagePreference), so the two surfaces diverge — and there's no breadcrumb in the MCP response that says "your fixed-language preference was ignored on this path."
Two ways to close the loop, in order of cost:
- Document the limitation in the MCP tool's user-facing surface (the response footer or
brv connectors install …README) so MCP users know to use the oclif kickoff when fixed-language matters. - Plumb
languageinto MCP either by rebuilding the description per-call (expensive — registerTool happens once) or by injecting the clause into the per-call response wrapper (cheaper — the daemon already knows the settings).
(1) is a docs change; (2) is a follow-up ticket. Leaning toward (1) for now plus an "Out of scope" line in this PR, since the auto-fallback degrades safely.
| disabled={setMutation.isPending} | ||
| onValueChange={(next) => { | ||
| if (next === null) return | ||
| choose(next).catch(noop) |
There was a problem hiding this comment.
nit: if (next === null) return is dead-code-shaped — Select.onValueChange from Radix (which @campfirein/byterover-packages/components/select wraps) emits string, never null. TypeScript should have flagged this if the callback signature was tight; it likely passed because the callback param is implicitly typed via useCallback inference.
Two cleaner alternatives:
- Drop the check if the signature is
(value: string) => void:
| choose(next).catch(noop) | |
| onValueChange={(next) => { | |
| choose(next).catch(noop) | |
| }} |
- Surface the edge case if the signature is genuinely
(value: string | null) => void(clear-selection support) — silently returning hides a state the user just produced. A short toast or no-op acknowledgment beats an invisible drop.
The cheapest correctness win is to confirm the upstream type and pick (1) or (2) explicitly. Today's behavior is fine — this is housekeeping.
| [0x30_40, 0x30_9F], | ||
| [0x30_A0, 0x30_FF], | ||
| [0xAC_00, 0xD7_AF], | ||
| ] |
There was a problem hiding this comment.
suggestion (non-blocking, zh-TW coverage): The ranges deliberately exclude CJK Extension A (0x3400-0x4DBF), which covers ~6,500 codepoints in active use across Traditional Chinese — proper nouns, place names, corporate names, academic and historical text in zh-TW and zh-HK corpora. The comment defends the exclusion as "rarely in user content," which holds on aggregate but optimistically for Taiwan/HK knowledge bases.
Extension A is BMP, so for...of handles it without surrogate-pair work, and the bigram-index cost is proportional to occurrences (effectively zero on non-zh-TW corpora). Adding it is one line:
| ] | |
| const CJK_RANGES: ReadonlyArray<readonly [number, number]> = [ | |
| [0x34_00, 0x4D_BF], | |
| [0x4E_00, 0x9F_FF], | |
| [0x30_40, 0x30_9F], | |
| [0x30_A0, 0x30_FF], | |
| [0xAC_00, 0xD7_AF], | |
| ] |
The maintainer note already points at INDEX_SCHEMA_VERSION = 7 for invalidation — that's the cost. If you'd rather stay conservative, leaving as-is is fine; log a follow-up the first time a zh-TW user reports search misses on rare characters.
| /** | ||
| * Resolve the language preference. Daemon settings (the source of | ||
| * truth) take precedence; a per-project `.brv/config.json language` | ||
| * field acts as a fallback for users who configured language before | ||
| * it moved to global settings. | ||
| * | ||
| * Note on precedence: only daemon `mode: 'fixed'` short-circuits the | ||
| * fallback. An explicit daemon `mode: 'auto'` reads as "no opinion" | ||
| * and falls through to project config, so a stale project-config | ||
| * `fixed/X` will still win. This is intentional for the migration | ||
| * window — distinguishing "user explicitly chose auto" from "user | ||
| * never touched settings" needs raw-overrides access that the | ||
| * transport doesn't expose today, and the bug only manifests for | ||
| * users with a pre-existing per-project `language` field. Revisit | ||
| * once project-config language is fully sunset. | ||
| */ | ||
| private async resolveLanguagePreference(projectRoot: string): Promise<BrvConfigLanguage | undefined> { | ||
| const fromSettings = await readLanguageFromSettings() | ||
| if (fromSettings !== undefined) return fromSettings | ||
|
|
||
| try { | ||
| const config = await new ProjectConfigStore().read(projectRoot) | ||
| return config?.language | ||
| } catch { | ||
| return undefined | ||
| } | ||
| } |
There was a problem hiding this comment.
issue (non-blocking — but worth tracking): With brv config now removed (per commit d7f3bf6), this fallback's job has shifted from "honor a current-but-deprecated surface" to "honor stale state nobody can write any more." A user mid-migration whose .brv/config.json still has {language: {mode: 'fixed', code: 'ru'}} and who then flips daemon language.mode to auto will not see auto take effect — readLanguageFromSettings returns undefined for any non-fixed daemon mode, and the project-config fixed wins via this fallthrough.
That's the same precedence quirk the prior auto-reviewer flagged (the docstring at L575-585 calls it out and defers it to the project-config sunset). The new wrinkle is that the user has no way to clear the stale state — there's no brv config set ... to write auto over it, and editing .brv/config.json by hand is undocumented. So the fallback now acts as a one-way migration trap rather than a graceful bridge.
Two ways forward:
- Honor explicit
auto— changereadLanguageFromSettingsto return{mode: 'auto'}when the daemon explicitly says auto (instead ofundefined), and treat that as "user has opinions, do not consult fallback." Closes the trap cleanly and matches what a user expects after toggling the WebUI Language → auto control. - Drop the project-config fallback entirely in a follow-up PR — the PR description's "Out of scope" already names this, but with
brv configgone the fallback no longer serves a live surface; it only matters for users upgrading from a pre-ENG-2974 build.
(1) is a small, surgical change behind a unit test; (2) needs a migration sweep but cleans up the legacy code path. (1) makes (2) safer to land later. Worth at least surfacing on the WebUI Language → auto control with a one-line "if you previously configured a per-project language, it may still apply" hint until (1) lands, to avoid the "I clicked auto but it's still writing in Russian" report.
Summary
Brings the language-selection feature stack to
mainand resolves the divergence between this branch andmain's recently-landed tool-mode response-file surface (ENG-3002) and<img>first-class inline content (ENG-3021).After this lands:
"Match the user's input language…"(auto, the default) or"…in <language>"(fixed). The calling agent's LLM is told which language to write the topic body in.language.mode/language.code) since ENG-2974. A per-project.brv/config.json languagefield is still honored as a fallback for users who configured before the migration.curatetool description carries the auto clause so the same instruction reaches non-CLI callers.The recently-merged main features (
--response-file/--delete-response-fileand<img>inline support) are preservedunchanged — the language thread composes on top, not in place of.
What's included
Feature work (history-only — these landed on the branch already):
BrvConfig.languageschema + language-clause builderlanguagethroughkickoffSessionandcontinueSession; auto clause appended to MCPTOOL_DESCRIPTIONINDEX_SCHEMA_VERSIONbumped to 7)brv queryhelper + cross-language curate→query roundtrip coverage in the auto-test harnessbrv config set/get language.*CLI (now intercepted and redirected tobrv settings setper ENG-2974)brv config set language.*kept as a deprecation gate that names the new commandMerge-resolution work (new in this PR):
src/agent/infra/tools/implementations/search-knowledge-service.ts— UNION of ENG-3021'sparsed.imageContentindexer wiring (frommain) and ENG-2689'stokenizeWithCjkinMINISEARCH_OPTIONS(from this branch).INDEX_SCHEMA_VERSIONhappened to be at 7 on both sides — coincidence, both changes needed an index invalidation.src/oclif/commands/curate/index.ts— Keptmain's richerhandleContinuationbody (mutex checks +loadResponseFile+ local pre-dispatch validation +dispatchContinuationhelper + post-success--delete-response-filecleanup) and grafted the language thread on top:resolveLanguagePreference(projectRoot)is called once afterprojectRootis resolvedlanguageis passed intodispatchContinuation(...)dispatchContinuation's args widened to acceptlanguage?: BrvConfigLanguageand forwarded intocontinueSession({...})Imports merged as a clean disjoint union —
BrvConfigLanguage,ProjectConfigStore,SettingsEvents,SETTINGS_KEYSfrom thisbranch +
CurateSessionEnvelopeand the full curate-sessionhelper set from
main.curate-session.ts) already acceptlanguage?on bothKickoffOptionsandContinueOptions, so nolib changes were required.
Harness migration (
local-auto-test/language-selection-e2e):brv config set language.*to
brv settings set language.*+settings getverification.Settings are GLOBAL since ENG-2974, so each migrated case wraps
its mutation in
try/finallywithresetLanguageSettings()tokeep cases isolated.
without-code" validation (no longer applicable —
language.modeand
language.codeare independent settings keys) into adeprecation-gate test that confirms
brv config set language.*rejects with
code: 'deprecated-key'and names the newbrv settings set ...command.restartDaemonso a prior aborted runcannot leak global state into the next run.
cases.
Conflict-resolution summary (for future archaeology)
search-knowledge-service.tscurate/index.ts(imports)curate/index.ts(handleContinuation)main's body, grafted language thread viadispatchContinuationwideningcurate-session.tsTest plan
npm run lint— 0 errors (280 pre-existing warnings on unrelated test files, none on merged files)npm test— 9006 / 9006 passinglocal-auto-test/curate-tool-mode-e2e— 20 / 20 (all--response-file/--delete-response-filecases + all language-clause cases pass)local-auto-test/language-selection-e2e— 22 / 22 after thebrv settings setmigrationbrv settings set/get language.mode|coderound-trips;brv config set language.*deprecation message names the newbrv settings setcommandReviewer call-outs
dispatchContinuationwidening (addedlanguage?: BrvConfigLanguageto its args type) is the only behavioural change tomain's response-file surface — all other main-side contracts are byte-equivalent. Worth a focused read onsrc/oclif/commands/curate/index.ts..brv/config.jsonproject-config fallback tier thatresolveLanguagePreferencestill honors.INDEX_SCHEMA_VERSIONwas bumped to 7 by both branches independently (CJK tokenizer on this branch,<img>indexer onmain). Existing cached indexes were already invalidated by whichever side landed first; no additional bump needed.Out of scope
languagefield (theresolveLanguagePreferencefallback). Tracked separately — needs raw-overrides exposure on the settings transport somode: 'auto'from daemon can short-circuit the fallback cleanly.brv config set/getentirely — the surface still hosts thelanguage.*deprecation gate and theapplyConfigSetdispatcher for the next project-config key that needs to land.