Skip to content

DAOS-19122 ddb: Fix flag name collision between global and subcommand parsers#18466

Open
knard38 wants to merge 4 commits into
ckochhof/fix/master/daos-18304-patch-002from
ckochhof/fix/master/daos-19122/patch-001
Open

DAOS-19122 ddb: Fix flag name collision between global and subcommand parsers#18466
knard38 wants to merge 4 commits into
ckochhof/fix/master/daos-18304-patch-002from
ckochhof/fix/master/daos-19122/patch-001

Conversation

@knard38

@knard38 knard38 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

Commit 86f31a2 (DAOS-18304) introduced a regression: ddb rm_pool --db_path ... fails with:

ERROR: ddb: Cannot use sys db path without a VOS path

This breaks the recovery/pool_list_consolidation.py test (test_lost_majority_ps_replicas) on MD-on-SSD clusters, as observed in PR #17965 CI build 22.

Root cause: go-flags processes the full command line before dispatching to grumble. Because --db_path is registered as a global option in cliOptions, go-flags intercepts it — even when the user supplies it as a subcommand-level argument (e.g. ddb rm_pool --db_path ...). The new validation added by 86f31a2bd3 then rejects the invocation because the global --vos_path was not also provided. The same collision affects open (-w, -p), feature (-p, -s), and prov_mem (-s).

Solution

This PR fixes the regression and improves the feature command with better input validation, all covered by new unit tests. The PR is split into four focused commits, each independently buildable and testable.

Commit 1 — Go style: idiomatic logger initialisation

Replace a two-statement var declaration + assignment with a single := short variable declaration, as suggested in code review of PR #18086.

Commit 2 — Fix: PassAfterNonOption

Add flags.PassAfterNonOption to the go-flags parser. This makes go-flags stop consuming flags after the first positional argument (the subcommand name). Everything after the subcommand lands in RunCmdArgs and is forwarded to grumble's own flag parser, which correctly handles command-level flags.

parser := flags.NewParser(&opts, flags.HelpFlag|flags.IgnoreUnknown|flags.PassAfterNonOption)

Also includes the two TestDdb_parseOpts cases whose expected behavior changed with PassAfterNonOption (see Side effects below), a doc fix (--vos-path--vos_path), and error message constants for dtxAggr.

Commit 3 — Regression tests

  • TestDdb_parseOpts: two new cases — --db_path after the subcommand is not consumed globally (core regression case); --db_path before the subcommand still triggers the vosPathMissErr validation (correct misuse detection preserved).
  • TestDdb_runDdb: complete noAutoOpen coverage for all 8 commands (close, prov_mem, dev_list, dev_replace were previously missing).
  • TestDdb_Cmds: remove skipCmdLine (now obsolete); add regression tests for open (-w, --db_path), rm_pool (--db_path), and prov_mem (-s).

Commit 4 — feature command improvements

  • Add Go-layer validation: exactly one of --enable/--disable/--show required; --db_path requires a VOS path argument.
  • Add featureOnlyOneOptErr constant and onlyOne() helper.
  • Add LongHelp documenting the validation rules.
  • Replace featureFnCheckingShow with the more general featureFnChecking factory; add string2FlagsCapturing to verify enable/disable routing.
  • Full test coverage: short/long forms of all flags, all validation error paths.

Side effects

PassAfterNonOption changes one existing user-visible behavior: flags that appear after the subcommand name are now passed to grumble instead of go-flags. The most visible effect is that ddb <cmd> --help now shows the subcommand-specific help rather than the general ddb help:

# Before: showed general ddb help (go-flags consumed --help globally)
# After:  shows ls subcommand help (grumble handles it correctly)
ddb ls --help

This is a correction of incorrect behavior — users naturally expect ddb ls --help to document the ls command. Flags placed before the subcommand (e.g. ddb --vos_path /foo ls) continue to work as before.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@knard38 knard38 self-assigned this Jun 9, 2026
@knard38 knard38 added the CR Catastrophic Recovery Feature label Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Ticket title is 'ddb: rm_pool fails with "Cannot use sys db path without a VOS path" regression'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-19122

@knard38 knard38 changed the title Ckochhof/fix/master/daos 19122/patch 001 DAOS-19122 ddb: Fix flag name collision between global and subcommand parsers Jun 9, 2026
kanard38 added 4 commits June 9, 2026 12:18
Replace the two-statement var declaration + assignment with a single
':=' short variable declaration, as suggested in code review.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
… parsers

go-flags processes the full argument list before dispatching to grumble.
When a subcommand defines a flag with the same name as a global option
(e.g. 'rm_pool --db_path', 'open -w', 'feature -s'), go-flags intercepts
it as the global option and the subcommand never receives it.

This was introduced as a regression by commit 86f31a2 (DAOS-18304),
which added a validation that rejected any invocation where --db_path was
set (globally, by go-flags) without --vos_path -- even when the user
correctly passed --db_path as a subcommand-level argument.

Fix: add flags.PassAfterNonOption to the go-flags parser so it stops
consuming flags after the first positional argument (the subcommand name).
Everything after the subcommand lands in RunCmdArgs and is forwarded to
grumble's own flag parser.

Also:
- Move vosPathMissErr to ddb_commands.go (sole use site after this fix)
  and add dtxAggrMutuallyExclusiveErr and dtxAggrRequiredOptErr constants
  to eliminate raw string literals in Go-layer error messages.
- Fix a typo in the CLI long description (--vos-path -> --vos_path) and
  accurately list commands that manage their own pool lifecycle.
- Update the two TestDdb_parseOpts cases whose behavior changed with
  PassAfterNonOption (--help after a subcommand now lands in RunCmdArgs
  instead of being processed by go-flags).

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
- TestDdb_parseOpts: add two cases verifying PassAfterNonOption behavior:
  - flags after the subcommand name are NOT consumed by go-flags (regression case)
  - flags before the subcommand name are still consumed globally (validation still fires)
- TestDdb_runDdb: complete noAutoOpen coverage for all 8 commands in the
  noAutoOpen list (close, prov_mem, dev_list, dev_replace were missing).
- TestDdb_Cmds: remove the now-obsolete skipCmdLine escape hatch and add
  regression coverage for the fixed flag conflicts:
  - open: long/short forms of -w/--write_mode and -p/--db_path now work
    correctly in command-line mode
  - rm_pool: --db_path after subcommand correctly reaches grumble
  - prov_mem: -s/--tmpfs_size flag no longer consumed as global VosPath
  Also update dtxAggr error assertions to use the new named constants.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
- Add Go-layer input validation to the feature command Run handler:
  - Enforce that exactly one of --enable, --disable, --show is provided
  - Reject --db_path when no VOS path argument is given
- Add featureOnlyOneOptErr constant and onlyOne() helper
- Add LongHelp to the feature command documenting the validation rules
- Replace featureFnCheckingShow with the more general featureFnChecking
  factory and add string2FlagsCapturing to verify enable/disable routing
- Add full test coverage for all feature flags (short/long enable, disable,
  show, db_path) and all validation error paths

Features: recovery
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@knard38 knard38 force-pushed the ckochhof/fix/master/daos-19122/patch-001 branch from 8a13591 to c101fde Compare June 9, 2026 12:19
@daosbuild3

Copy link
Copy Markdown
Collaborator

@daosbuild3

Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18466/2/execution/node/1369/log

@knard38 knard38 marked this pull request as ready for review June 10, 2026 07:35
@knard38 knard38 requested review from a team as code owners June 10, 2026 07:35
@knard38 knard38 requested a review from janekmi June 10, 2026 07:35
@knard38 knard38 assigned knard38 and Nasf-Fan and unassigned knard38 and Nasf-Fan Jun 10, 2026
const dtxAggrRequiredOptErr = "'--cmt_time' or '--cmt_date' option has to be defined"
const featureOnlyOneOptErr = "exactly one of --enable, --disable, --show must be provided"

func onlyOne(bools ...bool) bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this go in a generic utility file for reuse?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitated to add it, but I am not sure it is worth of it ?
@kjacque , @mjmac and @tanabarr any opinion on this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CR Catastrophic Recovery Feature

Development

Successfully merging this pull request may close these issues.

7 participants