
placement/tests: add diff_layout command; fix format-truncation warning#17623

Draft
Copilot wants to merge 16 commits into master from copilot/add-interactive-placement-debug-tool

Copilot AI commented Mar 1, 2026

pl_debug lacked a way to inspect which shards need rebuilding after a target failure/reintegration. Also fixes a -Werror=format-truncation build error in the grp= parsing path.

Changes

  • cmd_diff_layout(): new diff_layout interactive command wrapping pl_obj_find_rebuild(). It accepts id=, ver= (defaults to the latest pool map version), and a type=/grp= class override. For each rebuild shard it prints the shard ID, target ID, rank, idx, status (DOWN/DOWNOUT/UP/UPIN), and flags (DOWN2UP etc.)
  • gen_layout help text: corrects "default: current" → "default: latest" to match existing code behaviour (pool_map_get_version)
  • Format-truncation fix: snprintf(grp_str, …, "%ld", v) → "%u", (unsigned int)v; %u prints at most 10 digits, which safely fits the 16-byte buffer
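
For reviewers unfamiliar with the warning, the format-truncation change can be sketched in isolation as below. This is only an illustration, not the code in the PR: the helper name grp_to_str, its signature, and the GRP_STR_LEN macro are assumptions; only the 16-byte buffer size and the "%u" with (unsigned int) cast come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define GRP_STR_LEN 16 /* buffer size quoted in the PR description */

/*
 * Hypothetical helper (name and signature are illustrative): write the
 * group count as decimal text. Returns 0 on success, -1 if it did not fit.
 */
static int
grp_to_str(char *buf, size_t len, long v)
{
	int n;

	assert(buf != NULL && len > 0);
	/* A 32-bit unsigned int prints as at most 10 digits plus the NUL. */
	n = snprintf(buf, len, "%u", (unsigned int)v);
	/* snprintf returns the length it needed; n >= len means truncation. */
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Note that the cast silently wraps values outside the unsigned int range, so a caller that cannot guarantee the range may want to validate v before formatting.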

Example

pl_debug> obj_class OC_RP_3GX
pl_debug> set_down rank=0
pl_debug> diff_layout id=42
Rebuild shards for OID lo=42 class=RP_3GX ver=2: 3 shard(s)
  shard   0: target_id=   0  rank=   0  idx= 0  status=DOWN
  shard   3: target_id=   1  rank=   0  idx= 1  status=DOWN
  shard   6: target_id=   2  rank=   0  idx= 2  status=DOWN

pl_debug> set_up rank=0
pl_debug> diff_layout id=42
Rebuild shards for OID lo=42 class=RP_3GX ver=3: 3 shard(s)
  shard   0: target_id=   0  rank=   0  idx= 0  status=UP       flags=DOWN2UP
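
The column layout in the sample output above could be produced by something like the following sketch. The format string is inferred from the sample lines only, and struct shard_row and format_shard_row are hypothetical names, not DAOS structures or the PR's actual print code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record; field names are illustrative, not DAOS structures. */
struct shard_row {
	unsigned int shard;
	unsigned int target_id;
	unsigned int rank;
	unsigned int idx;
	const char  *status; /* e.g. "DOWN", "UP" */
	const char  *flags;  /* NULL or "" when no flag is set */
};

/* Format one row; returns the snprintf result (length needed, or < 0). */
static int
format_shard_row(char *buf, size_t len, const struct shard_row *r)
{
	assert(buf != NULL && r != NULL && r->status != NULL);
	if (r->flags != NULL && r->flags[0] != '\0')
		/* Pad status to 8 columns so the flags column lines up. */
		return snprintf(buf, len,
				"  shard %3u: target_id=%4u  rank=%4u  idx=%2u  status=%-8s flags=%s",
				r->shard, r->target_id, r->rank, r->idx, r->status, r->flags);
	/* No flags: print the status unpadded to avoid trailing blanks. */
	return snprintf(buf, len,
			"  shard %3u: target_id=%4u  rank=%4u  idx=%2u  status=%s",
			r->shard, r->target_id, r->rank, r->idx, r->status);
}
```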

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Original prompt

Create a new interactive placement debugging utility pl_debug under src/placement/tests in daos-stack/daos.

Background/goal:
We want a developer tool similar in spirit to jump_map_dist.c but interactive. The tool should construct an in-memory pool map and jump placement map from user-specified topology parameters and allow interactive commands to change pool component status and generate/print object layouts.

Repository/branch:

  • Repo: daos-stack/daos
  • Base branch: master

Binary/tool:

  • Name: pl_debug
  • Location: src/placement/tests/pl_debug.c (or .cc if needed)
  • Ensure it is built by the DAOS build system (CMake) in the src/placement/tests area.

Command-line options:

  • -n <number>: number of nodes
  • -r <number>: number of ranks per node
  • -t <number>: number of targets per rank

After parsing options, create:

  • an in-memory pool map with a hierarchy such that node is the fault domain
  • ranks distributed with r ranks per node across n nodes
  • t targets per rank
  • a jump placement map based on that pool map
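
As a rough illustration of the option handling described above, the three parameters could be parsed with POSIX getopt as in the sketch below. struct topo, parse_count, and parse_topology are hypothetical names, and requiring all three options is an assumption of this sketch; building the pool map and placement map is out of scope here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the three topology parameters. */
struct topo {
	unsigned int nodes;
	unsigned int ranks_per_node;
	unsigned int tgts_per_rank;
};

/* Parse a strictly positive decimal number; returns 0 on success. */
static int
parse_count(const char *s, unsigned int *out)
{
	char          *end;
	unsigned long  v;

	if (s[0] < '0' || s[0] > '9') /* reject empty strings and signs */
		return -1;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v == 0 || v > 0xffffffffUL)
		return -1;
	*out = (unsigned int)v;
	return 0;
}

/* Fill *t from -n/-r/-t; returns 0 on success, -1 on any invalid input. */
static int
parse_topology(int argc, char **argv, struct topo *t)
{
	int opt;

	assert(t != NULL);
	t->nodes = t->ranks_per_node = t->tgts_per_rank = 0;
	optind = 1; /* restart scanning in case getopt was used before */
	while ((opt = getopt(argc, argv, "n:r:t:")) != -1) {
		unsigned int *dst;

		switch (opt) {
		case 'n': dst = &t->nodes; break;
		case 'r': dst = &t->ranks_per_node; break;
		case 't': dst = &t->tgts_per_rank; break;
		default:  return -1; /* unknown option or missing argument */
		}
		if (parse_count(optarg, dst) != 0)
			return -1;
	}
	/* All three options are required in this sketch. */
	return (t->nodes && t->ranks_per_node && t->tgts_per_rank) ? 0 : -1;
}
```

With -n 4 -r 2 -t 8 this yields 8 ranks and 64 targets in total, which is the size the in-memory pool map would then need to describe.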

Interactive shell:
After setup, enter interactive mode (read-eval-print loop). Support commands:

  1. obj_class <str_name>

    • Convert str_name to an object class ID as defined in src/include/daos_obj_class.h
    • Set the selected class in a global/current variable used by gen_layout
    • Accept enum-style names (e.g. OC_EC_8P3GX) and also allow numeric input (optional but preferred)
    • Print the resolved class ID and name.
  2. gen_layout id=<number>

    • Use <number> as oid.lo
    • Use currently-selected obj class and current pool map/placement map to generate an object layout
    • Print the layout in a human-readable form including (at minimum):
      • shard index
      • target id
      • rank
      • target index
      • fseq
      • rebuilding flag if present
    • Include group/stripe group delineation if applicable.
  3. set_down rank=<number>|node=<number>

    • Set specified rank or node and all targets under it to status DOWN
    • Use ds_pool_map_tgts_update() to apply target state change to the pool map (as requested)
  4. set_downout rank=<number>|node=<number>

    • Set status to DOWNOUT (again via ds_pool_map_tgts_update())
  5. set_up rank=<number>|node=<number>

    • Set status to UP (via ds_pool_map_tgts_update())
  6. set_upin rank=<number>|node=<number>

    • Set status to UPIN (via ds_pool_map_tgts_update())
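
The rank=<number>|node=<number> argument shared by the four set_* commands above could be parsed along these lines. This is a minimal sketch: enum comp_kind and parse_comp_arg are hypothetical names, and the actual state change through ds_pool_map_tgts_update() is deliberately not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum comp_kind { COMP_RANK, COMP_NODE }; /* hypothetical selector */

/* Parse "rank=<number>" or "node=<number>"; returns 0 on success. */
static int
parse_comp_arg(const char *arg, enum comp_kind *kind, unsigned int *id)
{
	const char    *val;
	char          *end;
	unsigned long  v;

	assert(arg != NULL && kind != NULL && id != NULL);
	if (strncmp(arg, "rank=", 5) == 0)
		*kind = COMP_RANK;
	else if (strncmp(arg, "node=", 5) == 0)
		*kind = COMP_NODE;
	else
		return -1;

	val = arg + 5;
	if (val[0] < '0' || val[0] > '9') /* reject empty, signs, whitespace */
		return -1;
	errno = 0;
	v = strtoul(val, &end, 10);
	/* Trailing garbage ("3x") and out-of-range values are errors. */
	if (errno != 0 || *end != '\0' || v > 0xffffffffUL)
		return -1;
	*id = (unsigned int)v;
	return 0;
}
```

A caller would still need to check the parsed ID against the topology (for example, rank below nodes times ranks per node) before touching the pool map.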

General shell behavior:

  • Provide help and quit/exit commands.
  • Validate inputs with clear error messages.
  • After any pool map update, ensure placement map sees the updated pool map (recreate placement map or refresh as needed).
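
One common way to meet the shell requirements above is a small command table with a dispatch function, sketched here. The handlers are stubs and every name (cmd_table, dispatch_line, the return-code convention) is hypothetical; the real tool would register gen_layout, set_down and the other commands in the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmd_fn_t)(const char *args);

/* Stub handlers; a real tool would update the pool map or print a layout. */
static int cmd_help(const char *args) { (void)args; return 0; }
static int cmd_quit(const char *args) { (void)args; return 1; } /* 1 = leave the loop */

struct cmd_entry {
	const char *name;
	cmd_fn_t    fn;
};

static const struct cmd_entry cmd_table[] = {
	{"help", cmd_help},
	{"quit", cmd_quit},
	{"exit", cmd_quit},
};

/* Returns the handler's result, or -1 for an empty or unknown command. */
static int
dispatch_line(const char *line)
{
	size_t name_len, i;

	assert(line != NULL);
	line += strspn(line, " \t");         /* skip leading blanks */
	name_len = strcspn(line, " \t\r\n"); /* command word ends at whitespace */
	if (name_len == 0)
		return -1;
	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		const struct cmd_entry *c = &cmd_table[i];

		if (strlen(c->name) == name_len &&
		    strncmp(line, c->name, name_len) == 0) {
			const char *args = line + name_len;

			/* Hand the handler the text after the command word. */
			return c->fn(args + strspn(args, " \t"));
		}
	}
	return -1; /* caller prints an "unknown command" message */
}
```

The read loop then reduces to reading a line with fgets(), calling dispatch_line(), printing an error on -1, and stopping on 1.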

Implementation notes:

  • Follow existing patterns in src/placement/tests and/or jump_map_dist.c.
  • Keep the tool self-contained; it should not require a running DAOS system.
  • Use DAOS internal APIs to create pool map and jump placement map.
  • Choose/initialize reasonable defaults for OID hi fields, object metadata, etc.

Testing:

  • Add minimal smoke-test instructions in comments or documentation.
  • Ensure it compiles on master.

Deliverables:

  • New source file(s) under src/placement/tests
  • CMake updates to build pl_debug
  • Any small helper changes required for compilation (avoid large refactors)

This pull request was created from Copilot chat.



Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add interactive placement debugging utility pl_debug" to "Add interactive placement debugging utility pl_debug" on Mar 1, 2026
@daosbuild3
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17623/2/execution/node/305/log

Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "Add interactive placement debugging utility pl_debug" to "placement/tests: add interactive pl_debug tool with gen_layout mode= support" on Mar 1, 2026
@daosbuild3
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17623/3/execution/node/304/log

Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: add interactive pl_debug tool with gen_layout mode= support" to "placement/tests: add mode= and ver= options to pl_debug gen_layout command" on Mar 1, 2026
@daosbuild3
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17623/4/execution/node/305/log

…debug

Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: add mode= and ver= options to pl_debug gen_layout command" to "placement/tests: add ver= to gen_layout and clarify opc naming in pl_debug" on Mar 1, 2026
@daosbuild3
Collaborator

Test stage NLT on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17623/10/display/redirect

Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: fix rank domain status on set_down/set_downout; add print_obj_class; require obj_class before gen_layout" to "placement/tests: pl_debug gen_layout requires obj_class; add print_obj_class with hint filtering" on Mar 1, 2026
Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: pl_debug gen_layout requires obj_class; add print_obj_class with hint filtering" to "placement/tests: pl_debug print_obj_class hint filtering + gen_layout type=/grp= override" on Mar 1, 2026
Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: pl_debug print_obj_class hint filtering + gen_layout type=/grp= override" to "placement/tests: pl_debug gen_layout type=/grp= override + query shows co_flags" on Mar 2, 2026
Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: pl_debug gen_layout type=/grp= override + query shows co_flags" to "placement/tests: pl_debug DOWN2UP flag visibility + format-truncation fix" on Mar 2, 2026
@daosbuild3
Collaborator

Test stage Fault injection testing on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17623/15/execution/node/999/log

…target status

Co-authored-by: gnailzenh <7268050+gnailzenh@users.noreply.github.com>
Copilot AI changed the title from "placement/tests: pl_debug DOWN2UP flag visibility + format-truncation fix" to "placement/tests: add diff_layout command; fix format-truncation warning" on Mar 2, 2026
@daosbuild3
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17623/16/execution/node/304/log


3 participants