
DAOS-18239 test: DO NOT LAND backport discard retry instrumentation on old base#18459

Draft
kccain wants to merge 2 commits into master from kccain/daos_18239_repeat_old

kccain commented Jun 8, 2026

Choose the pool service (PS) leader engine's rank in the dmg pool
exclude command while testing
test_osa_online_reintegration_with_multiple_ranks, to see whether that
case has an impact on a pool_discard() hang on that engine.

Also instrument cont_discard_cb() to report any retries, e.g., a
discard that continuously gets -DER_INPROGRESS (resulting in an overall
hang), as seen in the original observation.

This test is based on the older master commit 0ff9ca7, where the
pool_discard() hang was originally observed.

Test-tag: OSAOnlineReintegration,test_osa_online_reintegration_with_multiple_ranks
Test-Repeat: 10
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true
Test-provider-hw-medium: ofi+tcp

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Choose the pool service (PS) leader engine's rank in the dmg pool
exclude command while testing
test_osa_online_reintegration_with_multiple_ranks, to see whether that
case has an impact on a pool_discard() hang on that engine.

This test is based on the older master commit 0ff9ca7, where the
pool_discard() hang was originally observed.

Test-tag: OSAOnlineReintegration,test_osa_online_reintegration_with_multiple_ranks
Test-Repeat: 5
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true
Test-provider-hw-medium: ofi+tcp

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
github-actions bot commented Jun 8, 2026

Ticket title is 'osa/online_reintegration.py:OSAOnlineReintegration.test_osa_online_reintegration_with_multiple_ranks - dmg: rank 5 failed on pool TestPool_1'
Status is 'In Progress'
Labels: '2.8.0tb1,ci_master_provider,daily_test'
https://daosio.atlassian.net/browse/DAOS-18239

kccain changed the title from "Kccain/daos 18239 repeat old" to "DAOS-18239 test: DO NOT LAND backport discard retry instrumentation on old base" Jun 8, 2026
kccain force-pushed the kccain/daos_18239_repeat_old branch from 5c39b3d to c8204c5 on June 9, 2026 13:16

Labels

None yet


1 participant