Rildixon/io retry compat system test by riley-dixon · Pull Request #234 · ROCm/hipFile

riley-dixon · 2026-03-23T21:53:50Z

GoogleTest (through gMock) allows a mock to call the real implementation of a method. We can use this in our system tests to choose very selectively what gets mocked. In this case, we want to mock the KFD/HSA call to induce a failure and exercise the fallback mechanism.
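The idea of a mock that defers to the real implementation can be sketched without any test framework. The following is only an illustration, not the test code in this PR: `RealDriver`, `PassthroughDriver`, `register_buffer`, and `submit_io` are hypothetical names. With gMock, the same delegation is normally written with `ON_CALL(...).WillByDefault(Invoke(...))`.

```cpp
#include <functional>

// Hypothetical stand-in for a real driver-facing dependency.
struct RealDriver {
    virtual ~RealDriver() = default;
    virtual int register_buffer(int size) { return size > 0 ? 0 : -1; }
    virtual int submit_io(int size) { return size; }
};

// Test double: every call reaches the real implementation unless the
// test installs an override for that one method.
struct PassthroughDriver : RealDriver {
    std::function<int(int)> submit_io_override;

    int submit_io(int size) override {
        if (submit_io_override) {
            return submit_io_override(size);  // mocked behaviour
        }
        return RealDriver::submit_io(size);   // passthrough to the real code
    }
};
```

A test would set `submit_io_override` to a lambda returning an error to force the failure path, while `register_buffer()` keeps running the real code.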

Relies on #215

AIHIPFILE-155

One challenge here was ensuring that the right stats counter gets incremented when an IO is retried, and preventing an exception raised by the stats module from causing the IO to be retried. Parts of RetryableBackend, such as the update_stats functions, may be worth moving up to Backend.

Adds tests for the behaviour of the retry mechanism, as well as any default behaviour from the RetryableBackend.

This change passes the original parameters of the IO request to `is_retryable()` for processing. By default, we now check that the fallback IO engine will accept the IO request before submitting it. This is useful in cases where the fallback backend is available, but the request is still invalid for some reason. It also removes a hack that tested this scenario by "optionally" mocking `is_retryable()`.
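A minimal sketch of that check, under stated assumptions: `IoRequest`, its fields, `accepts()`, and the specific validity rules are hypothetical and only illustrate passing the original request through to `is_retryable()`.

```cpp
#include <cstddef>

// Hypothetical description of an IO request; the fields are illustrative.
struct IoRequest {
    std::size_t size;
    long long   file_offset;
};

struct Backend {
    virtual ~Backend() = default;
    // Returns true if this backend can service the request.
    virtual bool accepts(const IoRequest& req) const = 0;
};

struct FallbackBackend : Backend {
    // Assumed rule for this sketch: reject empty IOs and negative offsets.
    bool accepts(const IoRequest& req) const override {
        return req.size > 0 && req.file_offset >= 0;
    }
};

struct BackendWithFallback {
    const Backend* fallback = nullptr;  // may be unavailable

    // The IO is only resubmitted if a fallback exists AND it would
    // accept this particular request.
    bool is_retryable(const IoRequest& req) const {
        return fallback != nullptr && fallback->accepts(req);
    }
};
```

Because the decision depends on the request itself, a test can cover the "fallback available but request invalid" case with real inputs instead of mocking `is_retryable()`.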

Note: retryable_io() is renamed to _io_impl(). The formatter also moved the order of some functions around. Previously, a "RetryableBackend" would use io() to wrap the fallback mechanism before calling retryable_io() to actually perform the request. Now, every Backend uses _io_impl(), which is responsible for handling the IO request, leaving io() as the public front-end.

"Retryable" was considered to be a misnomer in this context. Instead of retrying the IO with the same backend, we were resubmitting the IO to a different Backend. While "BackendWithFallback" is not exactly a great name, it's something we can change later.

The update_stats_X functions were originally written for BackendWithFallback. However, they can be generalized to all Backends. _io_impl() is then only concerned with issuing the IO, letting the io() method do any additional processing.
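The split between io() and _io_impl() might look roughly like the sketch below. It is an assumption-laden outline, not the hipFile code: the single `size` parameter, `update_stats()`, `bytes_transferred()`, and `EchoBackend` are hypothetical simplifications.

```cpp
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX)

class Backend {
public:
    virtual ~Backend() = default;

    // Public front-end: issue the IO, then do the shared post-processing.
    ssize_t io(std::size_t size) {
        ssize_t nbytes = _io_impl(size);  // backend-specific work
        update_stats(nbytes);             // runs only after the IO returned
        return nbytes;
    }

    std::size_t bytes_transferred() const { return bytes_transferred_; }

protected:
    // Only concerned with issuing the IO.
    virtual ssize_t _io_impl(std::size_t size) = 0;

    // Shared by every backend instead of living in one subclass.
    void update_stats(ssize_t nbytes) {
        bytes_transferred_ += static_cast<std::size_t>(nbytes);
    }

private:
    std::size_t bytes_transferred_ = 0;
};

// Hypothetical backend that "transfers" exactly what was requested.
class EchoBackend : public Backend {
protected:
    ssize_t _io_impl(std::size_t size) override {
        return static_cast<ssize_t>(size);
    }
};
```

Keeping the stats update in io(), after _io_impl() has returned, is one way to keep a stats failure separate from an IO failure.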

A negative integer represents a system error. Internally, all of the backends wrap this error code in an exception, which is propagated up to the hipFile API layer. This removes the comment stating that io() may return a negative error code. Technically this is not enforced, since the return type remains ssize_t, but keeping ssize_t avoids type-casting elsewhere.
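As an illustration of wrapping a negative result in an exception, here is a small sketch. It assumes the negative value is a negated errno and uses `std::system_error`; the exception type hipFile actually throws is not stated here, and `throw_if_error()` is a hypothetical helper.

```cpp
#include <cerrno>
#include <system_error>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical helper: converts a negative errno-style result into an
// exception so callers of io() never see a negative return value.
inline ssize_t throw_if_error(ssize_t result) {
    if (result < 0) {
        throw std::system_error(static_cast<int>(-result),
                                std::generic_category(), "IO failed");
    }
    return result;  // non-negative byte count passes through unchanged
}
```

The return type stays `ssize_t`, so callers that already work with `ssize_t` need no casts even though a negative value is no longer expected.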

Some commonly used methods and global variables were defined in multiple locations. This was fine as long as unit tests did not include test-common.h and system tests did not include hipfile-test.h. This change consolidates these definitions into test-common.h. Tested by compiling and seeing which unit tests failed to compile due to missing definitions. System tests were unaffected because they do not include hipfile-test.h at all.

Since MOCK_PASSTHROUGH is variadic in terms of the function it calls, this macro can likely be used in other test modules.
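To show why a passthrough helper that is generic over the called function is reusable, here is a hypothetical sketch. `SKETCH_PASSTHROUGH` is not the MOCK_PASSTHROUGH macro from this PR, whose definition is not shown here; `real_add` and `real_twice` are made-up functions.

```cpp
#include <utility>

// Hypothetical macro: expands to a callable that forwards any argument
// list to `real_fn`, so one definition covers functions of any arity.
#define SKETCH_PASSTHROUGH(real_fn)                                \
    [](auto&&... args) -> decltype(auto) {                         \
        return real_fn(std::forward<decltype(args)>(args)...);     \
    }

// Made-up "real" functions with different signatures.
inline int real_add(int a, int b) { return a + b; }
inline int real_twice(int a) { return 2 * a; }
```

Nothing in the macro body names a specific signature, which is what would let other test modules reuse it for their own functions.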

riley-dixon added 16 commits March 18, 2026 15:46

Test: Add new RetryableBackend unit tests

c77c747

Adds tests for the behaviour of the retry mechanism, as well as any default behaviour from the RetryableBackend.

Backend: Move update_stats_X to Backend

b3a3330

This was originally written for BackendWithFallback. However, this could be further generalized to all Backends. _io_impl() is then only concerned with issuing the IO, letting the io() method do any additional processing.

Fastpath: Create integration test for fallback mechanism

62db64a

Fastpath: Add integration test for tracking what exception gets thrown.

c8a987d

Address copilot suggestions

3ddd9b0

Update Changelog

89a9a9d

MHip: Add passthrough capability

19410c8

add new system test

8d806e5

wip: create passthrough mechanism

48272ba

MHip: Let MOCK_PASSTHROUGH be used by other modules

fc8ff91

Since MOCK_PASSTHROUGH is variadic in terms of the function it calls, this macro can likely be used in other test modules.

riley-dixon self-assigned this Mar 23, 2026

riley-dixon wants to merge 16 commits into develop from rildixon/io-retry-compat-system-test
