[Data] Fixture to keep unit tests apart from integration directory #61505

Open
Hyunoh-Yeo wants to merge 14 commits into ray-project:master from Hyunoh-Yeo:feature/nounitfixture

Conversation

@Hyunoh-Yeo
Contributor

@Hyunoh-Yeo Hyunoh-Yeo commented Mar 5, 2026

Description

Adds a pytest fixture, warn_if_unit_in_integration, to tests/conftest.py that warns developers when a test in the integration directory has no Ray cluster dependency and no intent marker. Also adds two new markers, @pytest.mark.integration_test and @pytest.mark.unit_for_integration, so developers can document intent explicitly.
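
For readers unfamiliar with custom markers, here is a minimal sketch of how the two markers could be registered from conftest.py. This is an assumption for illustration: the PR may register them elsewhere (for example in pytest.ini), and the marker descriptions below are my own wording, not text from the PR.

```python
def pytest_configure(config):
    """pytest calls this hook at startup when it is defined in conftest.py."""
    # Registering markers avoids "unknown marker" warnings and makes them
    # visible in `pytest --markers`. The descriptions are hypothetical.
    config.addinivalue_line(
        "markers",
        "integration_test: test depends on a Ray cluster, directly or indirectly",
    )
    config.addinivalue_line(
        "markers",
        "unit_for_integration: unit-style test intentionally kept in the integration directory",
    )
```

A test would then declare its intent with @pytest.mark.integration_test or @pytest.mark.unit_for_integration above the test function.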

The fixture is opt-in via MIGRATED_FILES and applied incrementally as files are reviewed and split. This PR opts in test_arrow_block.py and test_transform_pyarrow.py as the first two files, adding the appropriate markers to all tests in both files.

Note: because pytest.ini ignores all warnings via ignore:.*, the fixture raises exceptions instead of emitting warnings, so that CI fails when tests lack proper markers. This applies only to the files in MIGRATED_FILES, which I have finished migrating by separating their integration tests from their unit tests.
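
The decision the fixture makes, as described above, can be sketched as a plain function. This is not the code in the PR: the exception class, the function name check_intent, the container type of MIGRATED_FILES, and the "ray_start" prefix test for cluster fixtures are all assumptions made for illustration.

```python
from pathlib import PurePath
from typing import Iterable

# The two file names come from the PR description; the frozenset is a guess.
MIGRATED_FILES = frozenset({"test_arrow_block.py", "test_transform_pyarrow.py"})

# Marker names taken from the PR description.
INTENT_MARKERS = frozenset({"integration_test", "unit_for_integration"})

# Assumption: a fixture whose name starts with this prefix implies a Ray cluster.
CLUSTER_FIXTURE_PREFIX = "ray_start"


class MissingIntentMarker(Exception):
    """Hypothetical error type; raised because warnings are ignored in CI."""


def check_intent(
    test_path: str, fixture_names: Iterable[str], marker_names: Iterable[str]
) -> None:
    """Raise if an opted-in test shows neither cluster use nor an intent marker."""
    path = PurePath(test_path)
    # Opt-in: files not listed in MIGRATED_FILES are never checked.
    if path.name not in MIGRATED_FILES:
        return
    # Only files directly in tests/ are checked, not tests/unit/ and similar.
    if path.parent.name != "tests":
        return
    # An explicit intent marker always satisfies the check.
    if INTENT_MARKERS & set(marker_names):
        return
    # A visible cluster fixture counts as a Ray cluster dependency.
    if any(name.startswith(CLUSTER_FIXTURE_PREFIX) for name in fixture_names):
        return
    raise MissingIntentMarker(
        f"{path.name}: add @pytest.mark.integration_test or "
        "@pytest.mark.unit_for_integration, or move the test to the unit directory"
    )
```

Inside a real fixture, the three arguments would presumably come from the request object (the test's path, request.fixturenames, and the markers on request.node).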

Related issues

Closes #61339

Additional information

The fixture only applies to test files directly in tests/ (not subdirectories like tests/unit/), and only to files explicitly added to MIGRATED_FILES. See the design doc for full motivation and design decisions.

Local Test file and results

  • Since the test requires an explicit pytester import, I implemented it, ran it locally, and then deleted it.
  • The behavior is also exercised through CI on commit f418316
    (I left some intentional errors, each corresponding to one of the fixture's behaviors).
  • The test run failed with the expected messages.
    • An exception should be raised for that commit.
    • All expected errors appeared with the expected messages.

Hyunoh-Yeo and others added 7 commits February 25, 2026 16:32
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
.
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a helpful pytest fixture to enforce separation between unit and integration tests within the tests/ directory. The use of an opt-in mechanism with MIGRATED_FILES is a good strategy for incremental adoption. The implementation is solid, but I have a few suggestions to improve documentation and code style for better maintainability.

…vior

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
…vior

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
…vior

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
@Hyunoh-Yeo Hyunoh-Yeo marked this pull request as ready for review March 5, 2026 20:57
@Hyunoh-Yeo Hyunoh-Yeo requested a review from a team as a code owner March 5, 2026 20:57

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
@ray-gardener ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Mar 6, 2026
return

# Skip if the test is marked as an integration test
if request.node.get_closest_marker("integration_test"):
Contributor

I'm wondering whether we need the integration_test marker. Our purpose here is to prevent people from adding unit tests to python/ray/data/tests, so maybe we only need to introduce unit_for_integration for unit tests under python/ray/data/tests, and treat any test without this marker as an integration test?

Contributor Author

I was thinking of that too, but the problem is that we cannot identify every indirect use of a Ray cluster, i.e. tests that neither call ray.init() directly nor explicitly request a ray_start fixture. As a result, CI might incorrectly flag integration tests as unit tests, which can be confusing. Having integration_test seemed the best way to let developers identify for themselves what their tests are doing and place them in the appropriate location.

Contributor Author

Actually, for this I might need confirmation on whether we require every test that uses a Ray cluster, directly or indirectly, to have a ray_start fixture. I was taking the safer option.

@machichima
Contributor

machichima commented Mar 6, 2026

Could we use pytest_runtest_setup (docs), which runs before each test item? Then we could call pytest.fail directly to mark the test as failed, without needing to execute the test.

@Hyunoh-Yeo
Contributor Author

Hyunoh-Yeo commented Mar 6, 2026

@machichima Thank you for your comments! The current approach is intentional: we want to confirm that the test itself passes while separately flagging the missing marker. If we fail in setup, we lose visibility into whether the test logic is actually correct. If the test fails before the marker check, developers can first get their test working and then add markers for the final check.

If the check failed in setup instead, developers who are unaware of the marker would go through:
identify marker issue -> add marker -> identify test issue -> fix test -> marker issue again, depending on the test change (loop)

With the current approach:
identify test issue -> fix test -> identify marker issue -> add marker -> OK

Labels

community-contribution (Contributed by the community), data (Ray Data-related issues)

2 participants