
[SPARK-56202][SS] Refactor streaming join tests: split Base/Suite hierarchy and simplify mode dispatch #55005

Open
cloud-fan wants to merge 4 commits into apache:master from cloud-fan:SPARK-XXX-join-test-modes


@cloud-fan (Contributor) commented Mar 25, 2026

What changes were proposed in this pull request?

Refactor streaming join test suites to separate test mode dispatch from the test hierarchy.

Before: AlsoTestWithVirtualColumnFamilyJoins runs every test in both VCF (virtual column family) and non-VCF
modes within a single suite class. V4 suites (state format version 4) extend these classes and use a
string-matching skip list to exclude incompatible tests.
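For contrast, here is a minimal Scala sketch of the old pattern. It assumes a hypothetical MiniSuite that stands in for ScalaTest's test registration; the helper testBothModes, the test names, and the name suffixes are illustrative and are not the actual code of AlsoTestWithVirtualColumnFamilyJoins. It shows one suite registering every test twice, and a V4 subclass filtering inherited tests by substring.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ScalaTest registration: `test` records a named body.
abstract class MiniSuite {
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]
  def test(name: String)(body: => Unit): Unit = registered.update(name, () => body)
  def testNames: Seq[String] = registered.keys.toSeq
}

// Sketch of the old dual-mode helper: each call registers two tests in one suite.
trait DualModeDispatch extends MiniSuite {
  def testBothModes(name: String)(body: Boolean => Unit): Unit = {
    test(s"$name (with VCF)")(body(true))
    test(s"$name (without VCF)")(body(false))
  }
}

class OldInnerJoinSuite extends MiniSuite with DualModeDispatch {
  // Real bodies would branch on useVcf; these are placeholders.
  testBothModes("inner join") { useVcf => () }
  testBothModes("windowed inner join") { useVcf => () }
}

// Old V4 approach: inherit everything, then drop tests by name substring.
class OldInnerJoinV4Suite extends OldInnerJoinSuite {
  // A def, not a val: the parent constructor calls `test` before subclass
  // vals are initialized, so a val would still be null at that point.
  private def skipList: Seq[String] = Seq("windowed", "without VCF")
  override def test(name: String)(body: => Unit): Unit =
    if (!skipList.exists(s => name.contains(s))) super.test(name)(body)
}
```

The skip list stays correct only while every new incompatible test happens to match one of its substrings.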

After: Each join type has a two-layer hierarchy:

StreamingInnerJoinBase          (universal tests — all state format versions)
├── StreamingInnerJoinSuite     (adds V3-specific tests: window joins, metrics, schema)
│   ├── StreamingInnerJoinWithVCFSuite      (testMode = WithVCF)
│   └── StreamingInnerJoinWithoutVCFSuite   (testMode = WithoutVCF)
└── StreamingInnerJoinV4Suite   (testMode = WithVCF, stateFormatVersion = 4)
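The tree above can be sketched in Scala roughly as follows. This is an illustration under assumptions, not the PR's code: MiniSuite is a hypothetical stand-in for ScalaTest registration, Mode is modeled as a sealed trait (the PR's enum may be defined differently), and the two test names are placeholders.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ScalaTest registration: `test` records a named body.
abstract class MiniSuite {
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]
  def test(name: String)(body: => Unit): Unit = registered.update(name, () => body)
  def testNames: Seq[String] = registered.keys.toSeq
}

sealed trait Mode
object Mode {
  case object WithVCF extends Mode
  case object WithoutVCF extends Mode
}

// Every suite in the hierarchy declares exactly one mode.
abstract class StreamingJoinSuite extends MiniSuite {
  def testMode: Mode
}

// Universal tests: valid for every state format version.
abstract class StreamingInnerJoinBase extends StreamingJoinSuite {
  test("inner join") { /* universal assertions would go here */ }
}

// Adds tests that only hold for the pre-V4 state formats.
abstract class StreamingInnerJoinSuite extends StreamingInnerJoinBase {
  test("windowed inner join") { /* V3-specific assertions would go here */ }
}

class StreamingInnerJoinWithVCFSuite extends StreamingInnerJoinSuite {
  override def testMode: Mode = Mode.WithVCF
}

class StreamingInnerJoinWithoutVCFSuite extends StreamingInnerJoinSuite {
  override def testMode: Mode = Mode.WithoutVCF
}

// V4 extends the Base layer directly, so it never sees V3-specific tests.
class StreamingInnerJoinV4Suite extends StreamingInnerJoinBase {
  override def testMode: Mode = Mode.WithVCF
}
```

Because StreamingInnerJoinV4Suite extends StreamingInnerJoinBase, a test added to StreamingInnerJoinSuite never reaches it.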

Mode dispatch uses a testMode: Mode enum in StreamingJoinSuite, replacing the
AlsoTestWithVirtualColumnFamilyJoins trait. Each concrete suite overrides testMode
with a single Mode value. The enum is more future-proof than a boolean: new modes
can be added without changing the dispatch type. V4 suites extend *Base directly,
so no skip list is needed.
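A minimal sketch of single-mode dispatch, assuming the overridden test wraps each body in a setting derived from testMode. FakeConf is a hypothetical stand-in for the SQL conf the real suites set, and InnerJoinTests, InnerJoinWithVCF, and InnerJoinWithoutVCF are illustrative names.

```scala
import scala.collection.mutable

sealed trait Mode
object Mode {
  case object WithVCF extends Mode
  case object WithoutVCF extends Mode
}

// Hypothetical stand-in for the conf a test body reads.
object FakeConf {
  var useVirtualColumnFamilies: Option[Boolean] = None
}

abstract class StreamingJoinSuite {
  def testMode: Mode
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]

  // Exhaustive match on a sealed trait: the compiler warns if a new Mode
  // case is added without being handled here.
  private def vcfEnabled: Boolean = testMode match {
    case Mode.WithVCF    => true
    case Mode.WithoutVCF => false
  }

  // One registration per test; the body runs under the suite's single mode.
  def test(name: String)(body: => Unit): Unit =
    registered.update(name, () => {
      val previous = FakeConf.useVirtualColumnFamilies
      FakeConf.useVirtualColumnFamilies = Some(vcfEnabled)
      try body finally FakeConf.useVirtualColumnFamilies = previous
    })

  def run(): Unit = registered.values.foreach(f => f())
}

abstract class InnerJoinTests extends StreamingJoinSuite {
  val observed = mutable.Buffer.empty[Boolean]
  test("inner join") { observed += FakeConf.useVirtualColumnFamilies.get }
}

class InnerJoinWithVCF extends InnerJoinTests { override def testMode: Mode = Mode.WithVCF }
class InnerJoinWithoutVCF extends InnerJoinTests { override def testMode: Mode = Mode.WithoutVCF }
```

Here testMode is a def and is read only when a test body runs. That sidesteps Scala's initialization order, since parent constructors register tests before any subclass val would be initialized.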

Why are the changes needed?

  1. Failure isolation: The suite name identifies the exact mode, so test names no longer need long disambiguating suffixes.
  2. Selective retrigger: Rerun only the failing mode's suite without rerunning the passing mode.
  3. No skip lists: V4 suites get only universal tests by inheriting from *Base, not *Suite. Adding V3-specific tests in the future won't accidentally break V4.
  4. Extensible dispatch: A Mode enum replaces a trait with hard-coded dual-mode dispatch. Adding new modes (e.g., Avro encoding) is a new enum case + a one-liner suite.

Does this PR introduce any user-facing change?

No. Test infrastructure only. Same tests, same coverage.

How was this patch tested?

Existing tests: the *WithVCFSuite and *WithoutVCFSuite classes run the same tests as before, split by mode.

Was this patch authored or co-authored using generative AI tooling?

Yes

…rchy and use useVirtualColumnFamilies

Replace AlsoTestWithVirtualColumnFamilyJoins with a useVirtualColumnFamilies
abstract boolean in StreamingJoinSuite. Split each join suite into:
- *Base: universal tests that work with any state format version
- *Suite extends *Base: adds tests specific to V1-V3
- *WithVCFSuite / *WithoutVCFSuite: concrete single-mode suites

V4 suites extend *Base directly, so they only get universal tests —
no skip list needed. TestWithV4StateFormat becomes a simple trait that
sets useVirtualColumnFamilies=true and wraps with stateFormatVersion=4.

Co-authored-by: Isaac
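The commit message's description of TestWithV4StateFormat maps naturally onto Scala's stackable-trait pattern. The sketch is hedged: JoinSuiteBase, FakeConf, InnerJoinBase, and InnerJoinV4Suite are hypothetical, and only the trait name and its useVirtualColumnFamilies=true / stateFormatVersion=4 behavior come from the commit message.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the conf a test body reads; starts at 3 for illustration.
object FakeConf {
  var stateFormatVersion: Int = 3
}

abstract class JoinSuiteBase {
  def useVirtualColumnFamilies: Boolean
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]
  def test(name: String)(body: => Unit): Unit = registered.update(name, () => body)
  def run(): Unit = registered.values.foreach(f => f())
}

// Sketch of the V4 mix-in: pins VCF on and wraps every body with version 4.
trait TestWithV4StateFormat extends JoinSuiteBase {
  override def useVirtualColumnFamilies: Boolean = true

  // `super.test` is resolved through the linearization of the final class.
  override def test(name: String)(body: => Unit): Unit =
    super.test(name) {
      val previous = FakeConf.stateFormatVersion
      FakeConf.stateFormatVersion = 4
      try body finally FakeConf.stateFormatVersion = previous
    }
}

// Universal tests, written once against whatever the mix-ins configure.
abstract class InnerJoinBase extends JoinSuiteBase {
  val seen = mutable.Buffer.empty[(Int, Boolean)]
  test("inner join") { seen += ((FakeConf.stateFormatVersion, useVirtualColumnFamilies)) }
}

class InnerJoinV4Suite extends InnerJoinBase with TestWithV4StateFormat
```

Mixing the trait in last means InnerJoinBase's registrations dispatch through the trait's test override, so every inherited body runs under version 4 without the base class knowing about V4.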
@cloud-fan cloud-fan changed the title [SPARK-XXX][SS] Refactor streaming join tests: split Base/Suite hierarchy and simplify mode dispatch [SPARK-56202][SS] Refactor streaming join tests: split Base/Suite hierarchy and simplify mode dispatch Mar 25, 2026

Copilot AI left a comment


Pull request overview

Refactors Structured Streaming join test suites to separate “mode dispatch” (VCF vs non-VCF) from the test hierarchy, enabling single-mode suites for clearer failures and more targeted reruns.

Changes:

  • Replaces the dual-mode AlsoTestWithVirtualColumnFamilyJoins dispatch with a boolean useVirtualColumnFamilies in StreamingJoinSuite.
  • Splits join suites into *Base (universal tests) vs *Suite (V1–V3-only tests), and updates V4 suites to extend *Base directly.
  • Introduces concrete single-mode suites (*WithVCFSuite / *WithoutVCFSuite) to run modes independently.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files:
  • sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala: Introduces boolean-based mode dispatch, splits Base vs V1–V3 suite layers, and adds concrete single-mode suites.
  • sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinV4Suite.scala: Updates V4 suites to inherit only universal tests and enforces VCF mode via the new dispatch mechanism.


ScalaTest already invokes beforeEach/afterEach around each registered
test. The manual super.beforeEach()/super.afterEach() calls inside
testWithVirtualColumnFamilyJoins and testWithoutVirtualColumnFamilyJoins
were double-invoking them.
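The double invocation is easy to reproduce in a few lines. This sketch assumes a hypothetical MiniSuite whose runner calls beforeEach/afterEach around every test, as ScalaTest's BeforeAndAfterEach does; the signature of testWithVirtualColumnFamilyJoins here is illustrative, not the real one.

```scala
import scala.collection.mutable

// Hypothetical stand-in for BeforeAndAfterEach: the runner itself invokes
// beforeEach/afterEach around every registered test.
abstract class MiniSuite {
  val calls = mutable.Buffer.empty[String]
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]
  def beforeEach(): Unit = calls += "before"
  def afterEach(): Unit = calls += "after"
  def test(name: String)(body: => Unit): Unit = registered.update(name, () => body)
  def run(): Unit = registered.values.foreach { body =>
    beforeEach()
    try body() finally afterEach()
  }
}

// Buggy helper: repeats the hooks the runner already calls.
class BuggySuite extends MiniSuite {
  def testWithVirtualColumnFamilyJoins(name: String)(body: => Unit): Unit =
    test(name) {
      super.beforeEach()
      try body finally super.afterEach()
    }
  testWithVirtualColumnFamilyJoins("inner join") { calls += "body" }
}

// Fixed helper: registers the body and leaves the hooks to the runner.
class FixedSuite extends MiniSuite {
  def testWithVirtualColumnFamilyJoins(name: String)(body: => Unit): Unit =
    test(name)(body)
  testWithVirtualColumnFamilyJoins("inner join") { calls += "body" }
}
```

Running BuggySuite records each hook twice per test, while FixedSuite records each hook once.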
More future-proof: new modes can be added without changing the dispatch
type. Each concrete suite overrides testMode with a single Mode value.
@HeartSaVioR (Contributor) left a comment


+1
The change looks very reasonable - thanks for the change!


4 participants