Skip to content

PHOENIX-7863 Add replication consistency point guard to compaction#2489

Open
Himanshu-g81 wants to merge 15 commits into
apache:PHOENIX-7562-feature-newfrom
Himanshu-g81:PHOENIX-7863
Open

PHOENIX-7863 Add replication consistency point guard to compaction#2489
Himanshu-g81 wants to merge 15 commits into
apache:PHOENIX-7562-feature-newfrom
Himanshu-g81:PHOENIX-7863

Conversation

@Himanshu-g81

Copy link
Copy Markdown
Contributor

On standby clusters, compaction can prematurely purge delete markers that have timestamps newer than the replication consistency point. This causes permanent stale data because when replay eventually catches up, the delete markers that should have been applied are already gone.

This PR adds a compaction guard that floors maxLookbackWindowStart to the minimum consistency point across all HA groups, ensuring delete markers newer than the consistency point are retained until replay has processed them.

The guard is controlled by phoenix.replication.compaction.guard.enabled (default true). It activates automatically when phoenix.replication.replay.enabled=true — no additional configuration needed on standby clusters. The separate flag allows operators to disable the guard independently if needed without turning off replay entirely.

Himanshu Gwalani added 6 commits May 27, 2026 19:19
On standby clusters, compaction can prematurely drop delete markers newer
than the replication consistency point, causing permanent stale data.
This adds a guard that floors maxLookbackWindowStart to the minimum
consistency point across all HA groups when replication replay is enabled.

Enabled via phoenix.replication.compaction.guard.enabled (default true),
active only when phoenix.replication.replay.enabled is also true. Falls
back to retaining all delete markers if the consistency point is
unavailable.
Move applyReplicationConsistencyGuard and adjustMaxLookbackWindowStart
from CompactionScanner to ReplicationLogReplayService where the
consistency point logic belongs. Relocate unit test to
replication.reader package accordingly.
Now that the guard logic lives in the same class, public access is no
longer needed. Tests are in the same package and can still access it.
Co-locate REPLICATION_COMPACTION_GUARD_ENABLED and its default with
the other replication config constants in ReplicationLogReplayService,
removing them from QueryServices/QueryServicesOptions.
- Remove incorrect @VisibleForTesting from applyReplicationConsistencyGuard
  (it is genuinely public, called from CompactionScanner)
- Reduce adjustMaxLookbackWindowStart to package-private
- Fix newHashMapWithExpectedSize(4) to 5 in both IT classes
@apurtell

apurtell commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

phoenix.replication.compaction.guard.enabled

Given this is a critical safety feature to prevent data desynchronization, when would it ever be disabled?

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a replication consistency-point “compaction guard” to prevent standby-cluster compactions from purging delete markers that replication replay has not yet caught up to, avoiding permanent stale data.

Changes:

  • Add a new compaction guard configuration (phoenix.replication.compaction.guard.enabled, default true) and guard logic in ReplicationLogReplayService to floor maxLookbackWindowStart by the minimum replication consistency point.
  • Apply the guard from CompactionScanner when replication replay is enabled (and the guard is enabled).
  • Add unit + integration coverage for the guard behavior when enabled/disabled and when consistency point retrieval fails.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
phoenix-core/src/test/java/org/apache/phoenix/replication/reader/ReplicationCompactionGuardTest.java Unit tests for the pure window-flooring adjustment logic.
phoenix-core/src/it/java/org/apache/phoenix/replication/reader/CompactionReplicationGuardIT.java IT coverage validating delete-marker retention/purge behavior with the guard enabled.
phoenix-core/src/it/java/org/apache/phoenix/replication/reader/CompactionReplicationGuardDisabledIT.java IT coverage ensuring normal compaction behavior when the guard is disabled.
phoenix-core-server/src/main/java/org/apache/phoenix/replication/reader/ReplicationLogReplayService.java Adds guard config/constants, guard application + testing hooks.
phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/CompactionScanner.java Wires the guard into compaction by adjusting maxLookbackWindowStart.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Himanshu Gwalani added 2 commits June 7, 2026 17:14
- Move guard enforcement from store-level to RowContext.setTTL() to fix
  TTL window bypass (Math.min cap after Math.max with ttlWindowStart)
- Add Guava memoized cache (30s TTL) for consistency point to reduce
  NameNode RPCs during compaction bursts
- Gate guard on major compactions only (no effect during flush/minor)
- Make constructor private, replace mock injection with test factory
  (setConsistencyPointForTesting)
- Demote per-compaction logging from INFO to DEBUG
- Keep guard config flag for operational flexibility
Verify that resolveConsistencyPoint uses cached value from Guava
memoized supplier and does not re-fetch on every call within TTL.
@Himanshu-g81

Himanshu-g81 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

phoenix.replication.compaction.guard.enabled

Given this is a critical safety feature to prevent data desynchronization, when would it ever be disabled?

The idea was if we have any issue in consistency point calculation, etc and we need to disable this check specificailly to avoid issues with compaction temporarily (since max-look back anyway hold the delete markers upto 3 days, or we update that config and disable this one to remediate the issue).
So this is just to have more control over compaction behaviour.

@Himanshu-g81 Himanshu-g81 requested a review from apurtell June 11, 2026 14:03
}

private ReplicationLogReplayService(long fixedConsistencyPoint) {
this.cachedConsistencyPoint = () -> fixedConsistencyPoint;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the final field conf been initialized ?

}

private ReplicationLogReplayService(Supplier<Long> supplier) {
this.cachedConsistencyPoint = Suppliers.memoizeWithExpiration(supplier,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the final field conf been initialized ?

@tkhurana

Copy link
Copy Markdown
Contributor

When the cache is cold or expired, the delegate calls getConsistencyPoint() →
ReplicationLogDiscoveryReplay.getConsistencyPoint(), which reads lastRoundInSync (ReplicationLogDiscoveryReplay.java:570-574, 584-585):

private ReplicationRound lastRoundInSync; // line 106 — plain field, NOT volatile

That field is written by the background replay thread (setLastRoundInSync at :328, and init at :242/:249) and has no volatile modifier and no synchronization (:438-448).
Before this PR, getConsistencyPoint() was only ever called from the single replay scheduler thread (REPLICATION_REPLAY_SERVICE_EXECUTOR_THREAD_COUNT = 1, invoked at
:287/:349), so the field was effectively thread-confined.

This PR makes HBase compaction threads (the large/small compaction pools — commonly >1, so multiple CompactionScanner constructors run in parallel on one RS) call
getConsistencyPoint() for the first time from outside that thread. The synchronized(this) inside the Guava supplier serializes the compaction threads against each other,
but it locks the supplier object — it establishes no happens-before with the replay thread's writes to lastRoundInSync. Consequences:

  • Visibility: a compaction thread can observe a stale lastRoundInSync, or even null after the replay thread has set it — which trips the throw new
    IOException("...lastRoundInSync is not initialized") at :577/:588 → resolveConsistencyPoint catches it → returns 0L → retain everything (the silent fail-closed path from
    the main review). So the race doesn't corrupt data, but it can spuriously and silently disable purging.
  • replicationReplayState is an AtomicReference (:109) so that read is fine; lastRoundInSync is the unprotected one.

@apurtell apurtell left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highlighted a build break and consistency issue. Both must be fixed. In addition the tests still don't cover the TTL case that motivated the rewrite.

I asked the robot to suggest necessary test improvements:

Add back a unit test on computeRowMaxLookbackWithGuard(...) covering at least:

  • ttlWindowStart > maxLookbackWindowStart > consistencyPoint (TTL-dominated, guard caps)
  • maxLookbackWindowStart > ttlWindowStart > consistencyPoint (lookback-dominated, guard caps)
  • consistencyPoint > both (guard inactive)
  • consistencyPoint = 0 (fallback)
  • consistencyPoint = Long.MAX_VALUE (guard disabled at call site)

Add an IT case to CompactionReplicationGuardIT that creates the table with TTL = N (using HBase CF TTL via ALTER or Phoenix TTL property), sets consistencyPoint = beforeDeleteTime - 1, advances past TTL by M >> N, and asserts the delete marker is retained. Without this, the row-level fix is only theoretically tested.


private static volatile ReplicationLogReplayService instance;

private final Configuration conf;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two test-only constructors in ReplicationLogReplayService do not initialize the blank-final field conf. This will break the build.

ReplicationLogReplayService.java:116: error: variable conf might not have been initialized
ReplicationLogReplayService.java:121: error: variable conf might not have been initialized


private ReplicationLogReplayService(final Configuration conf) {
this.conf = conf;
this.cachedConsistencyPoint = Suppliers.memoizeWithExpiration(() -> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppliers.memoizeWithExpiration uses synchronized(this) to serialize loads, but that lock is on the supplier object. This establishes no happens-before relationship with the replay thread's writes to lastRoundInSync.

So you may end up advancing the consistency point too far back, or even getting a a null back from the supplier even after the replay thread has already set it, spuriously throwing an IOException. The net effect is sometimes a major compaction won't drop delete markers when it should.

Something needs to enforce the happens-before relationship. Perhaps volatile ReplicationRound lastRoundInSync , or consider AtomicReference<ReplicationRound>. Or redo this.

@apurtell apurtell left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the substantiative issues have been addressed. There is one nit I'd like to see improved but this is looking good to me, so I have approved it.

}

private ReplicationLogReplayService(long fixedConsistencyPoint) {
this.conf = null;

@apurtell apurtell Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need a conf to vary our behavior, it should be passed in by a higher layer. If we need a conf because some API we call needs it, we should create it with new Configuration().

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The robot says

Any code path on the test singleton that touches conf will NPE:

start() reads conf.getBoolean(...) (line 155)
stop() reads conf.getBoolean(...) (line 190)
getReplicationGroups() calls HAGroupStoreManager.getInstance(conf) (line 295)
getReplicationLogReplay(...) calls ReplicationLogReplay.get(conf, ...) (line 299)

Today this is safe only because the test ITs and UTs go straight through cachedConsistencyPoint.get() and never call the lifecycle methods.

}

private ReplicationLogReplayService(Supplier<Long> supplier) {
this.conf = null;

@apurtell apurtell Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need a conf to vary our behavior, it should be passed in by a higher layer. If we need a conf because some API we call needs it, we should create it with new Configuration().

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Himanshu Gwalani added 2 commits June 24, 2026 17:45
…gging, and trace improvements

- Replace magic 0L and Long.MAX_VALUE with CONSISTENCY_POINT_UNAVAILABLE and
  CONSISTENCY_POINT_GUARD_DISABLED named constants for clarity
- Make consistency point cache TTL configurable via
  phoenix.replication.compaction.guard.cache.ttl.seconds (default 30s)
- Rate-limit fallback WARN log to once per 60 seconds to prevent log flooding
  during sustained outages
- Add replicationConsistencyPoint and computed maxLookbackWindowStartForRow to
  existing TRACE log line for operational visibility
- Update Javadoc on resolveConsistencyPoint to reflect configurable TTL and
  named constant references
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants