HDDS-14255. [Website v2] [Docs] [Core Concepts] Consistency Guarantee #305
peterxcli wants to merge 2 commits into apache:master
Pull request overview
Adds a new “Consistency Guarantees” documentation page to explain consistency/HA behaviors across OM, SCM, and DN components, including upcoming OM read-consistency options.
Changes:
- Introduces an OM HA consistency section describing default vs optional linearizable reads and follower-read optimizations.
- Documents SCM HA consistency and contrasts it with OM HA.
- Adds a DN ContainerStateMachine consistency explanation with a mermaid diagram and BCSID notes.
| ## OM (Ozone Manager) HA Consistency
| :::info
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
The parenthetical "(current is 2.1.0)" will become outdated as soon as a new release ships and will require ongoing doc churn. Consider removing the "current is …" portion and phrasing this in terms of version ranges only (e.g., "Prior to Ozone 2.2.0…").
Suggested change, replacing:
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
with:
| Notice: Before Ozone 2.2.0, all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
| ### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
| - **Read Path**: Only the leader serves read requests
| - **Mechanism**: Reads query the state machine directly without ReadIndex
| - **Guarantee**: **Non-linearizable** - may return stale data during leader transitions
| - **Performance**: No heartbeat rounds required for reads, better latency
| - **Risk**: Short-period split-brain scenario possible (old leader may serve stale reads during leadership transition)
| ### Optional: Linearizable Reads (will release in Ozone 2.2)
Version formatting and tense are inconsistent here ("2.2" vs "2.2.0" elsewhere, and repeated "will release" phrasing). To reduce future maintenance and ambiguity, consider using a consistent semantic version (e.g., 2.2.0) and wording like "Starting in Ozone 2.2.0" instead of "will release" in headings.
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
| :::
| ### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
The capitalization is inconsistent between the heading ("Non-Linearizable") and the bullet ("Non-linearizable"). Pick one form and use it consistently throughout this page for easier scanning/searching.
Suggested change, replacing:
| ### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
with:
| ### Default Configuration (Non-linearizable) (will release in Ozone 2.2)
chungen0126 left a comment
Thanks @peterxcli for the patch. I have a question about consistency.
Could you clarify the priority between the 'ozone.om.allow.leader.skip.linearizable.read' and 'ozone.om.follower.read.local.lease.enabled' properties? Before reading the code, I thought the former had a higher priority than the latter. However, the code logic seems to check the latter first when it is set to true. See:
My suggestion is that, regardless of the specifics, the documentation should clearly state how to configure these settings to achieve strong consistency or to optimize performance.
| ### Advanced Read Optimizations
| #### Follower Read with Local Lease (will release in Ozone 2.2)
| - Config: `ozone.om.follower.read.local.lease.enabled=false` (default)
This is now set to 'true' by default.
| - **Mechanism**: Reads query the state machine directly without ReadIndex
| - **Guarantee**: **Non-linearizable** - may return stale data during leader transitions
| - **Performance**: No heartbeat rounds required for reads, better latency
| - **Risk**: Short-period split-brain scenario possible (old leader may serve stale reads during leadership transition)
It would be nice to have a write-up of the conditions under which split brain might happen (assuming that leader elections/transitions do not always cause split brain).
Reads do not require consensus, so stale reads are possible during a network partition, or when a leader is so slow that it has lost leadership but still serves reads.
Split brain is not possible for writes, since a write must be acknowledged by a majority of the Raft group.
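The stale-read scenario described in the reply above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not Ozone or Apache Ratis code: `Replica`, `try_commit` and `leader_only_read` are hypothetical names, there is no log or election protocol, and a committed write is applied to every acknowledging replica in one step. It shows an isolated old leader answering a leader-only (non-linearizable) read with stale data while its own write fails for lack of a majority.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy OM replica (hypothetical): a Raft term, a leadership belief and a key-value state machine."""
    name: str
    term: int = 1
    believes_leader: bool = False
    state: dict = field(default_factory=dict)


def try_commit(leader, cluster, reachable, key, value):
    """Commit a write only if a majority of the cluster (leader included) accepts it.

    `reachable` holds the names the leader can currently contact. A replica rejects
    requests from a leader whose term is older than its own, as in Raft.
    """
    acks = [r for r in cluster
            if (r is leader or r.name in reachable) and r.term <= leader.term]
    if len(acks) * 2 <= len(cluster):
        return False  # no majority: the write is not committed anywhere
    for r in acks:
        r.term = leader.term
        r.state[key] = value  # simplification: commit and apply in one step
    return True


def leader_only_read(replica, key):
    """Non-linearizable read: answer from local state while the replica merely
    *believes* it is leader, with no ReadIndex or heartbeat round."""
    if not replica.believes_leader:
        raise RuntimeError(f"{replica.name} does not think it is the leader")
    return replica.state.get(key)


om1 = Replica("om1", believes_leader=True, state={"k": "v1"})
om2 = Replica("om2", state={"k": "v1"})
om3 = Replica("om3", state={"k": "v1"})
cluster = [om1, om2, om3]

# A network partition isolates om1. om2 and om3 elect om2 leader for term 2,
# but om1 has not heard about term 2 and still believes it is the leader.
om2.term = om3.term = 2
om2.believes_leader = True

new_write_ok = try_commit(om2, cluster, reachable={"om3"}, key="k", value="v2")
old_write_ok = try_commit(om1, cluster, reachable=set(), key="k", value="v-old")

print(new_write_ok)                # True: om2 and om3 form a majority
print(old_write_ok)                # False: om1 alone is 1 of 3, so no write split brain
print(leader_only_read(om2, "k"))  # v2
print(leader_only_read(om1, "k"))  # v1: a stale read served by the old leader
```

The term check is also why the old leader's writes keep failing after the partition heals: replicas that have moved to term 2 reject its requests, whereas nothing in the leader-only read path stops it from answering from local state until it learns about the new term.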
| - **Mechanism**: Uses Raft ReadIndex (Raft section 6.4)
| - **Guarantee**: Linearizability - reads reflect all committed writes
| - **Trade-off**: Requires leader to confirm leadership via heartbeat rounds
| - **Benefit**: Both the leader and followers can serve reads
Do existing clients use this capability? How do clients pick the server to which a request is sent?
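The ReadIndex mechanism quoted above (Raft section 6.4) can be sketched the same way. Again this is a simplified illustration, not the Ozone or Ratis implementation; `Member`, `confirm_leadership`, `read_index` and `linearizable_read` are hypothetical names. The leader records its commit index, confirms it is still leader with one heartbeat round to a majority, and hands the index out; any replica, leader or follower, then serves the read once it has applied at least that index. The heartbeat round is the extra cost that the leader-only, non-linearizable path skips.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy Raft replica (hypothetical): a replicated log and a state machine that may lag behind it."""
    name: str
    term: int
    commit_index: int = 0
    applied_index: int = 0
    log: list = field(default_factory=list)    # (key, value) entries; index i is log[i - 1]
    state: dict = field(default_factory=dict)

    def apply_up_to(self, index):
        """Apply log entries to the state machine until `index` (or the log end) is reached."""
        while self.applied_index < min(index, len(self.log)):
            key, value = self.log[self.applied_index]
            self.state[key] = value
            self.applied_index += 1


def confirm_leadership(leader, followers, reachable):
    """One heartbeat round: a majority (leader included) must still accept the leader's term."""
    acks = 1 + sum(1 for f in followers if f.name in reachable and f.term <= leader.term)
    return acks * 2 > len(followers) + 1


def read_index(leader, followers, reachable):
    """ReadIndex: record the commit index, then confirm leadership before returning it.

    Assumes the leader has already committed an entry in its current term,
    which Raft requires before the commit index can be trusted.
    """
    index = leader.commit_index
    if not confirm_leadership(leader, followers, reachable):
        raise RuntimeError(f"{leader.name} could not confirm leadership")
    return index


def linearizable_read(server, key, index):
    """Serve the read on any replica once it has applied at least `index`."""
    server.apply_up_to(index)  # a real server would wait for its apply loop instead
    if server.applied_index < index:
        raise RuntimeError(f"{server.name} has not caught up to index {index}")
    return server.state.get(key)


entries = [("k", "v1"), ("k", "v2")]
leader = Member("om1", term=3, commit_index=2, log=list(entries))
leader.apply_up_to(2)
follower = Member("om2", term=3, log=list(entries))
follower.apply_up_to(1)  # the follower has applied only the first entry so far
followers = [follower, Member("om3", term=3, log=list(entries))]

print(follower.state["k"])  # v1: a direct local read here would be stale

# Follower read: the follower obtains the read index from the leader,
# then waits until its own state machine has applied up to that index.
idx = read_index(leader, followers, reachable={"om2", "om3"})
print(idx)                                    # 2
print(linearizable_read(follower, "k", idx))  # v2

# A leader cut off from the majority cannot hand out a read index.
try:
    read_index(leader, followers, reachable=set())
except RuntimeError as err:
    print(err)  # om1 could not confirm leadership
```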
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
| :::
| ### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
Isn't the behavior prior to HDDS-14424 consistent with the non-linearizable case?
Suggested change, replacing:
| ### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
with:
| ### Default Configuration (Non-Linearizable)
| ## OM (Ozone Manager) HA Consistency
| :::info
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
I'm not sure whether the follower read feature will make it into 2.2.0. IMO, I'd rather write a user doc only once the feature is certain to be included.
Suggested change, replacing:
| Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
with:
| Notice: Before Ozone 2.2.0, all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
+1 on this. It would be good to version the docs and write the current docs against the current release (2.1.0) instead of a future one (for example, the follower read API is still evolving and not finalized). Something like a drop-down that switches between doc versions, defaulting to the current version (2.1.0). When we release a new version (e.g. 2.2.0), we can port the 2.1.0 docs to the new version and update them with the changes in that release.
PSA: Please switch PR target branch to
What changes were proposed in this pull request?
What is the link to the Apache Jira?
https://issues.apache.org/jira/browse/HDDS-14255
How was this patch tested?