Rewrite the introduction to the Clustering section #2749
modules/ROOT/pages/clustering/multi-region-deployment/geo-redundant-deployment.adoc
6b095c3 to 87a6450
modules/ROOT/pages/clustering/multi-region-deployment/geo-redundant-deployment.adoc
| Servers and databases are decoupled: servers provide computation and storage power for databases to use.
| Each database relies on its own cluster architecture, organized into primaries (with a minimum of three for high availability) and secondaries (for read scaling).
| . *Fault tolerance:* Primary database allocations provide a fault tolerant platform for transaction processing.
| A database remains available for reads and writes as long as a simple majority of its primary copies are functioning.
copies -> allocations
Hmm, it depends actually: are we using copy/allocation interchangeably elsewhere? In which case, ignore this suggestion.
It might be good to use database allocation everywhere though.
In fact, we're using copies/allocations interchangeably. In Disaster recovery, however, we decided to use allocations only.
I personally like the term allocation, but I will leave it at your discretion :)
87a6450 to af0d43a
This PR includes documentation updates. Updated pages:
| . *Safety:* Servers hosting databases in primary mode provide a fault tolerant platform for transaction processing which remains available while a simple majority of those Primary Servers are functioning.
| . *Scale:* Servers hosting databases in secondary mode provide a massively scalable platform for graph queries that enables very large graph workloads to be executed in a widely distributed topology.
| . *Causal consistency:* When invoked, a client application is guaranteed to read at least its own writes.
| . *Scalability:* A Neo4j cluster is a highly available cluster with multi-database support.
Saying a cluster is a cluster feels a bit redundant, what do you think?
| . *Scalability:* A Neo4j cluster is a highly available cluster with multi-database support.
| It is a set of servers running a number of databases.
| Servers and databases are decoupled: servers provide computation and storage power for databases to use.
| Each database has it own independent topology, organized into primaries (with a minimum of three for high availability) and secondaries (for read scaling).
"with a minimum of three for high availability" could read as if you always have to have at least 3 primaries. I think this is important information, but it could be moved elsewhere.
| Servers and databases are decoupled: servers provide computation and storage power for databases to use.
| Each database has it own independent topology, organized into primaries (with a minimum of three for high availability) and secondaries (for read scaling).
| . *Fault tolerance:* Primary database allocations provide a fault tolerant platform for transaction processing.
| A database remains available for reads and writes as long as a simple majority of its primary allocations are functioning.
"simple majority of its primary allocations are functioning" is true for writes, but reads do not require this
| Generally speaking, fault tolerance is the number of primary database copies you can lose without affecting a certain operation.
| For instance, with one primary copy, you have no fault tolerance, because if it goes offline, nothing is available.
| If you have two servers, each with a primary copy of the database, and one goes offline, the other will still have some copy of the data, so read availability would be preserved.
I think this block could go earlier before we enumerate the types
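To make the read/write distinction raised in the threads above concrete, here is a minimal sketch of the arithmetic. It assumes only what the quoted text and comments state: writes need a simple majority of primary allocations, while reads need at least one functioning allocation. The function names are hypothetical and are not part of any Neo4j API.

```python
def write_quorum(primaries: int) -> int:
    """Smallest simple majority of `primaries` primary allocations."""
    if primaries < 1:
        raise ValueError("a database needs at least one primary allocation")
    return primaries // 2 + 1


def availability(primaries: int, online: int) -> dict:
    """Which operations remain available with `online` primaries functioning.

    Assumption taken from the review thread: writes require a simple
    majority of primaries, reads only require one functioning allocation.
    """
    if not 0 <= online <= primaries:
        raise ValueError("online must be between 0 and primaries")
    return {
        "reads": online >= 1,
        "writes": online >= write_quorum(primaries),
    }


def write_fault_tolerance(primaries: int) -> int:
    """Number of primaries that can be lost before writes become unavailable."""
    return primaries - write_quorum(primaries)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "primaries tolerate", write_fault_tolerance(n), "lost primaries for writes")
    print(availability(primaries=2, online=1))
```

With two primaries and one offline, the sketch reports reads as available and writes as unavailable, which matches the two-server example in the quoted text.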
| While secondaries serve as a copy of your database, providing some level of durability (what is committed cannot be lost), they do not guarantee it completely.
| Secondaries pull updates from a selected upstream member (primary) on their own schedule.
| Due to their asynchronous nature, secondaries may not provide all transactions committed on the primary allocation(s).
We could rework this to say there can be a lag between when a transaction is committed on the primaries and when it becomes available on the secondaries.
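The lag described in this thread can be pictured with a toy model. This is only a sketch under stated assumptions: the `Primary` and `Secondary` classes, the `pull` method, and the integer "bookmark" are all hypothetical illustrations, not Neo4j's replication protocol or driver API. It shows a secondary that pulls on its own schedule falling behind the primary, and a read that carries a bookmark being served only once the secondary has caught up, which is one way to picture "read at least its own writes".

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: an append-only list of committed transactions."""
    committed: list = field(default_factory=list)

    def commit(self, tx: str) -> int:
        self.committed.append(tx)
        # Hypothetical bookmark: the position of this transaction in the log.
        return len(self.committed)


@dataclass
class Secondary:
    """Toy secondary: applies transactions only when it pulls from upstream."""
    applied: list = field(default_factory=list)

    def pull(self, upstream: Primary) -> None:
        # Copy whatever the upstream has committed since the last pull.
        self.applied.extend(upstream.committed[len(self.applied):])

    def lag(self, upstream: Primary) -> int:
        # Number of committed transactions not yet visible on this secondary.
        return len(upstream.committed) - len(self.applied)

    def can_serve(self, bookmark: int) -> bool:
        # A read carrying a bookmark is safe once that transaction is applied.
        return len(self.applied) >= bookmark


if __name__ == "__main__":
    primary, secondary = Primary(), Secondary()
    first = primary.commit("tx1")
    secondary.pull(primary)
    second = primary.commit("tx2")  # committed after the last pull
    print("lag:", secondary.lag(primary))
    print("can serve first write:", secondary.can_serve(first))
    print("can serve second write:", secondary.can_serve(second))
    secondary.pull(primary)
    print("lag after pull:", secondary.lag(primary))
```

Wording the docs in terms of lag, as suggested, lines up with this picture: the transaction is not missing from the cluster, it is just not yet visible on that secondary.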
| [[recommended-number-of-secondaries]]
| === How many secondaries you should have
| Secondaries typically provide read scale out, i.e. if you have more read queries happening than your primaries can handle, you can add secondaries to share the load; or even configure the query routing so that reads preferentially target secondaries to leave the primaries free to handle just the write workload.
read scale out -> read scaling?
Rewrite the Clustering introductory page to include examples of database topologies.
Remove terminology inconsistencies across a few pages.