Optimization Strategy for Allocating DNs in Rack Awareness #9298
Replies: 1 comment 2 replies
-
I think this makes sense in the given context. However, there may be some tradeoffs. One I can think of: if we always pick a DN under the rack with the highest number of writable DNs, we might create hotspots, since clients will keep picking the same rack (the same set of DNs), lowering write throughput for that particular rack. ContainerPlacementPolicy is also used when deciding which datanode to replicate to, so it might also direct a lot of replication traffic to that rack, consuming much of its network bandwidth. I'm not saying this is the wrong approach, but we may need stronger justification (e.g. analysis of the tradeoffs) when designing a new placement policy. For example, we can look at how the literature evaluates placement policies, such as the CopySet paper, and rigorous discussions like HDFS-1094, where they calculated the probability of data loss. We also need to consider tradeoffs such as data durability and load balancing. However, since the container placement policy is pluggable, you could probably create a new ContainerPlacementPolicy first and see how it performs?
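To make the hotspot concern concrete, here is a minimal sketch (the rack layout is hypothetical, and this is not Ozone's actual ContainerPlacementPolicy API) showing that a greedy "rack with the most writable DNs" rule deterministically funnels every placement to the same rack:

```python
import collections

# Hypothetical rack layout: rack name -> number of writable DataNodes.
racks = {"rack-a": 8, "rack-b": 5, "rack-c": 3}

def pick_rack_greedy(racks):
    """Always choose the rack with the most writable DNs
    (the strategy discussed above, taken to its deterministic extreme)."""
    return max(racks, key=racks.get)

# Simulate 1000 placements: every single one lands on the same rack,
# so that rack's DNs carry all of the write and replication traffic.
counts = collections.Counter(pick_rack_greedy(racks) for _ in range(1000))
print(counts)
```

Until rack-a's writable count drops below rack-b's, the other racks receive no placements at all, which is the load-concentration tradeoff described above.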
-
Hi all,
We recently encountered an issue in our production environment and would like to share the details:
Background
Rack awareness is enabled in our cluster.
The overall storage utilization of the cluster is high (above 70%).
The number of machines within each rack is uneven, with significant differences between racks.
Problem Description
Assume a rack contains N DataNodes, and N-1 of them have already reached the storage threshold and are no longer writable. In this situation, the remaining writable DataNode shows a significantly higher iowait compared to the others.
This single DataNode ends up handling a disproportionately large amount of write traffic, which causes a noticeable drop in overall cluster write throughput.
We believe this behavior is caused by the current DataNode selection strategy when rack awareness is enabled.
If N-1 DataNodes in a rack are already full, then whenever the rack selection logic chooses that rack, the only writable DataNode in it will always be selected. This leads to severe load concentration.
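The load concentration described above can be sketched with a small simulation (the rack counts and DN names here are hypothetical, not our actual topology): if racks are picked uniformly and then a writable DN is picked within the rack, the lone writable DN absorbs its entire rack's share of the traffic:

```python
import random

random.seed(0)
N = 10  # DataNodes per rack (assumed for illustration)

# writable[rack] = list of writable DN ids.
# Rack 0 has N-1 DNs at the storage threshold, leaving only one writable DN.
writable = {0: ["dn-0-9"],
            1: [f"dn-1-{i}" for i in range(N)],
            2: [f"dn-2-{i}" for i in range(N)]}

def place_block():
    rack = random.choice(list(writable))   # racks picked uniformly
    return random.choice(writable[rack])   # then a writable DN in that rack

loads = {}
for _ in range(30_000):
    dn = place_block()
    loads[dn] = loads.get(dn, 0) + 1

# The lone writable DN receives its whole rack's share (~1/3 of all writes),
# roughly N times the load of any single DN in a healthy rack.
print(loads["dn-0-9"], loads["dn-1-0"])
```

With three racks, dn-0-9 ends up near 10,000 placements while each DN in a healthy rack gets around 1,000, matching the iowait imbalance we observed.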
Proposal
Have we considered designing an enhanced rack-selection strategy that incorporates an additional factor — the number of writable DataNodes in each rack?
In theory, a rack with more writable DataNodes should have a proportionally higher probability of being selected. This would naturally lead to better load balancing and help avoid situations where a single DataNode becomes a hotspot under high cluster utilization.
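As a rough sketch of the proposal (hypothetical rack counts, not a real implementation), weighting rack selection by the number of writable DNs equalizes the expected per-DN load:

```python
import random

random.seed(0)
# Hypothetical cluster: rack name -> number of writable DataNodes.
writable_counts = {"rack-a": 1, "rack-b": 10, "rack-c": 10}

def pick_rack_weighted(counts):
    """Select a rack with probability proportional to its writable DN count."""
    racks = list(counts)
    return random.choices(racks, weights=[counts[r] for r in racks])[0]

picks = {r: 0 for r in writable_counts}
for _ in range(21_000):
    picks[pick_rack_weighted(writable_counts)] += 1

# rack-a (1 writable DN) now gets ~1/21 of placements (~1000), so its single
# DN carries the same per-DN share as each DN in the 10-DN racks.
print(picks)
```

Under uniform rack selection, rack-a's single DN would instead receive a full third of the traffic, which is exactly the hotspot scenario described above.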