
HDDS-14764. Allow Datanode to dynamically reconfigure SCM node addresses #9863

Merged
adoroszlai merged 7 commits into apache:master from ivandika3:HDDS-14764
Mar 5, 2026


Contributor

@ivandika3 ivandika3 commented Mar 4, 2026

What changes were proposed in this pull request?

HDDS-13890 supports reconfiguring ozone.scm.nodes."service", but not ozone.scm.address."service"."node", since the reconfiguration framework requires reconfigurable properties to be specified in advance.

We can support the latter as well, so that SCM reconfiguration requires no datanode restarts at all.

However, this requires two reconfigurations:

  1. ozone.scm.address."service"."node"
  2. ozone.scm.nodes."service"

If both configurations are changed at the same time, the order in which the reconfigurations are applied is undefined, which can cause unexpected behavior.
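
To illustrate why ozone.scm.address."service"."node" cannot be registered as an exact key ahead of time, here is a minimal sketch of exact-key versus prefix matching. The class `ReconfigurableKeys` and its methods are hypothetical names for illustration; this is not the actual Ozone ReconfigurationHandler, and the service and node IDs are made up.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the actual Ozone ReconfigurationHandler.
public class ReconfigurableKeys {
  // Keys known in advance, e.g. ozone.scm.nodes.<service>.
  private final Set<String> exactKeys = new TreeSet<>();
  // Prefixes covering keys whose last segment (the node ID) is not known in advance.
  private final Set<String> prefixes = new TreeSet<>();

  public ReconfigurableKeys register(String key) {
    exactKeys.add(key);
    return this;
  }

  public ReconfigurableKeys registerPrefix(String prefix) {
    prefixes.add(prefix);
    return this;
  }

  // A key is reconfigurable if it was registered exactly or starts with a registered prefix.
  public boolean isReconfigurable(String key) {
    if (exactKeys.contains(key)) {
      return true;
    }
    for (String prefix : prefixes) {
      if (key.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

With a prefix such as `ozone.scm.address.scmservice.` registered, an address key for a newly added node ID is accepted even though that node ID was unknown when the handler was created.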

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14764

How was this patch tested?

Updated the integration test (IT).

Clean CI: https://github.com/ivandika3/ozone/actions/runs/22653730808

Contributor

@adoroszlai adoroszlai left a comment


Thanks @ivandika3 for the patch.

Comment on lines +583 to +589
if (reconfigurationHandler != null) {
  try {
    reconfigurationHandler.close();
  } catch (IOException e) {
    LOG.error("DatanodeReconfigurationHandler stop failed", e);
  }
}
Contributor


Nice find about reconfigurationHandler not being closed. Should we fix it in a separate patch, since it affects OM and SCM, too?

nit: can simplify?

import org.apache.hadoop.hdds.utils.IOUtils;

IOUtils.close(LOG, reconfigurationHandler);
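
For readers unfamiliar with this kind of helper, the following is a rough sketch of what a "close and log" utility typically does. It is an assumption-laden stand-in, not the Ozone `IOUtils` implementation: the class name `QuietCloser`, the use of `java.util.logging`, and the returned failure count are all choices made here for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a "close quietly and log" helper; not Ozone's IOUtils.
public final class QuietCloser {
  private QuietCloser() {
  }

  // Closes each non-null resource, logging failures instead of propagating them.
  // Returns the number of resources that failed to close (an assumption for testability).
  public static int close(Logger log, AutoCloseable... resources) {
    int failures = 0;
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue; // tolerate fields that were never initialized
      }
      try {
        resource.close();
      } catch (Exception e) {
        log.log(Level.SEVERE, "Failed to close " + resource, e);
        failures++;
      }
    }
    return failures;
  }
}
```

A helper of this shape replaces the explicit null check and try/catch block at each call site with a single call.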

Contributor Author


Thanks, I'll remove it for now. Let's fix it in another patch.

Raised HDDS-14766.

  reconfigurationHandler =
-     new ReconfigurationHandler("DN", conf, this::checkAdminPrivilege)
+     new DatanodeReconfigurationHandler("DN", conf, this::checkAdminPrivilege)
+         .registerPrefix(ConfUtils.addKeySuffixes(OZONE_SCM_ADDRESS_KEY, true, scmServiceId))
Contributor


Shouldn't registerPrefix be called within if (scmServiceId != null) block?

Contributor Author


Yes, it should. Thanks, updated.

- return addSuffix(key, keySuffix);
+ String suffix = addSuffix(key, keySuffix);
+ if (withTrailingSeparator) {
+   suffix = suffix.concat(".");
Contributor


I'd rather append . in DatanodeReconfigurationHandler.registerPrefix:

  • safety: ensure that prefix ends with . instead of relying on caller to add it
  • simplicity: new addKeySuffixes function just to append . seems overkill
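
The safety point can be sketched as follows. `PrefixNormalizer` is a hypothetical helper written for this illustration, not code from the patch, and the service IDs `svc` and `svc2` are made up. Without the trailing `.`, a prefix for one service ID would also match keys of any service ID that merely starts with the same characters.

```java
// Hypothetical helper for illustration; not the code that was merged.
public final class PrefixNormalizer {
  private PrefixNormalizer() {
  }

  // Ensures the prefix ends with "." so it only matches whole key segments.
  public static String normalize(String prefix) {
    return prefix.endsWith(".") ? prefix : prefix + ".";
  }

  public static void main(String[] args) {
    String raw = "ozone.scm.address.svc";
    String key = "ozone.scm.address.svc2.node1"; // belongs to a different service ID

    // The raw prefix wrongly matches the other service's key.
    System.out.println(key.startsWith(raw));
    // The normalized prefix does not.
    System.out.println(key.startsWith(normalize(raw)));
  }
}
```

Doing the normalization inside the registration method means every caller gets this protection without having to remember to append the separator.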

Contributor Author


Agreed, updated.

@Override
public List<String> listReconfigureProperties() throws IOException {
  Set<String> reconfigureSet = new TreeSet<>(super.listReconfigureProperties());
  reconfigureSet.addAll(prefixProperties);
Contributor


TestDatanodeReconfiguration.reconfigurableProperties needs to be updated.

https://github.com/apache/ozone/actions/runs/22660446735/job/65680352472?pr=9863#step:13:5791

Contributor Author

@ivandika3 ivandika3 Mar 4, 2026


Currently, TestDatanodeReconfiguration.reconfigurableProperties runs with a null scmServiceId, so the prefix should not show up in listReconfigureProperties after the recent commits.

Contributor

@adoroszlai adoroszlai left a comment


Thanks @ivandika3 for updating the patch, LGTM, just a question about a possible improvement.

if (scmServiceId != null) {
  reconfigurationHandler.register(OZONE_SCM_NODES_KEY + "." + scmServiceId,
      this::reconfigScmNodes);
  ((DatanodeReconfigurationHandler) reconfigurationHandler)
Contributor


Should reconfigurationHandler be declared as DatanodeReconfigurationHandler to avoid this cast? (Also need to split instantiation and register() calls.) Or even add prefix support in ReconfigurationHandler and skip introducing the new subclass?

Contributor Author

@ivandika3 ivandika3 Mar 5, 2026


Thanks. Since DatanodeReconfigurationHandler has evolved into generic prefix support (rather than the initial DN-specific implementation), let's just merge it into ReconfigurationHandler.

@ivandika3 ivandika3 self-assigned this Mar 5, 2026
@adoroszlai adoroszlai merged commit 4a2c8db into apache:master Mar 5, 2026
88 of 89 checks passed
@adoroszlai
Contributor

Thanks @ivandika3 for the patch.

@ivandika3 ivandika3 deleted the HDDS-14764 branch March 5, 2026 07:59
@ivandika3
Contributor Author

Thanks @adoroszlai for the review.
