
DAOS-18972 control: addr_format YAML key for fabric IP family (server + clients) #18484

Open
alexandertimofeyev wants to merge 2 commits into daos-stack:master from alexandertimofeyev:daos-ipv6-fabric-addr-format


@alexandertimofeyev commented Jun 10, 2026

Contributor

Summary

Follow-up to #18254 (DAOS-18972 cart: configurable address family for fabric init), which exposed the fabric address-family preference as D_ADDR_FORMAT / cio_addr_format (values: unspec, ipv4, ipv6, native). This PR surfaces that knob declaratively and propagates it to attaching clients, so a single config key drives both the engines and every client.

Two commits:

1. addr_format YAML key (server engines)

Adds an addr_format key to daos_server.yml, mirroring the existing fabric_auth_key -> D_PROVIDER_AUTH_KEY pattern:

  • engine.FabricConfig.AddrFormat tagged yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT". The reflection-based cmdEnv machinery emits D_ADDR_FORMAT=<val> to the engine env only when non-empty, so omitting the key preserves the historical (Mercury-default, IPv4-preferring) behavior — no functional change for existing deployments.
  • Update() propagation, Validate() (rejects unknown values up front and enforces one value per provider for multi-provider configs), a WithFabricAddrFormat() builder, and an exported FabricAddrFormats slice kept in sync with CaRT's crt_str_to_addr_format().
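The validation rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `validateAddrFormat` and `addrFormats` are hypothetical names, the accepted values are taken from the list in the summary, and the exact count rule (the number of values must equal the number of providers) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// addrFormats holds the values listed in the PR summary. The real exported
// slice is FabricAddrFormats; its exact contents are assumed here.
var addrFormats = []string{"unspec", "ipv4", "ipv6", "native"}

// validateAddrFormat is a hypothetical stand-in for the addr_format part of
// FabricConfig.Validate().
func validateAddrFormat(addrFormat string, numProviders int) error {
	if addrFormat == "" {
		return nil // unset is valid and keeps the historical default
	}
	vals := strings.Split(addrFormat, ",")
	// Assumption: the count must match the provider count exactly.
	if len(vals) != numProviders {
		return fmt.Errorf("addr_format has %d value(s), want %d (one per provider)",
			len(vals), numProviders)
	}
	for _, v := range vals {
		v = strings.TrimSpace(v)
		known := false
		for _, f := range addrFormats {
			if v == f {
				known = true
				break
			}
		}
		if !known {
			return fmt.Errorf("unknown addr_format %q", v)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateAddrFormat("ipv4,ipv6", 2))
	fmt.Println(validateAddrFormat("ipv5", 1))
}
```

Rejecting unknown values at config-parse time means a typo fails the server start instead of silently falling back inside CaRT.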

2. Propagate addr_format to attaching clients

Address family is a property of each system's fabric, and a single client/agent may attach to several systems with different families (one IPv4 system, one IPv6 system), so it must travel per-system rather than as a global client setting:

  • engine.FabricConfig.GetAddrFormats() parses the comma-separated value to one entry per provider (mirrors GetProviders/GetInterfaces).
  • server.setupGrpc() injects D_ADDR_FORMAT=<value> into each provider's ClientNetHint env. The client applies hint env vars in dc_mgmt_net_cfg_init() before crt_init(), where the already-merged CaRT D_ADDR_FORMAT handling consumes it. The agent already caches GetAttachInfo per system and copies hint env vars verbatim, so a client attached to multiple systems gets each system's family from that system's own hint — no agent config needed.
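As a rough illustration of the parsing step, the sketch below splits a comma-separated value into one trimmed entry per provider and returns nothing when the key is unset. The type `fabricConfig` and method `addrFormats` are hypothetical stand-ins for engine.FabricConfig and GetAddrFormats(); the real implementation may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// fabricConfig is a hypothetical, reduced stand-in for engine.FabricConfig.
type fabricConfig struct {
	AddrFormat string
}

// addrFormats returns one entry per provider, or nil when the receiver is nil
// or the key is unset.
func (fc *fabricConfig) addrFormats() []string {
	if fc == nil || strings.TrimSpace(fc.AddrFormat) == "" {
		return nil
	}
	parts := strings.Split(fc.AddrFormat, ",")
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		out = append(out, strings.TrimSpace(p))
	}
	return out
}

func main() {
	fc := &fabricConfig{AddrFormat: "ipv4, ipv6"}
	fmt.Println(fc.addrFormats()) // prints "[ipv4 ipv6]"
}
```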

No proto or client C changes are required: this reuses the existing env_vars hint field and the merged CaRT support.

Testing

`go test ./src/control/server/engine/...`: GetAddrFormats, FabricConfig validation, and engine command-line env-mapping (D_ADDR_FORMAT=ipv6) all pass.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

The CaRT DAOS-18972 change exposed the fabric address-family preference
as the D_ADDR_FORMAT environment variable / cio_addr_format API field
(values: unspec, ipv4, ipv6, native). Operators currently have to inject
that env var into each engine by hand to bring up an IPv6-only fabric
NIC, which is easy to get wrong and undocumented in the server config.

Surface the same knob declaratively in daos_server.yml as a per-engine
"addr_format" key, mirroring the existing fabric_auth_key ->
D_PROVIDER_AUTH_KEY pattern:

* engine.FabricConfig grows an AddrFormat string field tagged
  `yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT"`. The existing
  reflection-based cmdEnv machinery emits D_ADDR_FORMAT=<val> to the
  engine environment only when the field is non-empty, so omitting the
  key preserves the historical (Mercury-default, IPv4-preferring)
  behavior with no functional change for existing deployments.

* FabricConfig.Update() propagates AddrFormat from the server-level
  fabric config to each engine, consistent with the other fabric fields.

* FabricConfig.Validate() rejects unrecognized values up front (rather
  than relying on CaRT's silent fallback) and enforces one value per
  provider for multi-provider configs, matching the fabric_iface rule.
  An empty value remains valid.

* Add a WithFabricAddrFormat() builder and an exported FabricAddrFormats
  slice enumerating the accepted hints, kept in sync with CaRT's
  crt_str_to_addr_format().

* Document the key in utils/config/daos_server.yml and extend the engine
  command-line env mapping test to assert D_ADDR_FORMAT=ipv6 is emitted.
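For readers unfamiliar with the tag-driven env mapping, here is a simplified sketch of the idea: walk the struct fields, read the cmdEnv tag, and emit NAME=value only for non-empty strings. The helper `envFromTags` and the reduced `fabricConfig` struct are hypothetical; the real cmdEnv machinery in the control plane handles more field types and is not reproduced here.

```go
package main

import (
	"fmt"
	"reflect"
)

// fabricConfig is a reduced, illustrative struct; only the AddrFormat tag is
// quoted from this change.
type fabricConfig struct {
	AuthKey    string `yaml:"fabric_auth_key,omitempty" cmdEnv:"D_PROVIDER_AUTH_KEY"`
	AddrFormat string `yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT"`
}

// envFromTags is a hypothetical helper that returns NAME=value pairs for
// every non-empty string field carrying a cmdEnv tag.
func envFromTags(cfg fabricConfig) []string {
	v := reflect.ValueOf(cfg)
	t := v.Type()
	var env []string
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("cmdEnv")
		if !ok || v.Field(i).Kind() != reflect.String {
			continue
		}
		if val := v.Field(i).String(); val != "" {
			env = append(env, name+"="+val)
		}
	}
	return env
}

func main() {
	fmt.Println(envFromTags(fabricConfig{AddrFormat: "ipv6"})) // prints "[D_ADDR_FORMAT=ipv6]"
	fmt.Println(envFromTags(fabricConfig{}))                   // prints "[]"
}
```

Because empty fields emit nothing, a config without addr_format produces the same engine environment as before this change.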

Validated with `go test ./server/engine/...` (env mapping and
FabricConfig validation paths all pass).

Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>
@alexandertimofeyev requested review from a team as code owners June 10, 2026 17:54
@github-actions


Ticket title is 'cart: configurable address family for fabric init'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-18972

The addr_format server config key configures the engine fabric address
family, but DAOS clients (libdaos) initialize their own fabric and must
select the same family to reach a given system. Address family is a
property of each system's fabric, and a single client or agent may attach
to several systems with different families (e.g. one IPv4 system and one
IPv6 system), so the family cannot be a global client setting -- it has
to travel per-system.

Advertise addr_format to clients through the existing per-system
GetAttachInfo network hint: when building each provider's ClientNetHint,
append D_ADDR_FORMAT=<value> to the hint's env_vars. The client applies
hint env_vars in dc_mgmt_net_cfg_init() before crt_init(), where the CaRT
D_ADDR_FORMAT handling (added with the cart half of DAOS-18972) picks it
up. The agent already caches GetAttachInfo per system and copies the hint
env_vars verbatim, so a client attached to multiple systems gets each
system's family from that system's own hint, with no agent config.

* engine.FabricConfig gains GetAddrFormats(), parsing the comma-separated
  addr_format into one entry per provider (mirrors GetProviders /
  GetInterfaces), returning empty when unset.

* server.setupGrpc() derives the per-provider address family and injects
  D_ADDR_FORMAT into that provider's client hint env, copying the shared
  ClientEnvVars slice rather than mutating it. An unset addr_format leaves
  the hint untouched, so existing clients see no change.

* Document the client-propagation behavior in utils/config/daos_server.yml
  and unit-test GetAddrFormats (nil/unset/single/multi/whitespace).
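The copy-rather-than-mutate point can be sketched as below. `hintEnv` is a hypothetical helper, not the code in server.setupGrpc(); it shows why each provider's hint gets a fresh slice: appending to a shared slice that has spare capacity would let one provider's D_ADDR_FORMAT entry overwrite another's.

```go
package main

import "fmt"

// hintEnv is a hypothetical helper that returns the env vars for the client
// hint of provider idx. The shared slice is returned untouched when no
// address family is configured for that provider.
func hintEnv(shared, addrFormats []string, idx int) []string {
	if idx < 0 || idx >= len(addrFormats) || addrFormats[idx] == "" {
		return shared
	}
	// Copy first so the append never writes into the shared backing array.
	env := make([]string, 0, len(shared)+1)
	env = append(env, shared...)
	return append(env, "D_ADDR_FORMAT="+addrFormats[idx])
}

func main() {
	shared := []string{"FOO=bar"}
	formats := []string{"ipv4", "ipv6"}
	for i := range formats {
		fmt.Println(hintEnv(shared, formats, i))
	}
	fmt.Println(shared) // still only FOO=bar
}
```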

No proto or client C changes are required: this reuses the env_vars hint
field and the already-merged CaRT D_ADDR_FORMAT support.

Validated with `go test ./server/engine/...` (GetAddrFormats and the
existing fabric config/env-mapping tests pass).

Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>
@alexandertimofeyev changed the title from "DAOS-18972 control: add addr_format YAML key for fabric IP family" to "DAOS-18972 control: addr_format YAML key for fabric IP family (server + clients)" Jun 10, 2026
@alexandertimofeyev

Contributor Author

Note for reviewers: per-system granularity and the single-crt_init caveat

Address family is advertised per system via each system's GetAttachInfo ClientNetHint, and the agent caches GetAttachInfo per system (infocache.go) and forwards the hint env_vars verbatim. So an agent/client attached to several systems with different families (e.g. one IPv4 system and one IPv6 system) gets the correct D_ADDR_FORMAT for each system from that system's own hint — there is intentionally no global client/agent-side addr_format knob, which avoids a client/server family mismatch.

One inherent limitation, unchanged by this PR: a single client process initializes CaRT once per provider, so cio_addr_format (like cio_provider and cio_crt_timeout, which already come from the hint) is effectively first-attach-wins within one process. If one process attaches to two systems that share a provider but use different address families, the second attach can't re-init CaRT with a different family. This matches the existing behavior for provider/crt_timeout and isn't a regression; the per-system hint is the right granularity regardless.
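To make the first-attach-wins behavior concrete, here is a toy model in Go. It is not CaRT code: `fabricInit` and `attach` are hypothetical, and the model only records which family was chosen when a provider was first initialized within one process.

```go
package main

import (
	"fmt"
	"sync"
)

// fabricInit is a hypothetical model of per-process, per-provider fabric
// initialization: the first attach fixes the address family.
type fabricInit struct {
	mu       sync.Mutex
	families map[string]string // provider -> family chosen at first init
}

// attach returns the family actually in effect for the provider and whether
// the requested family differs from it.
func (f *fabricInit) attach(provider, addrFormat string) (effective string, mismatch bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.families == nil {
		f.families = make(map[string]string)
	}
	if cur, ok := f.families[provider]; ok {
		return cur, cur != addrFormat
	}
	f.families[provider] = addrFormat
	return addrFormat, false
}

func main() {
	var f fabricInit
	fmt.Println(f.attach("ofi+tcp", "ipv4")) // first system: ipv4 takes effect
	fmt.Println(f.attach("ofi+tcp", "ipv6")) // second system, same provider: still ipv4
}
```

In this model the second attach reports a mismatch instead of re-initializing, which mirrors the limitation described above for processes that attach to two systems sharing a provider.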

Design note: why env_vars rather than a dedicated ClientNetHint field

The more idiomatic wiring would be a dedicated ClientNetHint.addr_format field populated per provider on the server, consumed on the client by setting crt_info->cio_addr_format in dc_mgmt_net_cfg_init() (symmetric with how info->provider maps to cio_provider). I went with the existing env_vars passthrough instead because it needs no proto/C regeneration and is functionally complete through the already-merged CaRT D_ADDR_FORMAT support (#18254): the client setenvs hint env_vars before crt_init(), which reads D_ADDR_FORMAT. Happy to switch to a dedicated proto field if maintainers prefer that for discoverability/symmetry; it's a mechanical change on top of this.

}
}

func TestFabricConfig_GetAddrFormats(t *testing.T) {

Contributor


I don't see any validation testing against improper inputs. Could we have some of that, please?

// dc_mgmt_net_cfg_init). A client/agent attached to several systems gets
// each system's family from that system's own hint. Empty addr_format
// leaves the hint untouched, preserving the historical default.
addrFormats := srv.cfg.Fabric.GetAddrFormats()

Contributor


Does this change require any other dependent changes to offer functionality? What is the order of landing for all dependencies and related changes?


clientNetHints := make([]*mgmtpb.ClientNetHint, 0, len(providers))
for i, p := range providers {
	envVars := srv.cfg.ClientEnvVars

Contributor


This looks like an odd place to update ClientEnvVars. Could you have a look at existing precedent for updating this within the config processing and validation workflow? There may be a more appropriate place to update this value.
