DAOS-18972 control: addr_format YAML key for fabric IP family (server + clients) #18484
alexandertimofeyev wants to merge 2 commits into
The CaRT DAOS-18972 change exposed the fabric address-family preference as the D_ADDR_FORMAT environment variable / cio_addr_format API field (values: unspec, ipv4, ipv6, native). Operators currently have to inject that env var into each engine by hand to bring up an IPv6-only fabric NIC, which is easy to get wrong and undocumented in the server config.

Surface the same knob declaratively in daos_server.yml as a per-engine "addr_format" key, mirroring the existing fabric_auth_key -> D_PROVIDER_AUTH_KEY pattern:

* engine.FabricConfig grows an AddrFormat string field tagged `yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT"`. The existing reflection-based cmdEnv machinery emits D_ADDR_FORMAT=<val> to the engine environment only when the field is non-empty, so omitting the key preserves the historical (Mercury-default, IPv4-preferring) behavior with no functional change for existing deployments.
* FabricConfig.Update() propagates AddrFormat from the server-level fabric config to each engine, consistent with the other fabric fields.
* FabricConfig.Validate() rejects unrecognized values up front (rather than relying on CaRT's silent fallback) and enforces one value per provider for multi-provider configs, matching the fabric_iface rule. An empty value remains valid.
* Add a WithFabricAddrFormat() builder and an exported FabricAddrFormats slice enumerating the accepted hints, kept in sync with CaRT's crt_str_to_addr_format().
* Document the key in utils/config/daos_server.yml and extend the engine command-line env mapping test to assert D_ADDR_FORMAT=ipv6 is emitted.

Validated with `go test ./server/engine/...` (env mapping and FabricConfig validation paths all pass).

Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>
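The tag-driven behavior described in the first bullet can be illustrated with a minimal, self-contained sketch. It is not the DAOS cmdEnv implementation: `fabricConfig` and `envFromTags` are hypothetical names, and only the two struct tags are taken from the commit message. It shows the key property the commit relies on, namely that an empty field produces no environment entry at all.

```go
package main

import (
	"fmt"
	"reflect"
)

// fabricConfig is a hypothetical stand-in for engine.FabricConfig. Only the
// tag strings are taken from the commit message; the rest is illustrative.
type fabricConfig struct {
	AuthKey    string `yaml:"fabric_auth_key,omitempty" cmdEnv:"D_PROVIDER_AUTH_KEY"`
	AddrFormat string `yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT"`
}

// envFromTags (hypothetical helper) walks the struct fields and returns a
// NAME=value entry for every non-empty string field carrying a cmdEnv tag.
func envFromTags(cfg fabricConfig) []string {
	var env []string
	v := reflect.ValueOf(cfg)
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("cmdEnv")
		if !ok || v.Field(i).Kind() != reflect.String {
			continue
		}
		// Empty fields are skipped, so an unset key changes nothing.
		if val := v.Field(i).String(); val != "" {
			env = append(env, name+"="+val)
		}
	}
	return env
}

func main() {
	fmt.Println(envFromTags(fabricConfig{}))                   // no entries
	fmt.Println(envFromTags(fabricConfig{AddrFormat: "ipv6"})) // one entry
}
```

With the key omitted the engine environment is untouched, which is what keeps existing deployments on the historical default.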
Ticket title is 'cart: configurable address family for fabric init'
The addr_format server config key configures the engine fabric address family, but DAOS clients (libdaos) initialize their own fabric and must select the same family to reach a given system. Address family is a property of each system's fabric, and a single client or agent may attach to several systems with different families (e.g. one IPv4 system and one IPv6 system), so the family cannot be a global client setting -- it has to travel per-system.

Advertise addr_format to clients through the existing per-system GetAttachInfo network hint: when building each provider's ClientNetHint, append D_ADDR_FORMAT=<value> to the hint's env_vars. The client applies hint env_vars in dc_mgmt_net_cfg_init() before crt_init(), where the CaRT D_ADDR_FORMAT handling (added with the cart half of DAOS-18972) picks it up. The agent already caches GetAttachInfo per system and copies the hint env_vars verbatim, so a client attached to multiple systems gets each system's family from that system's own hint, with no agent config.

* engine.FabricConfig gains GetAddrFormats(), parsing the comma-separated addr_format into one entry per provider (mirrors GetProviders / GetInterfaces), returning empty when unset.
* server.setupGrpc() derives the per-provider address family and injects D_ADDR_FORMAT into that provider's client hint env, copying the shared ClientEnvVars slice rather than mutating it. An unset addr_format leaves the hint untouched, so existing clients see no change.
* Document the client-propagation behavior in utils/config/daos_server.yml and unit-test GetAddrFormats (nil/unset/single/multi/whitespace).

No proto or client C changes are required: this reuses the env_vars hint field and the already-merged CaRT D_ADDR_FORMAT support.

Validated with `go test ./server/engine/...` (GetAddrFormats and the existing fabric config/env-mapping tests pass).

Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>
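The two bullets on parsing and hint construction can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `splitAddrFormats` and `hintEnv` are hypothetical helpers standing in for GetAddrFormats() and the logic in setupGrpc(), and the provider names and `FOO=bar` entry are example values only. The point of the sketch is the copy-before-append step, which keeps the shared slice unchanged across providers.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAddrFormats (hypothetical) splits a comma-separated addr_format value
// into one trimmed entry per provider and returns nil when it is unset.
func splitAddrFormats(addrFormat string) []string {
	if strings.TrimSpace(addrFormat) == "" {
		return nil
	}
	parts := strings.Split(addrFormat, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts
}

// hintEnv (hypothetical) returns the env list for the provider at idx. When
// an address family is configured it copies the shared slice before
// appending, so the shared slice is never mutated or aliased.
func hintEnv(shared, addrFormats []string, idx int) []string {
	if idx >= len(addrFormats) || addrFormats[idx] == "" {
		return shared // unset: hint left untouched
	}
	env := make([]string, 0, len(shared)+1)
	env = append(env, shared...)
	return append(env, "D_ADDR_FORMAT="+addrFormats[idx])
}

func main() {
	shared := []string{"FOO=bar"} // illustrative shared client env vars
	formats := splitAddrFormats("ipv4, ipv6")
	for i, p := range []string{"ofi+tcp", "ofi+verbs"} {
		fmt.Println(p, hintEnv(shared, formats, i))
	}
	fmt.Println("shared after loop:", shared)
}
```

Appending directly to the shared slice inside the loop could leak one provider's D_ADDR_FORMAT into the next provider's hint whenever the slice has spare capacity, which is why the sketch allocates a fresh slice per provider.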
Note for reviewers: per-system granularity and the single-
    func TestFabricConfig_GetAddrFormats(t *testing.T) {
I don't see any validation testing against improper inputs. Could we have some of that, please?
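As a rough illustration of what such negative cases might look like, here is a self-contained, table-driven sketch. It does not use the PR's Validate() method: `validateAddrFormat` is a hypothetical helper, the accepted values are the four listed in the commit message, and the rule that the number of entries must equal the number of providers is an assumption about how "one value per provider" is enforced.

```go
package main

import (
	"fmt"
	"strings"
)

// accepted lists the hints named in the commit message.
var accepted = map[string]bool{"unspec": true, "ipv4": true, "ipv6": true, "native": true}

// validateAddrFormat (hypothetical) accepts an empty value, rejects unknown
// entries, and (by assumption) requires one entry per provider.
func validateAddrFormat(addrFormat string, numProviders int) error {
	if addrFormat == "" {
		return nil
	}
	parts := strings.Split(addrFormat, ",")
	if len(parts) != numProviders {
		return fmt.Errorf("got %d addr_format values for %d providers", len(parts), numProviders)
	}
	for _, p := range parts {
		if f := strings.TrimSpace(p); !accepted[f] {
			return fmt.Errorf("unrecognized addr_format %q", f)
		}
	}
	return nil
}

func main() {
	cases := []struct {
		in        string
		providers int
		wantErr   bool
	}{
		{"", 1, false},
		{"ipv6", 1, false},
		{"ipv4,ipv6", 2, false},
		{"ipv5", 1, true},      // unknown value
		{"ipv4,", 2, true},     // empty trailing entry
		{"ipv4,ipv6", 1, true}, // more values than providers
	}
	for _, tc := range cases {
		err := validateAddrFormat(tc.in, tc.providers)
		fmt.Printf("%q providers=%d err=%v ok=%t\n", tc.in, tc.providers, err, (err != nil) == tc.wantErr)
	}
}
```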
    // dc_mgmt_net_cfg_init). A client/agent attached to several systems gets
    // each system's family from that system's own hint. Empty addr_format
    // leaves the hint untouched, preserving the historical default.
    addrFormats := srv.cfg.Fabric.GetAddrFormats()
Does this change require any other dependent changes to offer functionality? What is the order of landing for all dependencies and related changes?
    clientNetHints := make([]*mgmtpb.ClientNetHint, 0, len(providers))
    for i, p := range providers {
        envVars := srv.cfg.ClientEnvVars
This looks like an odd place to update ClientEnvVars. Could you have a look at existing precedent for updating this within the config processing and validation workflow? There may be a more appropriate place to update this value.
Summary
Follow-up to #18254 (DAOS-18972 cart: configurable address family for fabric init), which exposed the fabric address-family preference as `D_ADDR_FORMAT` / `cio_addr_format` (values: `unspec`, `ipv4`, `ipv6`, `native`). This PR surfaces that knob declaratively and propagates it to attaching clients, so a single config key drives both the engines and every client.

Two commits:

1. `addr_format` YAML key (server engines)

Adds an `addr_format` key to `daos_server.yml`, mirroring the existing `fabric_auth_key` → `D_PROVIDER_AUTH_KEY` pattern:

* `engine.FabricConfig.AddrFormat` tagged `yaml:"addr_format,omitempty" cmdEnv:"D_ADDR_FORMAT"`. The reflection-based `cmdEnv` machinery emits `D_ADDR_FORMAT=<val>` to the engine env only when non-empty, so omitting the key preserves the historical (Mercury-default, IPv4-preferring) behavior, with no functional change for existing deployments.
* `Update()` propagation, `Validate()` (rejects unknown values up front and enforces one value per provider for multi-provider configs), a `WithFabricAddrFormat()` builder, and an exported `FabricAddrFormats` slice kept in sync with CaRT's `crt_str_to_addr_format()`.

2. Propagate `addr_format` to attaching clients

Address family is a property of each system's fabric, and a single client/agent may attach to several systems with different families (one IPv4 system, one IPv6 system), so it must travel per-system rather than as a global client setting:

* `engine.FabricConfig.GetAddrFormats()` parses the comma-separated value to one entry per provider (mirrors `GetProviders` / `GetInterfaces`).
* `server.setupGrpc()` injects `D_ADDR_FORMAT=<value>` into each provider's `ClientNetHint` env. The client applies hint env vars in `dc_mgmt_net_cfg_init()` before `crt_init()`, where the already-merged CaRT `D_ADDR_FORMAT` handling consumes it. The agent already caches `GetAttachInfo` per system and copies hint env vars verbatim, so a client attached to multiple systems gets each system's family from that system's own hint, with no agent config needed.

No proto or client C changes are required: this reuses the existing `env_vars` hint field and the merged CaRT support.

Testing

`go test ./src/control/server/engine/...`: `GetAddrFormats`, `FabricConfig` validation, and engine command-line env-mapping (`D_ADDR_FORMAT=ipv6`) all pass.

Steps for the author:
After all prior steps are complete: