[10.2.x] ATS Configuration Reload with observability/tracing - Token model (#12892) #13354
masaori335 wants to merge 1 commit into
…ache#12892)

ATS Configuration Reload with observability/tracing - Token model

Replace the fire-and-forget configuration reload mechanism with a new token-based, observable reload framework. Every reload operation is now assigned a unique token, tracked through a task tree, and queryable via CLI or JSONRPC at any point after submission.

Core components introduced:
- ConfigRegistry: centralized singleton for config file registration, filename records, trigger records, and reload handlers. Replaces the scattered registration across AddConfigFilesHere.cc and individual modules.
- ReloadCoordinator: manages reload session lifecycle including token generation, concurrency control (--force to override), timeout detection, and rolling history.
- ConfigReloadTask: tracks a single reload as a tree of sub-tasks with per-handler status, timings, and logs.
- ConfigContext: lightweight context passed to handlers providing in_progress(), complete(), fail(), log(), supplied_yaml(), and add_dependent_ctx(). Safe no-op at startup when no reload is active.
- ConfigReloadProgress: periodic checker that detects stuck tasks and marks them as TIMEOUT.

New traffic_ctl commands:
- `config reload [-m] [-t <token>] [-d @file] [--force]`
- `config status [-t <token>] [-c all]`

All commands support --format json for automation and CI pipelines.

New JSONRPC APIs:
- admin_config_reload: unified file-based or inline reload with token, force, and configs parameters.
- get_reload_config_status: query reload status by token or get the last N reloads.

Migrated config handlers to ConfigRegistry: ip_allow, cache_control, cache_hosting, parent_proxy, split_dns, remap, logging, ssl_client_coordinator (with sni.yaml and ssl_multicert.config as dependencies), ssl_ticket_key, records, and pre-warm. Static configs (storage, volume, plugin, socks, jsonrpc) registered as inventory-only.

Removed legacy ConfigUpdateHandler/ConfigUpdateContinuation from ConfigProcessor.h. Removed AddConfigFilesHere.cc in favor of per-module self-registration.

Fixed duplicate handler execution for configs with multiple trigger records (e.g. ssl_client_coordinator) by deduplicating against the ConfigReloadTask subtask tree.

Added RecFlushConfigUpdateCbs() to synchronously fire pending record callbacks after rereadConfig(), ensuring all subtasks are registered before the first status poll.

New configuration records:
- proxy.config.admin.reload.timeout (default: 1h)
- proxy.config.admin.reload.check_interval (default: 2s)

Backward compatible: existing `traffic_ctl config reload` works as before; internally it now uses the new framework with automatic token assignment and tracking.

(cherry picked from commit 5bab268)

Conflicts:
- include/tscore/ArgParser.h
- src/iocore/cache/P_CacheHosting.h
- src/iocore/hostdb/CMakeLists.txt
- src/proxy/ReverseProxy.cc
- src/records/CMakeLists.txt
- src/tscore/ArgParser.cc
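To make the session lifecycle described above concrete, here is a minimal, self-contained sketch of token assignment, concurrency control with a force override, and rolling history. It is not the ReloadCoordinator from this PR: `ToyReloadCoordinator`, its methods, the `reload-N` token format, and the enumerator names are all hypothetical. Only the numeric codes 1 and 4 and the "Reload ongoing" wording are taken from the JSON fragments quoted in the review thread. What happens to an in-flight session when a forced reload starts is left out of scope here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>

// Hypothetical names; only the values 1 and 4 appear in this PR's JSON fragments.
enum class ReloadError { None = 0, ReloadInProgress = 1, TokenNotFound = 4 };
enum class ReloadState { InProgress, Success, Fail, Timeout };

struct ReloadRecord {
  std::string token;
  ReloadState state = ReloadState::InProgress;
};

struct ReloadResult {
  ReloadError error = ReloadError::None;
  std::string token;
  std::string message;
};

// Toy model of a reload coordinator (assumption, not the ATS class).
class ToyReloadCoordinator {
public:
  explicit ToyReloadCoordinator(std::size_t max_history) : max_history_(max_history) {
    assert(max_history_ > 0);
  }

  // Start a reload. An empty token requests an auto-generated one.
  ReloadResult start(std::string token = "", bool force = false) {
    if (!history_.empty() && history_.back().state == ReloadState::InProgress && !force) {
      const std::string &busy = history_.back().token;
      return {ReloadError::ReloadInProgress, busy, "Reload ongoing with token '" + busy + "'"};
    }
    if (token.empty()) {
      token = "reload-" + std::to_string(++counter_); // token format is made up
    }
    history_.push_back({token, ReloadState::InProgress});
    if (history_.size() > max_history_) {
      history_.pop_front(); // rolling history: drop the oldest session
    }
    return {ReloadError::None, token, ""};
  }

  // Record the final state of a session; returns false for an unknown token.
  bool finish(const std::string &token, ReloadState state) {
    for (auto &rec : history_) {
      if (rec.token == token) {
        rec.state = state;
        return true;
      }
    }
    return false;
  }

  // Look up a session by token; an empty optional models "token not found".
  std::optional<ReloadRecord> status(const std::string &token) const {
    for (const auto &rec : history_) {
      if (rec.token == token) {
        return rec;
      }
    }
    return std::nullopt;
  }

private:
  std::size_t max_history_;
  std::size_t counter_ = 0;
  std::deque<ReloadRecord> history_;
};
```

In this model a caller that receives `ReloadInProgress` can either poll `status()` with the returned token or retry with `force = true`.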
Pull request overview
This PR backports the token-based, observable configuration reload framework (originally #12892), replacing the prior fire-and-forget reload behavior with a tracked reload task tree that can be monitored via traffic_ctl and JSONRPC.
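As a rough illustration of what a tracked task tree can look like, the sketch below derives a parent's status from its sub-tasks. `ToyTask`, the `TaskState` names, and the precedence rule (failure, then timeout, then in-progress) are assumptions made for this example; the PR text confirms only that tasks form a tree with per-handler status and that stuck tasks are marked TIMEOUT.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// State names are assumptions, apart from the TIMEOUT idea mentioned in the PR.
enum class TaskState { Created, InProgress, Success, Fail, Timeout };

// Toy task node (hypothetical, not ConfigReloadTask).
struct ToyTask {
  std::string name;
  TaskState state = TaskState::Created;
  std::vector<std::unique_ptr<ToyTask>> children;

  // Append a sub-task and return a stable pointer to it.
  ToyTask *add_child(const std::string &child_name) {
    assert(!child_name.empty());
    children.push_back(std::make_unique<ToyTask>());
    children.back()->name = child_name;
    return children.back().get();
  }

  // A leaf reports its own state; a parent summarizes its children.
  TaskState aggregate() const {
    if (children.empty()) {
      return state;
    }
    bool failed = false, timed_out = false, pending = false;
    for (const auto &child : children) {
      switch (child->aggregate()) {
      case TaskState::Fail:
        failed = true;
        break;
      case TaskState::Timeout:
        timed_out = true;
        break;
      case TaskState::Created:
      case TaskState::InProgress:
        pending = true;
        break;
      case TaskState::Success:
        break;
      }
    }
    if (failed) {
      return TaskState::Fail;
    }
    if (timed_out) {
      return TaskState::Timeout;
    }
    return pending ? TaskState::InProgress : TaskState::Success;
  }
};
```

With this rule, a status query on the root stays in progress until every handler has reported, and a single failed handler is enough to mark the whole reload as failed.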
Changes:
- Introduces `ConfigRegistry` + `ReloadCoordinator` + `ConfigContext` and wires core modules to self-register reload handlers with tokenized task tracking.
- Adds/updates JSONRPC endpoints (`admin_config_reload`, `get_reload_config_status`) and extends `traffic_ctl config reload/status` to support tokens, monitor/details, inline YAML, and force.
- Adds extensive AuTest + unit test coverage for reload lifecycle, deduplication, reserve-subtask behavior, and handler completion/timeout behavior.
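The deduplication covered by those tests can be pictured with a small sketch: when several trigger records map to the same config key, the handler should run only once per reload session. `ToyReloadSession` and `on_trigger` are hypothetical names, and a flat set stands in for the subtask tree that the real code deduplicates against.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Toy reload session (assumption, not the ATS implementation).
class ToyReloadSession {
public:
  // Run the handler the first time a config key is triggered in this session.
  // Returns true if the handler ran, false if the trigger was deduplicated.
  bool on_trigger(const std::string &config_key, const std::function<void()> &handler) {
    assert(!config_key.empty());
    if (!subtasks_.insert(config_key).second) {
      return false; // a subtask for this key already exists
    }
    handler();
    return true;
  }

  std::size_t subtask_count() const { return subtasks_.size(); }

private:
  std::set<std::string> subtasks_; // keys already registered in this session
};
```

Two record callbacks that both resolve to `ssl_client_coordinator` would therefore produce one handler run and one subtask entry in this model.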
Reviewed changes
Copilot reviewed 95 out of 96 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/gold_tests/traffic_ctl/traffic_ctl_test_utils.py | Adds test helpers for traffic_ctl config reload/status and configurable expected return codes. |
| tests/gold_tests/traffic_ctl/traffic_ctl_config_reload.test.py | New gold test coverage for tokenized traffic_ctl config reload/status behaviors. |
| tests/gold_tests/tls/tls_client_cert_plugin.test.py | Adjusts readiness expectation count for SNI reload logging. |
| tests/gold_tests/remap/remap_reload.test.py | Enables debug tags relevant to reload tracing for remap reload tests. |
| tests/gold_tests/parent_config/parent_config_reload.test.py | New test validating parent.config reload via file touch + record-trigger. |
| tests/gold_tests/jsonrpc/jsonrpc_api_schema.test.py | Temporarily disables schema assertion for admin_config_reload response. |
| tests/gold_tests/jsonrpc/json/admin_detached_config_reload_req.json | New JSONRPC request fixture for admin_config_reload. |
| tests/gold_tests/jsonrpc/config_reload_tracking.test.py | New JSONRPC test for token generation/history/basic status querying. |
| tests/gold_tests/jsonrpc/config_reload_reserve_subtask.test.py | New test for reserve_subtask() behavior when records completes first. |
| tests/gold_tests/jsonrpc/config_reload_full_smoke.test.py | New smoke test touching all registered configs + record-trigger reloads. |
| tests/gold_tests/jsonrpc/config_reload_dedup.test.py | New deduplication test for multi-trigger ssl client coordinator paths. |
| tests/gold_tests/ip_allow/ip_category.test.py | Extends debug tags to include config.reload for test observability. |
| tests/gold_tests/ip_allow/ip_allow_reload_triggered.test.py | New functional test for ip_allow + ip_categories dependency reload behavior. |
| tests/gold_tests/dns/splitdns_reload.test.py | New test validating splitdns reload handler invocation. |
| tests/gold_tests/cache/cache_config_reload.test.py | New test validating cache.config + hosting.config reload via registry. |
| src/traffic_server/traffic_server.cc | Registers records.yaml handler + static inventory files; adapts plugin callback registration. |
| src/traffic_server/RpcAdminPubHandlers.cc | Registers new reload-related JSONRPC methods. |
| src/traffic_server/CMakeLists.txt | Adjusts link ordering to include configmanager. |
| src/traffic_logstats/CMakeLists.txt | Adds missing linkage to records/configmanager for new dependencies. |
| src/traffic_ctl/TrafficCtlStatus.h | Adds CTRL_EX_TEMPFAIL (75) for in-progress/temporary failures. |
| src/traffic_ctl/traffic_ctl.cc | Expands config reload/status CLI options for token/monitor/details/inline YAML. |
| src/traffic_ctl/jsonrpc/CtrlRPCRequests.h | Adds structured reload/status request/response models (incl. YAML Node configs). |
| src/traffic_ctl/jsonrpc/ctrl_yaml_codecs.h | Adds YAML codecs for reload/status requests and responses. |
| src/traffic_ctl/CtrlPrinters.h | Adds printing helpers for reload task tree and progress line rendering. |
| src/traffic_ctl/CtrlPrinters.cc | Implements reload progress bar + task tree reporting; adjusts JSON output behavior. |
| src/traffic_ctl/CtrlCommands.h | Adds helpers for reload/status tracking, monitoring loop, and inline data loading. |
| src/records/unit_tests/test_ConfigReloadTask.cc | New unit tests for reload task state/timeouts/stale behavior. |
| src/records/unit_tests/test_ConfigRegistry.cc | New unit tests for registry resolve + dependency key routing/dedup. |
| src/records/RecordsConfig.cc | Adds new dynamic records for reload timeout/check interval. |
| src/records/RecCore.cc | Plumbs ConfigContext into unregistered-record warnings for reload logging. |
| src/records/P_RecCore.cc | Adds RecFlushConfigUpdateCbs() to flush pending record callbacks synchronously. |
| src/records/CMakeLists.txt | Links reload infrastructure into records and adds new unit tests. |
| src/proxy/ReverseProxy.cc | Registers remap reload handler via ConfigRegistry; updates reloadUrlRewrite signature. |
| src/proxy/ParentSelection.cc | Migrates parent.config reload to ConfigRegistry and adds ctx completion logging. |
| src/proxy/logging/LogConfig.cc | Migrates logging reload triggers to registry and threads ConfigContext through deferred reload. |
| src/proxy/IPAllow.cc | Migrates ip_allow + ip_categories dependency tracking to registry with ctx status reporting. |
| src/proxy/http2/CMakeLists.txt | Links configmanager where needed due to new config reload components. |
| src/proxy/http/remap/unit-tests/CMakeLists.txt | Links configmanager and adds non-Apple multiple-definition workaround. |
| src/proxy/http/PreWarmConfig.cc | Migrates record-triggered prewarm config reload to registry with ctx completion. |
| src/proxy/hdrs/CMakeLists.txt | Links configmanager for updated dependencies. |
| src/proxy/CMakeLists.txt | Updates proxy link dependencies to include http/configmanager components. |
| src/proxy/CacheControl.cc | Migrates cache_control reload to registry with ctx completion. |
| src/mgmt/rpc/handlers/config/Configuration.cc | Implements unified tokenized reload (file vs inline configs) + status/history JSONRPC. |
| src/mgmt/rpc/CMakeLists.txt | Links configmanager in JSONRPC server unit tests. |
| src/mgmt/config/ReloadCoordinator.cc | New coordinator managing reload lifecycle, history, concurrency, and subtask reservation. |
| src/mgmt/config/FileManager.cc | Routes records reload through registry and modernizes plugin callback storage. |
| src/mgmt/config/ConfigReloadExecutor.cc | New ET_TASK continuation to run reload work and flush record callbacks. |
| src/mgmt/config/ConfigContext.cc | New context implementation for progress/logging + injected YAML propagation to dependents. |
| src/mgmt/config/CMakeLists.txt | Rebuilds configmanager library composition and dependencies around new components. |
| src/mgmt/config/AddConfigFilesHere.cc | Removes legacy centralized config file registration. |
| src/iocore/net/SSLSNIConfig.cc | Threads ConfigContext into SNI reload path for task tracking. |
| src/iocore/net/SSLConfig.cc | Threads ConfigContext into SSL reload paths; migrates ssl_ticket_key to registry record triggers. |
| src/iocore/net/SSLClientCoordinator.cc | Migrates SSL coordinator triggers/dependencies to registry and uses dependent contexts for subcomponents. |
| src/iocore/net/QUICMultiCertConfigLoader.cc | Threads ConfigContext into QUIC cert reload. |
| src/iocore/net/quic/QUICConfig.cc | Threads ConfigContext into QUIC config reload. |
| src/iocore/net/P_SSLConfig.h | Updates SSL API signatures to accept optional ConfigContext. |
| src/iocore/net/P_SSLClientCoordinator.h | Updates coordinator API signature and includes ConfigContext. |
| src/iocore/eventsystem/RecProcess.cc | Moves config update debug logging under a dedicated dbg_ctl. |
| src/iocore/eventsystem/CMakeLists.txt | Links configmanager in event system unit tests. |
| src/iocore/dns/SplitDNS.cc | Migrates splitdns reload to registry with ctx completion/fail reporting. |
| src/iocore/cache/P_CacheHosting.h | Removes legacy hosting.config callback scaffolding (migrated to registry). |
| src/iocore/cache/CacheHosting.cc | Removes legacy hosting.config callback function. |
| src/iocore/cache/Cache.cc | Registers cache_hosting reload handler late (after cache init) via registry. |
| src/iocore/aio/CMakeLists.txt | Links configmanager in aio unit tests. |
| src/cripts/CMakeLists.txt | Links yaml-cpp for new YAML usage in cripts library. |
| include/shared/rpc/yaml_codecs.h | Extends try_extract to support caller-provided default values. |
| include/records/YAMLConfigReloadTaskEncoder.h | New YAML encoder for reload task snapshots used in JSONRPC responses. |
| include/records/RecCore.h | Exposes RecFlushConfigUpdateCbs() and updates unregistered-record warnings to accept ConfigContext. |
| include/proxy/ReverseProxy.h | Updates remap reload API to accept ConfigContext. |
| include/proxy/ParentSelection.h | Updates parent reconfigure API to accept optional ConfigContext. |
| include/proxy/logging/LogConfig.h | Updates logging reconfigure API and stores reload context on LogConfig. |
| include/proxy/IPAllow.h | Updates ip_allow reconfigure API to accept optional ConfigContext. |
| include/proxy/http/PreWarmConfig.h | Updates prewarm reconfigure API to accept optional ConfigContext. |
| include/proxy/CacheControl.h | Updates cache_control reload API to accept ConfigContext. |
| include/mgmt/rpc/handlers/config/Configuration.h | Documents unified reload API and adds status query declaration. |
| include/mgmt/config/ReloadCoordinator.h | New public interface for reload lifecycle management and subtask APIs. |
| include/mgmt/config/FileManager.h | Updates plugin callback registration signature and removes legacy registry init declaration. |
| include/mgmt/config/ConfigReloadExecutor.h | New API for scheduling async reload work on ET_TASK. |
| include/mgmt/config/ConfigReloadErrors.h | New shared error code definitions for reload lifecycle and validation. |
| include/mgmt/config/ConfigContext.h | New public handler context API for reload status/logging + injected YAML + dependent subtasks. |
| include/iocore/net/SSLSNIConfig.h | Updates SNI reconfigure API to accept optional ConfigContext. |
| include/iocore/net/QUICMultiCertConfigLoader.h | Updates QUIC cert reload API to accept optional ConfigContext. |
| include/iocore/net/quic/QUICConfig.h | Updates QUIC config reload API to accept optional ConfigContext. |
| include/iocore/eventsystem/ConfigProcessor.h | Removes legacy ConfigUpdateHandler/Continuation machinery. |
| include/iocore/dns/SplitDNSProcessor.h | Updates splitdns reconfigure API and removes legacy handler pointer. |
| doc/developer-guide/jsonrpc/jsonrpc-api.en.rst | Documents token-based reload + inline configs + status/history query endpoint. |
| doc/developer-guide/index.en.rst | Adds new developer guide page entry for config reload framework docs. |

    out.created_time = helper::try_extract<std::string>(node, "created_time");
    for (auto &&msg : node["message"]) {
      out.messages.push_back(msg.as<std::string>());
    }
    out.config_token = helper::try_extract<std::string>(node, "token");

    for (auto &&element : node["tasks"]) {
      ConfigReloadResponse::ReloadInfo task = get_info(get_info, element);
      out.tasks.push_back(std::move(task));
    }

This is a false positive for yaml-cpp. Iterating a missing key does not throw; it is a safe no-op and a well-known idiom.
node here is const, so node["message"] uses the const operator[], which returns a Zombie node when the key is absent (node/impl.h):

    meta["created_time_ms"] = info.created_time_ms;
    meta["last_updated_time_ms"] = info.last_updated_time_ms;
    meta["main_task"] = info.main_task ? "true" : "false";

This doesn't affect the JSONRPC output. The RPC response is serialized with YAML::DoubleQuoted (include/mgmt/rpc/jsonrpc/json/YAMLCodec.h):

    YAML::Emitter json;
    json << YAML::DoubleQuoted << YAML::Flow;
    encode(resp, json);

    "errors": [
      {
        "message": "Reload ongoing with token 'deploy-v2.1'",
        "code": 1

    "errors": [
      {
        "message": "Token 'nonexistent' not found",
        "code": 4

brbzull0 left a comment

Besides the red CI, looks good to me.
Backport #12892