fix: use shdict instead of events module for nodes data exchange#13066
shreemaan-abhishek wants to merge 6 commits into apache:master
Conversation
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Pull request overview
This PR addresses intermittent 503s after APISIX restarts with Consul-based service discovery by switching Consul node data exchange from per-worker in-memory/events-based propagation to a shared dictionary (lua_shared_dict) approach.
Changes:
- Replace Consul discovery's inter-worker data propagation (events + local table) with `ngx.shared.consul` storage.
- Add `lua_shared_dict consul 10m;` to the generated Nginx config template and the test harness Nginx config.
- Update Consul dump test expectations to match the new error log message when a service is missing.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| apisix/discovery/consul/init.lua | Stores/reads discovered Consul nodes via ngx.shared.consul and limits active fetching to the privileged agent. |
| apisix/cli/ngx_tpl.lua | Declares lua_shared_dict consul in the generated Nginx configuration. |
| t/APISIX.pm | Declares lua_shared_dict consul in the test harness Nginx config. |
| t/discovery/consul_dump.t | Updates expected error log output for missing Consul service nodes. |
```lua
for k, v in pairs(entity.services) do
    local content, json_err = core.json.encode(v)
    if content then
        consul_dict:set(k, content)
    else
        log.error("failed to encode dump service: ", k, ", error: ", json_err)
    end
end
log.info("load dump file into shared dict success")
```
read_dump_services() writes services into ngx.shared.consul without clearing existing keys first. Since lua_shared_dict contents survive nginx reloads, this can retain stale keys (or temporarily overwrite newer values) when the dump file is loaded during startup/reload. Consider either clearing the dict (or using a generation/version namespace) before loading the dump so the shared dict reflects exactly the dump contents until Consul updates arrive.
Since the values are set using service keys, I don't think this is a concern.
apisix/discovery/consul/init.lua
Outdated
```diff
 log.info("process id: ", ngx_worker_id(), ", all_services[", service_name, "] = ",
-         json_delay_encode(resp_list, true))
+         json_delay_encode(nodes, true))
```
The log message still references all_services[...], but the data source is now ngx.shared.consul and a decoded nodes table. Updating the message would avoid confusion when debugging (especially since all_services no longer exists).
```diff
 function _M.nodes(service_name)
-    if not all_services then
-        log.error("all_services is nil, failed to fetch nodes for : ", service_name)
-        return
+    local value = consul_dict:get(service_name)
+    if not value then
+        log.error("consul service not found: ", service_name, ", return default service")
+        return default_service and {default_service}
     end

-    local resp_list = all_services[service_name]
-
-    if not resp_list then
-        log.error("fetch nodes failed by ", service_name, ", return default service")
+    local nodes, err = core.json.decode(value)
+    if not nodes then
+        log.error("fetch nodes failed by ", service_name, ", error: ", err)
         return default_service and {default_service}
     end

     log.info("process id: ", ngx_worker_id(), ", all_services[", service_name, "] = ",
-             json_delay_encode(resp_list, true))
+             json_delay_encode(nodes, true))

-    return resp_list
+    return nodes
 end
```
nodes() now reads JSON from ngx.shared.consul and decodes it on every call, which makes it return a fresh table each time. In apisix/upstream.lua, this defeats the compare_upstream_node fast-path (old_t == new_t) and forces the slow path (sorting/comparing) on every request, which is both a performance regression and can change node ordering (the slow path sorts by host in-place). Consider caching decoded nodes per-worker (e.g., LRU keyed by service name + raw JSON string/version) and returning the same table instance while the shared-dict value is unchanged.
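A per-worker decode cache along those lines can be sketched outside OpenResty. The Python model below is only illustrative (class and field names are hypothetical; the real code would use something like `lua-resty-lrucache` over `ngx.shared` in Lua): re-decode only when the raw JSON string changes, so unchanged services keep returning the identical table and the `old_t == new_t` fast-path still fires.

```python
import json

class NodesCache:
    """Per-worker cache: re-decode only when the raw JSON payload changes,
    so an unchanged service returns the identical object on every call."""

    def __init__(self):
        self._cache = {}  # service_name -> (raw_json, decoded_nodes)

    def get_nodes(self, service_name, raw_json):
        entry = self._cache.get(service_name)
        if entry is not None and entry[0] == raw_json:
            return entry[1]           # same object as last call -> identity check holds
        nodes = json.loads(raw_json)  # payload changed (or first read): decode once
        self._cache[service_name] = (raw_json, nodes)
        return nodes

cache = NodesCache()
raw = '[{"host": "10.0.0.1", "port": 8080, "weight": 1}]'
a = cache.get_nodes("svc", raw)
b = cache.get_nodes("svc", raw)
assert a is b  # identity preserved while the shared-dict value is unchanged
c = cache.get_nodes("svc", '[{"host": "10.0.0.2", "port": 8080, "weight": 1}]')
assert a is not c  # a changed payload produces a fresh table
```

The raw JSON string doubles as the version key, so no separate version counter is needed in the dict.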
apisix/discovery/consul/init.lua
Outdated
```diff
 -- clean old unused data
 local old_services = consul_services[consul_server_url] or {}
 for k, _ in pairs(old_services) do
-    all_services[k] = nil
+    consul_dict:delete(k)
 end
 core.table.clear(old_services)
```
update_all_services() only deletes keys found in consul_services[consul_server_url]. After an nginx reload (shared dict persists but consul_services is reinitialized), any stale keys already present in ngx.shared.consul will never be deleted, so removed Consul services can keep being returned by nodes(). Consider clearing the shared dict on privileged-agent init, or persisting/deriving the previous key set from the shared dict (e.g., maintain a stored service list/version and delete keys not present in the new update).
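One reload-safe variant of the cleanup is to derive the stale key set from the store itself (a shared dict exposes its keys via `get_keys`) rather than from the per-process `consul_services` table, which resets on reload. A minimal Python model of the idea, with hypothetical names:

```python
def sync_services(store, up_services):
    """Reload-safe update: keys to delete are derived from the store
    itself (cf. shdict get_keys), not from a per-process table that is
    re-initialized empty after an nginx reload."""
    new_keys = set(up_services)
    for stale_key in set(store) - new_keys:  # keys only the store remembers
        del store[stale_key]
    for k, v in up_services.items():
        store[k] = v

# a store that survived a reload, still holding a removed service
store = {"old-svc": "[...]", "kept-svc": "old"}
sync_services(store, {"kept-svc": "new", "new-svc": "[...]"})
assert "old-svc" not in store   # stale key is removed even after reload
assert store["kept-svc"] == "new"
```

With multiple Consul servers writing into one dict, the real code would also need to scope deletion to keys owned by the server being updated, e.g. via a key prefix.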
apisix/discovery/consul/init.lua
Outdated
```diff
-    all_services[k] = v
+    local content, err = core.json.encode(v)
+    if content then
+        consul_dict:set(k, content)
```
consul_dict:set(k, content) return values are ignored. If the shared dict runs out of memory, set can fail (or evict other keys via forcible), which would silently drop discovery data and lead to intermittent 503s. Please capture and handle ok, err, forcible from set (and log/metric when it fails or is forcible).
Suggested change:

```diff
-consul_dict:set(k, content)
+local ok, set_err, forcible = consul_dict:set(k, content)
+if not ok then
+    log.error("failed to set nodes for service: ", k, " into consul_dict, error: ", set_err)
+elseif forcible then
+    log.warn("consul_dict is out of memory, key(s) evicted when setting service: ", k)
+end
```
…ce discovery nodes and flush consul_dict on init to prevent stale data persistence
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
```lua
-- flush stale data that may persist across reloads,
-- since consul_services is re-initialized empty
consul_dict:flush_all()
```
consul_dict:flush_all() runs in the privileged agent during init_worker(), but it executes after read_dump_services() may have populated the shared dict (and workers also call read_dump_services() before the privileged-agent guard). This can wipe the dump-loaded nodes and leave the dict empty until Consul fetch completes, defeating the dump-on-reload/startup mitigation and potentially causing 503s. Consider either (a) moving the flush earlier (before any dump load) and performing dump load only in the privileged agent, or (b) removing flush_all() and instead cleaning stale keys during update_all_services() in a way that survives reloads.
```diff
 local function update_all_services(consul_server_url, up_services)
     -- clean old unused data
     local old_services = consul_services[consul_server_url] or {}
     for k, _ in pairs(old_services) do
-        all_services[k] = nil
+        consul_dict:delete(k)
     end
     core.table.clear(old_services)

     for k, v in pairs(up_services) do
-        all_services[k] = v
+        local content, err = core.json.encode(v)
+        if content then
+            consul_dict:set(k, content)
+        else
```
update_all_services() deletes all previously-known keys before setting the new ones. While this runs, workers can observe a missing service_name in ngx.shared.consul and fall back to the default (or nil), causing transient request failures during an update window. A safer approach is to write/set new values first (optionally under a generation/version namespace) and only then delete obsolete keys, so reads never see an empty/missing service during an update.
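The write-first ordering suggested here can be sketched as follows; this is a Python model with hypothetical names (in the real code the store would be `ngx.shared.consul` and the value a JSON string):

```python
def update_services(store, old_keys, up_services):
    """Write/overwrite the new values first, then delete only keys that
    truly disappeared, so a concurrent reader never observes a missing
    service during the update window."""
    for k, v in up_services.items():
        store[k] = v                       # readers now see fresh values
    for k in old_keys - set(up_services):  # remove only obsolete services
        store.pop(k, None)
    return set(up_services)                # becomes old_keys for the next round

store = {"a": "v1", "b": "v1"}
old = {"a", "b"}
old = update_services(store, old, {"a": "v2", "c": "v1"})
assert store == {"a": "v2", "c": "v1"}  # "b" removed only after new data landed
```

At no point between the two loops is a still-live service absent from the store, which is the property the delete-first ordering lacks.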
apisix/discovery/consul/init.lua
Outdated
```diff
@@ -621,25 +644,24 @@ function _M.init_worker()
     end
 end
```
read_dump_services() now writes into ngx.shared.consul, but init_worker() calls it in every process (workers + privileged agent). That means multiple processes can concurrently read/encode/write the same dump data at startup, adding contention and making the final dict contents depend on startup ordering. Consider guarding dump load so only the privileged agent (or a single designated worker) performs the shared-dict writes, and other workers only read from the dict.
- Reorder update_all_services() to write new values before deleting obsolete keys, so workers never observe a missing service mid-update
- Move dump loading after flush_all() and into the privileged agent only, preventing the privileged agent from wiping dump-loaded data that workers depend on during startup
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
```diff
-    log.error("all_services is nil, failed to fetch nodes for : ", service_name)
-    return
+    local value = consul_dict:get(service_name)
+    if not value then
```
nodes() uses consul_dict:get(service_name). During reloads or refresh patterns that use flush_all(), get() returns nil even though stale values may still exist, causing avoidable 503s. Consider using get_stale() (and only falling back to default when both fresh+stale are missing). Also consider clearing nodes_cache[service_name] when the dict has no value to avoid retaining entries for removed services indefinitely.
Suggested change:

```diff
-if not value then
+if value == nil then
+    -- try to use stale value during reloads or after flush_all()
+    value = consul_dict:get_stale(service_name)
+end
+if value == nil then
+    -- no fresh or stale value in shared dict; clear per-worker cache entry
+    nodes_cache[service_name] = nil
```
```diff
 local function write_dump_services()
     local entity = {
-        services = all_services,
+        services = _M.all_nodes(),
         last_update = ngx.time(),
         expire = dump_params.expire, -- later need handle it
```
write_dump_services() now calls _M.all_nodes() which scans all shdict keys and JSON-decodes every service on each dump write. If dump is enabled and service count is high, this can add noticeable CPU overhead at each Consul update. Consider maintaining a privileged-agent-only in-memory snapshot for dump generation, or passing the updated snapshot into write_dump_services to avoid full scans/decodes.
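The snapshot idea can be modeled like this (a Python sketch with hypothetical names): the privileged-agent updater keeps the decoded services it just fetched, so dump generation needs no full shdict scan and no re-decode of every value.

```python
import json
import time

class DumpWriter:
    """Model of a privileged-agent-only snapshot: the updater retains the
    decoded services from its last fetch, so writing the dump serializes
    the snapshot directly instead of scanning and decoding the shared dict."""

    def __init__(self):
        self.snapshot = {}  # service_name -> nodes (plain tables)

    def on_update(self, up_services):
        # the updater already holds decoded data at this point
        self.snapshot = dict(up_services)

    def write_dump(self):
        return json.dumps({"services": self.snapshot,
                           "last_update": int(time.time())})

w = DumpWriter()
w.on_update({"svc": [{"host": "10.0.0.1", "port": 8080}]})
dump = json.loads(w.write_dump())
assert dump["services"]["svc"][0]["host"] == "10.0.0.1"
```

Since only the privileged agent performs Consul fetches in this PR, keeping the snapshot there adds no cross-process coordination.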
```lua
-- flush stale data that may persist across reloads,
-- since consul_services is re-initialized empty
consul_dict:flush_all()
```
This PR changes the cross-worker data sharing mechanism for Consul nodes to rely on lua_shared_dict, but there isn’t a regression test validating the restart/reload scenario from #12398 (e.g., repeated HUP reloads / restarts while sending requests should not produce intermittent 503s once Consul has data). Adding a Test::Nginx case similar to existing HUP-based tests would help prevent regressions.
```lua
-- flush stale data that may persist across reloads,
-- since consul_services is re-initialized empty
consul_dict:flush_all()
```
consul_dict:flush_all() runs in the privileged agent during init_worker, which will clear the shared dict for all workers during an nginx reload. Since nodes() reads via consul_dict:get(), in-flight requests handled by old workers during a graceful reload can suddenly see missing services and return 503s. Consider avoiding a global flush on reload (e.g., use a generation/version keyspace and swap, or rebuild consul_services from existing dict keys), or at minimum ensure readers can still access stale values during refresh and free memory after flushing (see flush_expired).
Suggested change:

```diff
--- flush stale data that may persist across reloads,
--- since consul_services is re-initialized empty
-consul_dict:flush_all()
+-- flush expired stale data that may persist across reloads,
+-- since consul_services is re-initialized empty but existing
+-- unexpired shared dict entries may still be in use by workers
+consul_dict:flush_expired()
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Fixes #12398
Checklist