fix: change consul event module to shared dict #12773
Closed
Baoyuantop wants to merge 4 commits into apache:master
Conversation
nic-6443 reviewed on Dec 9, 2025
Comment on lines +68 to +107
    local function persist_all_services_to_shm()
        if not consul_dict then
            return
        end

        local data, err = core.json.encode(all_services)
        if not data then
            log.error("failed to encode consul services for shared dict: ", err)
            return
        end

    local function discovery_consul_callback(data, event, source, pid)
        all_services = data
        log.notice("update local variable all_services, event is: ", event,
                   "source: ", source, "server pid:", pid,
                   ", all services: ", json_delay_encode(all_services, true))
        local ok, set_err = consul_dict:set(consul_dict_services_key, data)
        if not ok then
            log.error("failed to store consul services in shared dict: ", set_err)
            return
        end
    end

    local function sync_all_services_from_shm(force_log)
        if not consul_dict then
            return
        end

        local data = consul_dict:get(consul_dict_services_key)
        if not data then
            if force_log then
                log.info("consul shared dict services empty")
            end
            return
        end

        local decoded, err = core.json.decode(data)
        if not decoded then
            log.error("failed to decode consul services from shared dict: ", err)
            return
        end

        all_services = decoded
    end
Member
Having both functions write to and read from shared memory through all_services seems very dangerous. With an unexpected execution order, a worker may write back the data it has just read from shared memory instead of the latest data from Consul. I recommend removing the all_services variable and simplifying the design so that only a privileged agent starts a timer to periodically fetch data from Consul, while the other workers only load data from shared memory.
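The single-writer design suggested in this comment could look roughly like the sketch below. It is only an illustration, not code from this PR: the fake dict stands in for an ngx.shared zone so that the example runs under plain Lua, and the names `publish`, `load_nodes`, `SERVICES_KEY`, and the comma-separated codec are hypothetical (the PR itself uses core.json.encode and core.json.decode).

```lua
-- Stand-in for an ngx.shared.DICT zone so the sketch runs under plain Lua.
-- Only the set/get calls used here are mimicked.
local function new_fake_dict()
    local store = {}
    return {
        set = function(_, key, value) store[key] = value; return true end,
        get = function(_, key) return store[key] end,
    }
end

-- Hypothetical, simplified codec for a flat list of "host:port" strings.
local function encode(nodes) return table.concat(nodes, ",") end
local function decode(data)
    local nodes = {}
    for node in data:gmatch("[^,]+") do nodes[#nodes + 1] = node end
    return nodes
end

local SERVICES_KEY = "consul_services"  -- hypothetical key name

-- Writer: only the privileged agent calls this, from its timer.
local function publish(dict, fetch_from_consul)
    local nodes = fetch_from_consul()
    local ok, err = dict:set(SERVICES_KEY, encode(nodes))
    if not ok then
        return nil, err
    end
    return true
end

-- Reader: every worker calls this; it never writes back to the dict.
local function load_nodes(dict)
    local data = dict:get(SERVICES_KEY)
    if not data then
        return nil
    end
    return decode(data)
end

local dict = new_fake_dict()
assert(load_nodes(dict) == nil)  -- nothing published yet

publish(dict, function() return {"10.0.0.1:8080", "10.0.0.2:8080"} end)
local nodes = load_nodes(dict)
print(#nodes, nodes[1])
```

Because readers never call `set`, there is no path by which a worker can overwrite fresh Consul data with a stale copy it read earlier, which is the ordering hazard the comment describes.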
LGTM, I have merged this PR into my internal release, and it's already resolved my problem
Description
Configuring Consul service discovery in APISIX and restarting it while APISIX is continuously receiving traffic will result in frequent 503 errors.
In the design of Consul service discovery, only worker 0 pulls nodes directly from Consul and updates the data. The other workers rely on events:register to receive broadcasts. If, when APISIX restarts, a worker has not yet completed events:register by the time the service list has been broadcast, that worker misses the data, and requests sent to it return a 503 error.
This problem can be avoided by changing the event module to shared dict.
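The race can be reproduced with a toy model in plain Lua. This is a simulation under stated assumptions, not the real event library: `new_bus` is a hypothetical broadcast bus that delivers only to handlers registered at post time, and the `shm` table stands in for a shared dict zone.

```lua
-- Toy broadcast bus: an event reaches only the handlers that are
-- already registered when it is posted.
local function new_bus()
    local handlers = {}
    return {
        register = function(_, fn) handlers[#handlers + 1] = fn end,
        post = function(_, data)
            for _, fn in ipairs(handlers) do fn(data) end
        end,
    }
end

local bus = new_bus()
local shm = {}            -- stand-in for a shared dict zone
local early, late = {}, {}

-- One worker registers before the broadcast.
bus:register(function(data) early.services = data end)

-- Worker 0 broadcasts the service list and also stores it.
local services = "10.0.0.1:8080"
bus:post(services)
shm.services = services

-- Another worker finishes registering only after the broadcast.
bus:register(function(data) late.services = data end)

print(early.services)  -- 10.0.0.1:8080
print(late.services)   -- nil: the broadcast was missed
print(shm.services)    -- 10.0.0.1:8080: still readable at any time
```

The late worker has no service list until the next broadcast, which in this model corresponds to the 503 responses, whereas a value kept in shared storage can be read regardless of when the worker started.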
Which issue(s) this PR fixes:
Fixes #12398
Checklist