feat: Add error ratio-based circuit breaking policy to api-breaker plugin #12765
HaoTien wants to merge 14 commits into apache:master
…ugin

- Add new 'unhealthy-ratio' policy that triggers circuit breaker based on error rate within sliding time window
- Implement three-state circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED
- Add configurable parameters: error_ratio, min_request_threshold, sliding_window_size, permitted_number_of_calls_in_half_open_state, success_ratio
- Maintain full backward compatibility with existing 'unhealthy-count' policy as default
- Add comprehensive test coverage for new functionality
- Update documentation in both Chinese and English
- Follow APISIX coding standards and testing conventions

This enhancement provides more intelligent circuit breaking for microservices architectures by considering error rates rather than just consecutive failure counts.
Baoyuantop left a comment
Thanks for your contribution! Based on the current configuration, we need to add some test cases:
- After the sliding window time (sliding_window_size) expires, are the statistics (total number of requests, number of failures) correctly cleared?
- Failure fallback in half-open state (Half-Open -> Open)
- Sending more requests than permitted_number_of_calls_in_half_open_state in half-open state
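
The two half-open scenarios above can be pictured with a minimal sketch. This is not the plugin's code: `HalfOpen`, `try_acquire`, and `on_result` are hypothetical names, and waiting for all permitted probe calls to finish before deciding is an assumption about how `success_ratio` is applied.

```lua
-- Hypothetical half-open tracker; names are illustrative, not taken from the PR.
local HalfOpen = {}
HalfOpen.__index = HalfOpen

function HalfOpen.new(permitted_calls, success_ratio)
    return setmetatable({
        permitted = permitted_calls,    -- permitted_number_of_calls_in_half_open_state
        success_ratio = success_ratio,  -- healthy.success_ratio
        admitted = 0,
        finished = 0,
        successes = 0,
    }, HalfOpen)
end

-- Returns true if another probe request may pass through.
-- Requests beyond the permitted number are rejected.
function HalfOpen:try_acquire()
    if self.admitted >= self.permitted then
        return false
    end
    self.admitted = self.admitted + 1
    return true
end

-- Records one probe result. Returns "CLOSED" or "OPEN" once all permitted
-- probes have finished (assumption), or nil while still undecided.
function HalfOpen:on_result(ok)
    self.finished = self.finished + 1
    if ok then
        self.successes = self.successes + 1
    end
    if self.finished < self.permitted then
        return nil
    end
    if self.successes / self.finished >= self.success_ratio then
        return "CLOSED"
    end
    return "OPEN"
end
```

Under these assumptions, a test for "Half-Open -> Open" would feed mostly failing probes and expect the breaker to reopen, and a test for the over-limit case would expect the extra request to be rejected while probes are in flight.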
…open_state; insert some test cases
Hi @HaoTien, please fix the lint error.
function _M.api_breaker()
    ngx.exit(tonumber(ngx.var.arg_code))
    local code = tonumber(ngx.var.arg_code) or 200
Replace ngx.say with ngx.print. The test cases match the response body exactly and do not expect a trailing newline. ngx.say appends a newline to its output, while ngx.print does not.
The current merge check error has nothing to do with the code I submitted.
if total_requests >= minimum_calls then
    local failure_rate = unhealthy_count / total_requests
    -- Use precise comparison to avoid floating point issues
    local rounded_failure_rate = math.floor(failure_rate * 10000 + 0.5) / 10000
Why choose 4 decimal places? This should be explained in the comments.
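
For context, here is a small standalone sketch of what the rounding expression in the snippet does. `round4` is a hypothetical helper name, and the sketch does not explain why the author picked 4 places; it only shows the effect: values are snapped to a 0.0001 grid, which hides tiny floating-point residue but also makes ratios closer than about 0.00005 indistinguishable.

```lua
-- Hypothetical helper mirroring the expression under review:
-- round to the nearest multiple of 0.0001 (4 decimal places).
local function round4(x)
    return math.floor(x * 10000 + 0.5) / 10000
end

-- Floating-point residue: 0.1 + 0.2 is not exactly 0.3 in IEEE 754 doubles.
local raw = 0.1 + 0.2
local raw_equal = (raw == 0.3)              -- false
local rounded_equal = (round4(raw) == 0.3)  -- true

-- A ratio such as 1/3 is truncated to the 0.0001 grid.
local one_third = round4(1 / 3)             -- 0.3333
```

A comment next to the rounding line could state the chosen resolution and the reason for it, which is what the review asks for.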
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
--- error_code: 400
--- response_body
{"error_msg":"failed to check the configuration of plugin api-breaker err: property \"healthy\" validation failed: property \"http_statuses\" validation failed: expected unique items but items 1 and 2 are equal"}
{"error_msg":"failed to check the configuration of plugin api-breaker err: then clause did not match"}
This test now asserts a generic schema failure message (then clause did not match). That message is an artifact of the conditional schema and is much less specific than the previous unique-items validation error, making the test less effective at catching regressions. Prefer asserting the actual validation cause (e.g., via response_body_like matching expected unique items), or otherwise adjust the schema so duplicate healthy.http_statuses produces a stable/clear error message.
Please help review @membphis
The current merge check error has nothing to do with the code I submitted @Baoyuantop
moonming left a comment
Hi @HaoTien, thank you for the error ratio-based circuit breaking! This has had 21 reviews, showing strong community engagement.
Error ratio-based breaking (e.g., trip when >50% of requests fail in a window) is indeed smarter than the current consecutive-failures approach.
Since this is in the awaiting review state with extensive review history, could you:
- Confirm all review comments have been addressed
- Provide a brief summary of the final design decisions made during the 21 reviews
- Ensure the documentation clearly explains the new error_ratio policy alongside the existing consecutive_errors policy
This looks close to ready. Let me do a deeper code review once you confirm the above. Thank you for persisting through the extensive review process! 👏
Hi @moonming, thank you for your review! Let me address your questions:
✅ Added test cases for sliding window expiration statistics reset (TEST 17)

Circuit Breaker States:
- CLOSED → OPEN: Triggered when the error rate exceeds the error_ratio threshold, with at least min_request_threshold requests
- Uses a sliding window (sliding_window_size) for statistics collection, which resets after expiration
- The unhealthy-count policy (default) triggers based on consecutive failure counts
- Configuration attributes table
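
As an illustration of the CLOSED → OPEN rule summarized above, here is a minimal sketch in plain Lua. It is not the plugin's implementation: `Window` and `record` are hypothetical names, the window is modeled as a counter that resets once `sliding_window_size` seconds have elapsed, and "exceeds" is treated as strictly greater than, which is an assumption.

```lua
-- Hypothetical statistics window; names are illustrative, not taken from the PR.
local Window = {}
Window.__index = Window

function Window.new(size_sec, error_ratio, min_requests)
    return setmetatable({
        size_sec = size_sec,          -- unhealthy.sliding_window_size
        error_ratio = error_ratio,    -- unhealthy.error_ratio
        min_requests = min_requests,  -- unhealthy.min_request_threshold
        started_at = nil,
        total = 0,
        failures = 0,
    }, Window)
end

-- Records one response observed at time `now` (seconds).
-- Returns true when the breaker should move from CLOSED to OPEN.
function Window:record(now, is_failure)
    -- Reset the counters once the window has expired.
    if self.started_at == nil or now - self.started_at >= self.size_sec then
        self.started_at = now
        self.total = 0
        self.failures = 0
    end

    self.total = self.total + 1
    if is_failure then
        self.failures = self.failures + 1
    end

    -- Too few samples: never trip, regardless of the error rate.
    if self.total < self.min_requests then
        return false
    end
    -- Assumption: "exceeds" means strictly greater than the threshold.
    return self.failures / self.total > self.error_ratio
end
```

With `Window.new(300, 0.5, 10)`, nine consecutive failures do not trip the breaker because the minimum request count is not yet met; the tenth does. A request arriving after the window has expired starts from fresh counters.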
@moonming Please help review |
feat: Add error ratio-based circuit breaking policy to api-breaker plugin
What this PR does / why we need it
This PR implements error ratio-based circuit breaking (unhealthy-ratio policy) for the api-breaker plugin, providing more intelligent and adaptive circuit breaking behavior based on error rates within a sliding time window, rather than just consecutive failure counts.

Closes #12763
Types of changes
Description
Current Limitations
New Features Added
unhealthy-ratio policy that triggers circuit breaker based on error rate within a sliding time window

New Configuration Parameters

| Parameter | Default |
| --- | --- |
| policy | "unhealthy-count" |
| unhealthy.error_ratio | 0.5 |
| unhealthy.min_request_threshold | 10 |
| unhealthy.sliding_window_size | 300 |
| unhealthy.permitted_number_of_calls_in_half_open_state | 3 |
| healthy.success_ratio | 0.6 |

Example Configuration

    {
      "plugins": {
        "api-breaker": {
          "break_response_code": 503,
          "policy": "unhealthy-ratio",
          "max_breaker_sec": 60,
          "unhealthy": {
            "http_statuses": [500, 502, 503, 504],
            "error_ratio": 0.5,
            "min_request_threshold": 10,
            "sliding_window_size": 300,
            "permitted_number_of_calls_in_half_open_state": 3
          },
          "healthy": {
            "http_statuses": [200, 201, 202],
            "success_ratio": 0.6
          }
        }
      }
    }

How Has This Been Tested?
Test Results
Files Modified
- apisix/plugins/api-breaker.lua - Core plugin logic with new ratio-based policy
- t/plugin/api-breaker2.t - New comprehensive test file for ratio-based circuit breaking
- docs/en/latest/plugins/api-breaker.md - Updated English documentation
- docs/zh/latest/plugins/api-breaker.md - Updated Chinese documentation

Checklist
Additional Notes
This implementation:
The feature addresses real-world use cases for:
Ready for review and feedback!