Resiliency matters, and yet we still underestimate how fragile the digital world is. A single API failure can cascade across industries: flights delayed, nurses locked out of medication charts, government services unavailable. Recent incidents include a CrowdStrike outage that caused widespread disruption, a Google outage in June 2025 triggered by a null pointer exception, Cloudflare incidents where a frontend retry loop overwhelmed tenant services, and a Tesla API outage that left owners unable to open their cars.
At its core, API resiliency testing is about ensuring services are predictable and durable under adverse conditions. It goes beyond checking happy paths: Specmatic verifies both the degraded response and the recovery. Resiliency testing spans a spectrum of approaches designed to expose weaknesses before they cause failures in production.
Learn how to test two resilience behaviors when a downstream dependency is not responsive:
- Load shedding pattern: return 429 Too Many Requests when product-search load should be shed
- Async create: return 202 Accepted when product creation is accepted but not completed yet
From time to time, downstream services may experience issues. Your service should handle these gracefully rather than surfacing them as generic timeouts or 500s.
In this lab, the contract is already the source of truth:
- GET /findAvailableProducts should return 429 with Retry-After when the downstream product API times out.
- POST /products should return 202 Accepted with a monitor link when the downstream create call is taking time.
Your job is to verify that the service under test (BFF) matches those resilience expectations by simulating downstream delays through the downstream mock examples.
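To make the two expectations concrete, here is a minimal Python sketch of the behavior the contract describes. It is not the lab's BFF code: the timeout budget, the Retry-After value, the monitor link, and the `call_downstream` helper are all hypothetical stand-ins, and the downstream call is simulated rather than made over HTTP.

```python
class DownstreamTimeout(Exception):
    """Raised when the simulated downstream call exceeds the timeout budget."""


TIMEOUT_SECONDS = 1.0  # hypothetical budget; the lab does not state the real value


def call_downstream(latency_seconds, response):
    """Stand-in for an HTTP call: fails if the simulated latency is too high."""
    if latency_seconds > TIMEOUT_SECONDS:
        raise DownstreamTimeout()
    return response


def find_available_products(downstream_latency):
    """Load shedding: a slow downstream search becomes 429 with Retry-After."""
    try:
        products = call_downstream(downstream_latency, [{"id": 1}])
    except DownstreamTimeout:
        # Retry-After value is illustrative only.
        return {"status": 429, "headers": {"Retry-After": "1"}, "body": None}
    return {"status": 200, "headers": {}, "body": products}


def create_product(downstream_latency):
    """Async create: a slow downstream create becomes 202 with a monitor link."""
    try:
        created = call_downstream(downstream_latency, {"id": 1})
    except DownstreamTimeout:
        # The monitor link format is an assumption, not taken from the contract.
        return {"status": 202, "headers": {"Link": "</monitor/1>"}, "body": None}
    return {"status": 201, "headers": {}, "body": created}


print(find_available_products(2)["status"])  # 429
print(create_product(2)["status"])           # 202
print(create_product(0)["status"])           # 201
```

The point of the sketch is that neither endpoint lets a downstream timeout leak out as a generic 500: each maps it to a status the contract names explicitly.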
Estimated time: 15-20 minutes.
- Docker is installed and running.
- You are in labs/api-resiliency-testing.
- Ports 8080, 9000, and 9001 are available.
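If you want to check the port prerequisite before starting, a small Python sketch like the one below can help. It is not part of the lab; the helper names are hypothetical, and it only tests whether a local bind on 127.0.0.1 succeeds, which is an approximation of what Docker needs.

```python
import socket

REQUIRED_PORTS = (8080, 9000, 9001)  # ports listed in the prerequisites


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports=REQUIRED_PORTS):
    """Return the subset of ports that cannot be bound right now."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    conflicts = busy_ports()
    if conflicts:
        print(f"Ports in use: {conflicts}")
    else:
        print("All required ports are free.")
```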
- suite is the Specmatic contract-test runner that starts dependency mocks and executes the tests.
- order-bff is the system under test on port 8080.
- The dependency mocks are generated from shared contracts in labs-contracts (common/openapi/order-api/api_order_v5.yaml and common/asyncapi/product-audits/kafka.yaml) by the suite service.
- The BFF contract under test is pulled from labs-contracts (openapi/order-bff-resiliency/product_search_bff_v6.yaml).
- You will edit only downstream mock examples in examples/order-service/.
- .specmatic/repos/labs-contracts/openapi/order-bff-resiliency/product_search_bff_v6.yaml - BFF contract under test after specmatic.yaml checks out the contracts repo.
- .specmatic/repos/labs-contracts/common/openapi/order-api/api_order_v5.yaml - downstream product API contract used for mocking.
- examples/bff/test_products_too_many_requests.json - test expecting 429.
- examples/bff/test_accepted_product_request.json - test expecting 202.
- examples/order-service/stub_products_200.json - healthy downstream search stub.
- examples/order-service/stub_product_201.json - healthy downstream create stub.
- examples/order-service/stub_timeout_get_products.json - matching downstream search example that is missing the delay needed to trigger load shedding.
- examples/order-service/stub_timeout_post_product.json - matching downstream create example that is missing the delay needed to trigger 202 Accepted.
- Run the baseline and observe two failures.
- Fix examples/order-service/stub_timeout_get_products.json so the 429 test passes.
- Re-run and confirm only the 202 test is still failing.
- Fix examples/order-service/stub_timeout_post_product.json so the 202 test passes.
- Enable schemaResiliencyTests: all and observe the extra 202 failures.
- Generalize examples/order-service/stub_timeout_post_product.json with value:each matchers.
- Re-run and confirm the full suite passes.
- Do not edit the specs pulled from labs-contracts in this lab.
- Do not edit files under examples/bff/.
- Do not edit docker-compose.yaml.
- Edit only:
  - specmatic.yaml
  - examples/order-service/stub_timeout_get_products.json
  - examples/order-service/stub_timeout_post_product.json
- 429 resilience demo: When dependencies timeout, does your API shed load with 429 responses?
- 202 resilience demo: When downstream services lag, does your API gracefully accept with 202 responses?
- Contract testing docs: https://docs.specmatic.io/documentation/contract_tests.html
Run:
docker compose --profile test up --abort-on-container-exit
Expected baseline result:
Tests run: 5, Successes: 3, Failures: 2, Errors: 0
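Each checkpoint in this lab is identified by a summary line of this shape. If you capture the run output to a file or string, a short Python sketch like the following can extract the counts so you can compare them against the expected checkpoint. The function name and the returned dictionary keys are hypothetical; only the summary line format comes from the lab.

```python
import re

# Matches the summary line format shown in this lab's expected results.
SUMMARY_PATTERN = re.compile(
    r"Tests run: (\d+), Successes: (\d+), Failures: (\d+), Errors: (\d+)"
)


def parse_summary(log_text):
    """Return the counts from the last summary line in log_text, or None."""
    matches = SUMMARY_PATTERN.findall(log_text)
    if not matches:
        return None
    run, successes, failures, errors = (int(value) for value in matches[-1])
    return {
        "run": run,
        "successes": successes,
        "failures": failures,
        "errors": errors,
    }


baseline = parse_summary("Tests run: 5, Successes: 3, Failures: 2, Errors: 0")
print(baseline["failures"])  # 2
```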
The two failing scenarios should be:
- GET /findAvailableProducts -> 429 from test_products_too_many_requests.json
- POST /products -> 202 from test_accepted_product_request.json
Why they fail:
- stub_timeout_get_products.json matches the search request, but it does not delay the downstream response, so the BFF gets a normal 200 instead of shedding load with 429.
- stub_timeout_post_product.json matches the create request, but it does not delay the downstream response, so the BFF completes normally with 201 instead of returning 202 Accepted.
- In this lab, the delay must be transient because Specmatic also verifies recovery: once the downstream service is responsive again, the response should go back to normal.
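The degraded-then-recovered sequence can be pictured with a simplified Python model. This is a mental model only, not how Specmatic implements transient examples: the `TransientExample` class, the one-match budget, and the 1-second BFF timeout are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TransientExample:
    """Simplified model: a delayed example that applies a limited number of times."""
    delay_seconds: float
    remaining_matches: int


def downstream_latency(example, healthy_latency=0.0):
    """Return the simulated latency of the next matching downstream call."""
    if example.remaining_matches > 0:
        example.remaining_matches -= 1
        return example.delay_seconds
    # The transient example is used up, so the healthy stub answers instead.
    return healthy_latency


def bff_search_status(latency, timeout_seconds=1.0):
    """Hypothetical BFF rule: shed load with 429 when the downstream is too slow."""
    return 429 if latency > timeout_seconds else 200


example = TransientExample(delay_seconds=2, remaining_matches=1)
statuses = [bff_search_status(downstream_latency(example)) for _ in range(3)]
print(statuses)  # [429, 200, 200]
```

A permanent delay would produce 429 on every call, so a recovery check could never pass. That is the reason the delay has to be transient.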
Clean up:
docker compose --profile test down -v
Open examples/order-service/stub_timeout_get_products.json.
It already matches the downstream search request that should trigger load shedding. Add:
- "transient": true
- "delay-in-seconds": 2
Keep:
- header pageSize as 20
- query type as exact value other with times:2
- query from-date as 2025-11-01
- query to-date as 2025-11-15
- the 200 OK downstream response body
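The edit itself is small: two extra keys, with everything else left alone. The Python sketch below shows the shape of that change on a heavily trimmed, hypothetical stand-in for the example file. It assumes both keys belong at the top level of the example, next to the request and response sections; the section names and the path used here are assumptions, so check the actual file before copying anything.

```python
import json


def add_transient_delay(example, delay_seconds=2):
    """Return a copy of the example with the two keys this task asks for."""
    updated = dict(example)  # shallow copy so the input is left untouched
    updated["transient"] = True
    updated["delay-in-seconds"] = delay_seconds
    return updated


# Hypothetical, trimmed stand-in for stub_timeout_get_products.json.
# The real file also carries the header and query matchers listed above.
example = {
    "http-request": {"method": "GET", "path": "/products"},
    "http-response": {"status": 200},
}

print(json.dumps(add_transient_delay(example), indent=2))
```

In the lab you make this change by hand in the JSON file; the function only illustrates that the request matchers and the 200 OK response stay exactly as they were.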
Re-run:
docker compose --profile test up --abort-on-container-exit
Expected checkpoint result:
Tests run: 5, Successes: 4, Failures: 1, Errors: 0
At this point:
- the 429 scenario passes
- the 202 scenario still fails
Clean up:
docker compose --profile test down -v
Open examples/order-service/stub_timeout_post_product.json.
It already matches the delayed create-product request. Add:
- "transient": true
- "delay-in-seconds": 2
Keep:
- request body name as UniqueName
- request body type as book
- request body inventory as 9
- the existing header matchers
- the downstream 201 Created response
Re-run:
docker compose --profile test up --abort-on-container-exit
Expected checkpoint result:
Tests run: 5, Successes: 5, Failures: 0, Errors: 0
At this point:
- the baseline 429 and 202 resilience flow passes
- the lab is still running with schemaResiliencyTests: none
Clean up:
docker compose --profile test down -v
Open specmatic.yaml and change:
schemaResiliencyTests: none
to:
schemaResiliencyTests: all
Re-run:
docker compose --profile test up --abort-on-container-exit
What changes:
- the 429 scenario continues to work
- one 202 scenario still passes
- additional generated POST /products -> 202 requests now appear
- those extra 202 scenarios fail because stub_timeout_post_product.json is hard-coded to only one request shape:
  - name: UniqueName
  - type: book
  - inventory: 9
Expected failure direction:
- the suite now reports multiple POST /products -> 202 failures
- one concrete 202 example still passes
- the new failures come from additional valid request variations generated from the contract
Expected Task C checkpoint result before the matcher fix:
Tests run: 249, Successes: 238, Failures: 11, Errors: 0
Why this is useful:
- this is closer to real-world resiliency testing
- Specmatic is not only checking one example anymore
- it is generating valid request variations and expecting the same graceful 202 behavior across them
What is happening under the hood:
- your transient timeout example currently matches only one exact request:
  - name: UniqueName
  - type: book
  - inventory: 9
- with schemaResiliencyTests: all, Specmatic generates more valid POST /products requests from the contract
- those generated requests still satisfy the API contract, so the BFF is expected to handle them gracefully as well
- but the transient timeout example no longer matches when type or inventory changes
- when that timeout example does not match, Specmatic falls back to the normal downstream success example, so the BFF receives a fast 201 Created
- because the downstream did not time out, the BFF returns 201 instead of 202, and the generated resiliency tests fail
This is also why the transient behavior matters:
- Specmatic is not only checking the degraded path
- it also checks recovery
- the timeout example should apply while the transient delay is active, and then the downstream should go back to normal responses afterward
- this makes the lab more realistic than testing a permanent failure
To fix this, update examples/order-service/stub_timeout_post_product.json so the request body becomes:
"body": {
  "name": "UniqueName",
  "type": "$match(dataType:ProductType, value:each, times:1)",
  "inventory": "$match(dataType:ProductInventory, value:each, times:1)"
}
Why value:each is the right matcher here:
- value:each tracks matcher exhaustion separately for each distinct value
- that means each valid type value gets its own one-time transient timeout match
- each valid inventory value also gets its own one-time transient timeout match
- this lets the same timeout example work for unique valid request variations generated by schemaResiliencyTests: all
- a hard-coded value like type: book only works for one request shape, but value:each scales the transient behavior across the generated valid inputs
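The difference between a hard-coded value and value:each can be sketched as two small counters. The Python below is a simplified model of the behavior described in the list above, restricted to a single field (type). It is not Specmatic's matcher implementation, the class names are hypothetical, and the list of type values is illustrative rather than read from the contract.

```python
from collections import Counter


class FixedValue:
    """Matches one hard-coded value, a limited number of times in total."""

    def __init__(self, value, times):
        self.value = value
        self.remaining = times

    def matches(self, value):
        if value == self.value and self.remaining > 0:
            self.remaining -= 1
            return True
        return False


class EachValue:
    """Tracks the match budget separately for every distinct value seen."""

    def __init__(self, times):
        self.times = times
        self.used = Counter()

    def matches(self, value):
        if self.used[value] < self.times:
            self.used[value] += 1
            return True
        return False


def bff_status(timeout_example_matched):
    """Hypothetical BFF rule: 202 when the delayed example applied, else 201."""
    return 202 if timeout_example_matched else 201


generated_types = ["book", "food", "gadget", "other"]  # illustrative values only

fixed = FixedValue("book", times=1)
print([bff_status(fixed.matches(t)) for t in generated_types])
# [202, 201, 201, 201]

each = EachValue(times=1)
print([bff_status(each.matches(t)) for t in generated_types])
# [202, 202, 202, 202]

# A repeated value has used up its one-time budget, so the healthy path returns.
print(bff_status(each.matches("book")))  # 201
```

The first line of output mirrors the Task C failures: only the request that happens to use book gets the delayed downstream response. The last line mirrors recovery, since a value that has already consumed its single match falls through to the normal 201.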
Specmatic documentation for this matcher behavior:
Keep:
- "transient": true
- "delay-in-seconds": 2
- the existing header matchers
- the downstream 201 Created response
Re-run again:
docker compose --profile test up --abort-on-container-exit
Expected outcome:
- the additional generated 202 scenarios now pass
- the lab still verifies both degraded behavior and recovery after the transient timeout
Final expected result:
Tests run: 249, Successes: 249, Failures: 0, Errors: 0
Clean up:
docker compose --profile test down -v
Start Studio:
docker compose --profile studio up --build
Open http://127.0.0.1:9000/_specmatic/studio.
Then:
- Open specmatic.yaml.
- Click Run Suite.
- Observe the same two failures in the baseline state.
- Fix stub_timeout_get_products.json and rerun.
- Confirm only the 202 scenario still fails.
- Fix stub_timeout_post_product.json and rerun.
- Change schemaResiliencyTests to all.
- Observe the extra generated 202 failures.
- Update stub_timeout_post_product.json to use value:each matchers.
- Rerun and confirm the full suite passes.
Stop Studio:
docker compose --profile studio down -v
- If the baseline does not fail with exactly two failures, confirm you have not already fixed one of the timeout example files.
- If 429 still does not appear after Task A, confirm you added both "transient": true and "delay-in-seconds": 2, and that times:2 is still present for the other query.
- If 202 still does not appear after Task B, confirm you added both "transient": true and "delay-in-seconds": 2, and that the request body still uses UniqueName, book, and 9.
- If Task C still fails for 202, confirm type and inventory are using the $match(dataType:..., value:each, times:1) form instead of fixed values.
- If Docker reports port conflicts, stop the conflicting services and rerun the same command.
- Baseline run fails with exactly two failing scenarios: one 429, one 202.
- After Task A, only the 202 scenario remains failing.
- After Task B, the base resiliency flow passes with schemaResiliencyTests: none.
- After Task C, the full suite passes with schemaResiliencyTests: all.
- 429 is a contract-level resilience behavior, not just an implementation detail.
- 202 Accepted is useful when work is deferred because a downstream dependency is slow.
- Specmatic examples can model downstream latency and verify both graceful degradation and recovery without changing application code.
If you are doing this lab as part of an eLearning course, return to the eLearning site and continue with the next module.