
Phase 2: Priority-based throttling, mutex guard, admin UI, and WP-CLI #19

@mrtwebdesign

Context

Issue #3 defined a full implementation framework for priority-based concurrency throttling. PR #4 delivered the load detection and global throttle foundation as "Phase 1." This issue covers the remaining work needed to complete that vision.

What Phase 1 delivered (PR #4)

  • Load probing (MySQL Threads_running + Action Scheduler [AS] queue depth) with normal/elevated/critical classification
  • Hysteresis with separate enter/exit thresholds and minimum dwell time
  • Global AS queue-runner throttling (batch size, time limit, concurrent batches)
  • Four-stage rollout (off → test_observe → observe → enforce)
  • Cross-request state persistence (object cache → APCu → option fallback)
  • Structured logging for all throttle events and load transitions

What Phase 1 cannot do

Phase 1 cannot distinguish between action types. Under load, all AS work slows equally: a NoFraud payment webhook gets the same treatment as a Facebook catalog sync. The May 1 incident needed the opposite: keep payments flowing and pause deferrable syncs.


Progressive implementation plan

Each wave can be deployed independently and provides value on its own. Later waves build on earlier ones but each is a shippable increment.

Wave A — Refactor & registry (no behavior change, safe to deploy anytime)

Structural prep work. No runtime behavior changes, so this can ship without a rollout plan.

Wave B — Per-action throttling (core feature, needs rollout plan)

The primary deliverable. Requires Wave A. Should follow the same observe → enforce rollout pattern as Phase 1.

Wave C — Thundering herd prevention (independent, can ship in any order)

Completely standalone — no dependency on Waves A or B. Addresses a different failure mode (N concurrent requests triggering the same expensive operation).

Wave D — Operator tooling (additive, ship as components land)

Each item here can ship as soon as the component it manages exists. No need to wait for everything.

  • WP-CLI throttle commands: wp queryguard throttle status/pause/resume/history. Can ship after Wave B.
  • WP-CLI mutex commands: wp queryguard mutex list/release. Can ship after Wave C.
  • Admin Settings page — UI for enable/disable toggles, load thresholds, priority registry editor, reschedule delays, current load status indicator, and last 20 throttled actions. Can ship incrementally as sections become relevant.

Component details

Priority Registry (Wave A)

Map Action Scheduler hook names to priority tiers. Stored as a WordPress option, editable in admin (Wave D), with a filter for code-based overrides.

Default registry:

| Tier | Hook patterns | Behavior under load |
| --- | --- | --- |
| Critical | nofraud_*, woocommerce_payment_*, wc_payment_* | Always runs |
| High | woocommerce_scheduled_subscription_*, wcs_*, woocommerce_deliver_webhook_* | Delayed 5-10 min during critical load |
| Normal | woocommerce_run_*, action_scheduler_* | Deferred +15 min during critical, +5 min during elevated |
| Deferrable | facebook_for_woocommerce_*, wc_facebook_*, shipstation_*, klaviyo_*, woocommerce_flush_* | Paused until load normalizes |

Public API: get_priority( $hook_name ) returns tier string. Supports prefix/wildcard matching. Unregistered hooks default to normal.
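
A minimal sketch of the lookup described above, assuming the registry is a plain array of tier => patterns and that PHP's built-in fnmatch() is an acceptable wildcard matcher. The helper queryguard_default_registry() and the optional second parameter are hypothetical and exist only so the example runs outside WordPress. The tier names and patterns are copied from the default registry.

```php
<?php
// Hypothetical sketch: the array layout and helper name are assumptions.
// Only the tier names and hook patterns come from the default registry.

/** Default registry: tier => list of wildcard hook patterns. */
function queryguard_default_registry(): array {
    return [
        'critical'   => [ 'nofraud_*', 'woocommerce_payment_*', 'wc_payment_*' ],
        'high'       => [ 'woocommerce_scheduled_subscription_*', 'wcs_*', 'woocommerce_deliver_webhook_*' ],
        'normal'     => [ 'woocommerce_run_*', 'action_scheduler_*' ],
        'deferrable' => [ 'facebook_for_woocommerce_*', 'wc_facebook_*', 'shipstation_*', 'klaviyo_*', 'woocommerce_flush_*' ],
    ];
}

/**
 * Return the tier for a hook name. Tiers are checked in array order, so the
 * first listed tier wins if patterns overlap. Unregistered hooks are 'normal'.
 */
function get_priority( string $hook_name, ?array $registry = null ): string {
    $registry = $registry ?? queryguard_default_registry();
    foreach ( $registry as $tier => $patterns ) {
        foreach ( $patterns as $pattern ) {
            if ( fnmatch( $pattern, $hook_name ) ) {
                return $tier;
            }
        }
    }
    return 'normal';
}
```

Checking tiers from critical downward means an overlapping pattern can only promote a hook, never demote it. Whether that is the desired tie-break is a design decision this sketch assumes rather than one the spec states.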

Throttle Engine — per-action decisions (Wave B)

Hook action_scheduler_before_execute to apply the priority × load decision matrix:

| Load level | Critical | High | Normal | Deferrable |
| --- | --- | --- | --- | --- |
| Normal | Run | Run | Run | Run |
| Elevated | Run | Run | Defer +5 min | Defer +15 min |
| Critical | Run | Defer +5 min | Defer +15 min | Defer +60 min |
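
The matrix above could be encoded as a lookup table. This is a sketch only: the constant and function names are hypothetical, and falling back to "run now" for unknown inputs is an assumption, not something the spec states. The delay values are taken from the matrix.

```php
<?php
// Sketch only: names and the fail-open fallback are assumptions.
// The minute values are copied from the priority x load matrix.

const QUERYGUARD_DEFER_MINUTES = [
    // load level => [ tier => minutes to defer; 0 means run now ]
    'normal'   => [ 'critical' => 0, 'high' => 0, 'normal' => 0,  'deferrable' => 0 ],
    'elevated' => [ 'critical' => 0, 'high' => 0, 'normal' => 5,  'deferrable' => 15 ],
    'critical' => [ 'critical' => 0, 'high' => 5, 'normal' => 15, 'deferrable' => 60 ],
];

/** Return how many seconds to defer an action; 0 means execute it now. */
function queryguard_defer_seconds( string $load_level, string $tier ): int {
    // Unknown load levels or tiers fall back to 0 so a typo cannot stall an action.
    $minutes = QUERYGUARD_DEFER_MINUTES[ $load_level ][ $tier ] ?? 0;
    return $minutes * 60;
}
```

A non-zero result would feed whatever rescheduling call the throttle engine ends up using. A zero result lets the action proceed unchanged.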

Deferred actions are rescheduled, never deleted. After load returns to normal, a cooldown ramp gradually releases deferred actions over 2-3 cycles to prevent a burst.
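
One possible shape for the cooldown ramp is sketched here. The spec only says deferred actions are released over 2-3 cycles. The even split into batches, the 60-second cycle length, and the function name are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: the even split and the cycle length are assumptions.

/**
 * Split deferred action IDs into up to $cycles roughly equal batches and
 * return a map of action ID => release delay in seconds.
 */
function queryguard_cooldown_schedule( array $action_ids, int $cycles = 3, int $cycle_seconds = 60 ): array {
    $cycles    = max( 1, $cycles );
    $per_cycle = (int) ceil( count( $action_ids ) / $cycles );
    $schedule  = [];
    foreach ( array_values( $action_ids ) as $index => $id ) {
        // Batch 0 is released immediately, batch 1 after one cycle, and so on.
        $batch           = $per_cycle > 0 ? intdiv( $index, $per_cycle ) : 0;
        $schedule[ $id ] = $batch * $cycle_seconds;
    }
    return $schedule;
}
```

With six deferred actions and the defaults, two actions are released at once, two after one cycle, and two after two cycles, so no single cycle receives the whole backlog.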

This builds on Phase 1's existing load detection and batch-size reduction — Phase 1 slows the whole queue, Phase 2 adds selective filtering within it.

Mutex Guard (Wave C)

Prevents thundering herd patterns (e.g., 150 concurrent NoFraud admin_init scans).

API:

  • acquire_lock( $operation_key, $ttl = 60 ) — returns true if acquired, false if held
  • release_lock( $operation_key ) — early release
  • is_locked( $operation_key ) — check without acquiring
  • force_release( $operation_key ) — admin/CLI recovery

Storage: wp_options with autoload = no, not transients, because transients may be backed by an object cache that isn't shared across PHP processes on some hosts. A lock counts as expired when current_time > stored_timestamp + TTL.
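
The four API methods and the expiry rule can be illustrated with a small sketch. It assumes an in-memory array in place of wp_options and an injectable clock, so the logic runs without WordPress. The class name is hypothetical. The check-then-write in acquire_lock() is not atomic across processes, so a real wp_options version would need an atomic insert at that step.

```php
<?php
// Sketch under assumptions: an in-memory array stands in for wp_options,
// and the class and property names are hypothetical.

final class Queryguard_Mutex_Sketch {
    /** @var array<string, array{acquired_at:int, ttl:int}> */
    private array $store = [];

    /** @var callable Clock returning a Unix timestamp; injectable for tests. */
    private $clock;

    public function __construct( ?callable $clock = null ) {
        $this->clock = $clock ?? 'time';
    }

    /** Returns true if acquired, false if an unexpired lock is already held. */
    public function acquire_lock( string $operation_key, int $ttl = 60 ): bool {
        if ( $this->is_locked( $operation_key ) ) {
            return false;
        }
        // NOTE: not atomic across processes; real storage needs an atomic insert.
        $this->store[ $operation_key ] = [ 'acquired_at' => ( $this->clock )(), 'ttl' => $ttl ];
        return true;
    }

    /** Early release by the holder. */
    public function release_lock( string $operation_key ): void {
        unset( $this->store[ $operation_key ] );
    }

    /** Check without acquiring. */
    public function is_locked( string $operation_key ): bool {
        if ( ! isset( $this->store[ $operation_key ] ) ) {
            return false;
        }
        $lock = $this->store[ $operation_key ];
        // Expired when current_time > stored_timestamp + TTL.
        return ( $this->clock )() <= $lock['acquired_at'] + $lock['ttl'];
    }

    /** Admin/CLI recovery; identical to release_lock() in this sketch. */
    public function force_release( string $operation_key ): void {
        $this->release_lock( $operation_key );
    }
}
```

Because expiry is computed at read time, a crashed process cannot hold a lock past its TTL, and force_release() is only needed to clear one sooner.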

Admin Settings (Wave D)

A settings page (or tab within an existing Hypercart admin page) with:

  • Enable/disable toggles for throttle system and mutex guard
  • Load threshold inputs (threads_running and queue depth for elevated/critical)
  • Priority registry editor (hook patterns per tier)
  • Reschedule delay configuration per load level
  • Current load status indicator (normal/elevated/critical)
  • Last 20 throttled actions with timestamps

WP-CLI Commands (Wave D)

wp queryguard throttle status     # Current load level, batch size, active deferrals
wp queryguard throttle pause      # Force critical mode (maintenance window)
wp queryguard throttle resume     # Return to auto-detection
wp queryguard throttle history    # Last 50 throttle events
wp queryguard mutex list          # Show active locks
wp queryguard mutex release <key> # Force-release a stuck lock

Deployment summary

| Wave | What ships | Value on its own | Depends on |
| --- | --- | --- | --- |
| A | Load monitor extraction + priority registry | Clean architecture, registry available for filters; no behavior change | Nothing (safe anytime) |
| B | Per-action throttle + per-action logging | Selective throttling: payments keep flowing, deferrable syncs pause under load | Wave A |
| C | Mutex guard | Prevents thundering herd (N concurrent expensive operations) | Nothing (independent) |
| D | Admin UI + WP-CLI | Operator visibility and control without code deploys | Respective waves |

Success criteria

From the original spec in #3:

  • During May 1-like conditions: deferrable syncs auto-pause, critical payment processing continues, subscription renewals delay but complete
  • Zero false positives during normal operation
  • Admin visibility into throttle decisions in real-time
  • No action is ever lost — only rescheduled

Dependencies

  • #12 — Extract throttle engine into separate classes (covered by Wave A)
  • Phase 1 runtime validation via test_observe / observe on staging should complete first to confirm load detection baselines
