Fix bg fill teardown crash in update_size_and_time_stats by bneradt · Pull Request #13114 · apache/trafficserver

bneradt · 2026-04-21T23:22:08Z

This fixes a crash seen on docs.

Production crash logs show ink_assert(0) firing in
HttpTransact::update_size_and_time_stats from HttpSM::update_stats via
HttpSM::kill_this, triggered by an Http2Stream event re-entering the
state machine while background_fill was still in the STARTED state.
The background_fill state is normally driven to a terminal value
(COMPLETED or ABORTED) by tunnel_handler_server, but when the SM is
torn down before that handler runs the state stays STARTED and the
unreachable default branch in update_size_and_time_stats asserts. The
same path also leaks proxy.process.http.background_fill_current_count
because tunnel_handler_server is the only site that decrements the
gauge after tunnel_handler_ua incremented it.

This normalizes STARTED to ABORTED inside HttpSM::kill_this, before
the enable_http_stats gate that guards update_stats, and decrements
the gauge there. Doing the normalization in kill_this rather than in
update_stats keeps the gauge accounting balanced regardless of whether
http stats are enabled, and ensures the downstream stats helper sees a
terminal state. As a defensive backstop this also folds the STARTED
case into the ABORTED branch of update_size_and_time_stats so a future
caller that reaches this point mid-fill records the bytes against
background_fill_bytes_aborted instead of crashing the server.

Copilot

Pull request overview

This PR addresses a production crash in the HTTP transaction teardown path by ensuring background-fill accounting is left in a terminal state (and no longer triggers an ink_assert(0)) when the state machine is destroyed mid-fill.

Changes:

Treat BackgroundFill_t::STARTED as an ABORTED-equivalent in HttpTransact::update_size_and_time_stats() to avoid asserting and to record aborted bytes instead.
Normalize HttpSM::background_fill from STARTED to ABORTED during HttpSM::update_stats() teardown, and decrement proxy.process.http.background_fill_current_count there to prevent gauge leakage on early teardown.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/proxy/http/HttpTransact.cc`	Makes stats update resilient to `STARTED` by folding it into the aborted accounting path instead of asserting.
`src/proxy/http/HttpSM.cc`	Normalizes leftover `STARTED` background-fill state during teardown and decrements the active-fill gauge to keep metrics balanced.

bneradt · 2026-05-06T21:11:05Z

Just to follow up: this has been stable on docs for the last few weeks, so it addresses the crash.

But @masaori335 reached out offline and suggested that it would be better to maybe keep the HttpSM open for the bg fill. That is a good point and something we should look into.

Production crash logs show ink_assert(0) firing in HttpTransact::update_size_and_time_stats when an HTTP/2 stream close re-enters HttpSM after the user-agent side was detached for background fill. The server-to-cache tunnel should continue, but the stale user-agent event misses the VC table and falls through to the default tunnel handler, which tears down the SM with background_fill still STARTED. This ignores stale user-agent VIO and close events while a background fill tunnel is active, so tunnel_handler_server remains responsible for driving the fill to COMPLETED or ABORTED. This also keeps an unconditional kill_this cleanup as a last-resort guard to balance background_fill_current_count if any unexpected path still tears down mid-fill.

bneradt · 2026-05-11T18:28:44Z

Just to follow up: this has been stable on docs for the last few weeks, so it addresses the crash.

But @masaori335 reached out offline and suggested that it would be better to maybe keep the HttpSM open for the bg fill. That is a good point and something we should look into.

Hi @masaori335 : when you have a moment, can you check the patch again? I have this latest patch running on docs now and it has been stable since this morning.

bneradt added this to the 11.0.0 milestone Apr 21, 2026

bneradt requested a review from Copilot April 21, 2026 23:22

bneradt self-assigned this Apr 21, 2026

bneradt added Cache Crash Bug labels Apr 21, 2026

bneradt added this to ATS v10.2.x Apr 21, 2026

Copilot started reviewing on behalf of bneradt April 21, 2026 23:22 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Comment thread src/proxy/http/HttpSM.cc Outdated

bneradt force-pushed the fix_update_size_and_time_stats branch from e700ff2 to 82a1557 Compare April 21, 2026 23:43

masaori335 self-requested a review April 23, 2026 22:32

bneradt force-pushed the fix_update_size_and_time_stats branch from 82a1557 to 129aa81 Compare May 11, 2026 13:56

bneradt force-pushed the fix_update_size_and_time_stats branch from 129aa81 to 15d0fbb Compare May 11, 2026 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bg fill teardown crash in update_size_and_time_stats#13114

Fix bg fill teardown crash in update_size_and_time_stats#13114
bneradt wants to merge 1 commit into
apache:masterfrom
bneradt:fix_update_size_and_time_stats

bneradt commented Apr 21, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

bneradt commented May 6, 2026

Uh oh!

bneradt commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bneradt commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

bneradt commented May 6, 2026

Uh oh!

bneradt commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bneradt commented Apr 21, 2026 •

edited

Loading