Commit c8fc5fb

joe4dev and claude committed
fix(init): reset timed-out init synchronously and fail provisioning on extended-init timeout
Two fixes to the init-timeout case in the init-await switch:

1. The reset of a timed-out init now runs synchronously BEFORE signaling ready. Reset's cleanup (Clear/Release in rapidcore.Server.Reset) releases the current reservation, so running it concurrently with the first invoke's Reserve() raced that invoke's reservation and could cancel it mid-flight, returning an empty result while the suppressed init was still running. The reset cannot deadlock on the unconsumed init failure: awaitInitCompletion acks rapid before the (still pending) initFailures channel send, which the invoke path consumes later.

2. The suppressed-init retry model only applies to on-demand functions. Provisioned concurrency / Managed Instances environments that exceed their extended init window now fail provisioning via /status/error (AWS fails the provisioning operation) instead of signaling ready and inevitably re-running the long init into the shorter invoke timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 44fe9ff commit c8fc5fb
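
The ordering argument in the first fix can be illustrated with a minimal Go sketch. It is not the rapidcore implementation: the `sandbox` type, its `Reserve`/`Reset`/`Reserved` methods, and `resetThenSignalReady` are hypothetical stand-ins that only model "reset releases the current reservation" and "the first invoke reserves once ready is signaled".

```go
package main

import (
	"fmt"
	"sync"
)

// sandbox is a hypothetical stand-in for the reservation state described in
// the commit message; it is not the real rapidcore.Server.
type sandbox struct {
	mu       sync.Mutex
	reserved bool
}

// Reserve marks the sandbox as reserved for an invoke.
func (s *sandbox) Reserve() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reserved = true
}

// Reset releases whatever reservation is current (assumed to mirror the
// Clear/Release cleanup mentioned in the commit message).
func (s *sandbox) Reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reserved = false
}

// Reserved reports whether a reservation is currently held.
func (s *sandbox) Reserved() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.reserved
}

// resetThenSignalReady models the fixed ordering: Reset completes before the
// ready signal, so the invoke's Reserve can only run after the cleanup.
func resetThenSignalReady() bool {
	s := &sandbox{reserved: true} // reservation held by the timed-out init
	ready := make(chan struct{})
	done := make(chan struct{})

	// Invoke path: waits for the ready signal, then reserves.
	go func() {
		defer close(done)
		<-ready
		s.Reserve()
	}()

	s.Reset()    // synchronous: finishes before anything is dispatched
	close(ready) // only now may the first invoke arrive
	<-done
	return s.Reserved()
}

func main() {
	fmt.Println("invoke reservation survives:", resetThenSignalReady())
}
```

If `s.Reset()` were instead started in its own goroutine before `close(ready)`, as in the code this commit replaces, it could run after the invoke's `Reserve()` and release that reservation, which is the race the commit message describes.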

1 file changed

Lines changed: 29 additions & 11 deletions

cmd/localstack/main.go

@@ -5,12 +5,14 @@ package main
 import (
 	"context"
 	"errors"
+	"fmt"
 	"os"
 	"runtime/debug"
 	"strconv"
 	"strings"
 	"time"
 
+	"github.com/aws/aws-lambda-runtime-interface-emulator/internal/lambda/fatalerror"
 	"github.com/aws/aws-lambda-runtime-interface-emulator/internal/lambda/interop"
 	"github.com/aws/aws-lambda-runtime-interface-emulator/internal/lambda/rapidcore"
 	log "github.com/sirupsen/logrus"
@@ -285,20 +287,36 @@ func main() {
 		time.Duration(initPhaseTimeoutSeconds) * time.Second,
 	)
 	switch {
+	case timedOut && !interopServer.onDemand:
+		// Provisioned concurrency / Managed Instances: AWS fails the provisioning operation
+		// when the extended init window is exceeded — there is no suppressed-init retry at
+		// invoke time. Report the failure and exit instead of signaling ready.
+		// TODO: validate the exact provisioning-failure errorType/message against AWS
+		// (e.g. the Managed Instances API model's FUNCTION_ERROR_INIT_TIMEOUT).
+		log.Errorf("Extended init phase timed out after %ds. Exiting.", initPhaseTimeoutSeconds)
+		interopServer.SendInitError(
+			fatalerror.SandboxTimeout,
+			fmt.Errorf("Init phase timed out after %d seconds", initPhaseTimeoutSeconds),
+		)
+		return
 	case timedOut:
-		// AWS limits the init phase to 10s. When exceeded, init is retried at the time of the
-		// first invocation under the function timeout ("suppressed init"). We report the init
-		// timeout and signal ready so LocalStack dispatches the first invoke, then reset the
-		// in-progress init so rapidcore re-runs a fresh Init phase when that invoke arrives.
-		// The reset failure is intentionally left unconsumed here so the invoke path's
-		// Reserve()/awaitInitialized() picks it up and triggers the suppressed init.
+		// On-demand: AWS limits the init phase to 10s. When exceeded, init is retried at the
+		// time of the first invocation under the function timeout ("suppressed init"). We
+		// report the init timeout, reset the in-progress init so rapidcore re-runs a fresh
+		// Init phase when the first invoke arrives, and only then signal ready.
+		// The reset must complete BEFORE signaling ready: its cleanup (Clear/Release in
+		// rapidcore.Server.Reset) releases the current reservation, so running it concurrently
+		// with the first invoke's Reserve() would cancel that invoke's reservation mid-flight.
+		// The reset cannot block on the unconsumed init failure: awaitInitCompletion acks rapid
+		// before the (still pending) initFailures channel send, which the invoke path's
+		// Reserve()/awaitInitialized() later consumes to trigger the suppressed init.
 		log.Debugln("Init phase timed out; deferring to suppressed init on first invocation.")
 		interopServer.ReportInitTimeout()
-		go func() {
-			if _, resetErr := interopServer.delegate.Reset("initTimeout", initResetTimeoutMs); resetErr != nil {
-				log.Debugf("Reset after init timeout returned: %s", resetErr)
-			}
-		}()
+		if _, resetErr := interopServer.delegate.Reset("initTimeout", initResetTimeoutMs); resetErr != nil {
+			// A non-nil error only carries the aborted init's fatal error type; the reset
+			// itself has completed and the suppressed-init retry stays valid.
+			log.Debugf("Reset after init timeout returned: %s", resetErr)
+		}
 	case interopServer.onDemand && errors.Is(err, rapidcore.ErrInitDoneFailed):
 		// On-demand: AWS folds a failed cold-start init into the first invocation (suppressed
 		// init). Signal ready and keep the process alive so LocalStack dispatches the first
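
The hunk above relies on Go's expressionless `switch` evaluating cases top to bottom, so the narrower `timedOut && !onDemand` case has to precede the general `timedOut` case. A minimal sketch of that ordering follows; the `initAction` labels and the `decide` function are hypothetical, and the `default` branch collapses every other case of the real switch into one.

```go
package main

import "fmt"

// initAction is a hypothetical label for the outcome of the init-await switch.
type initAction string

const (
	failProvisioning initAction = "fail-provisioning" // report init error and exit
	resetThenReady   initAction = "reset-then-ready"  // suppressed-init path
	otherHandling    initAction = "other"             // all remaining cases, simplified
)

// decide mirrors only the two timeout cases from the diff. Cases are tried in
// order, so the provisioned check must come first or it would never match.
func decide(timedOut, onDemand bool) initAction {
	switch {
	case timedOut && !onDemand:
		return failProvisioning
	case timedOut:
		return resetThenReady
	default:
		return otherHandling
	}
}

func main() {
	fmt.Println(decide(true, false)) // provisioned concurrency / Managed Instances
	fmt.Println(decide(true, true))  // on-demand
	fmt.Println(decide(false, true)) // no timeout
}
```

Swapping the first two cases would send every timeout, including provisioned ones, down the suppressed-init path, which is the behavior the second fix removes.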
