FlakeIdGeneratorApiTest.testSmoke crashes with std::overflow_error on Windows (PR #1448) [API-2380]

## Problem

`FlakeIdGeneratorApiTest.testSmoke` fails intermittently on the `windows-64-(Release, Static, noSSL)` CI configuration when running PR #1448. The failure manifests as a process crash rather than a test assertion failure.

## CI Error

```
Exception: E06D7363.?AVoverflow_error@std@@
```

`E06D7363` is the Windows SEH exception code for any C++ exception thrown via MSVC's `__CxxThrowException`. The RTTI tag `?AVoverflow_error@std@@` is MSVC's mangled name for `std::overflow_error`. This indicates an unhandled `std::overflow_error` reached Windows' unhandled-exception filter and terminated the process.

The error was observed **repeatedly** across multiple test invocations on the same CI run, indicating a reproducible race condition rather than a one-off fluke.

## Affected Configuration

- Platform: Windows x64, Release, Static, no OpenSSL
- CI runner: GitHub Actions (`windows-64-(Release, Static, noSSL)`)
- Branch: `ihsandemir/update_test_server_version` (PR #1448); not reproducible on `master`
- Test: `FlakeIdGeneratorApiTest.testSmoke`

## Root Cause Analysis

### The only source of `std::overflow_error` in the codebase

`std::overflow_error` is thrown in exactly one place:

**`hazelcast/src/hazelcast/client/proxy.cpp`** — `new_id_internal()`:

```cpp
int64_t
flake_id_generator_impl::new_id_internal()
{
    auto b = block_.load();
    if (b) {
        int64_t res = b->next();
        if (res != INT64_MIN) {
            return res;
        }
    }
    throw std::overflow_error("");   // <-- sole source of the crash
}
```

This exception is used as a **control-flow signal** (not an error): "the local prefetch batch is exhausted; fetch a new one from the server." It is intended to be caught immediately in the calling function `new_id()`:

```cpp
boost::future<int64_t>
flake_id_generator_impl::new_id()
{
    try {
        return boost::make_ready_future(new_id_internal());
    } catch (std::overflow_error&) {          // <-- expected catch site
        return new_id_batch(batch_size_)
          .then(boost::launch::sync, ...);    // async chain begins here
    }
}
```

### Why the exception escapes on Windows/MSVC

The batch-fetch future chain is:

```
invocation_promise_  .then(user_executor,    id_seq_lambda)  → F1
F1                   .then(launch::sync,     decode_lambda)  → F2
F2                   .then(launch::sync,     block_callback) → F3
```

`complete_call_id_sequence()` in `spi.cpp` checks `user_executor.closed()` at call time. If the user executor is closed (race during client shutdown) it substitutes `boost::launch::sync`, making the **entire chain synchronous in whatever thread fires the parent promise** — potentially an IO thread.

The IO threads introduced by PR #1448 are started without any exception guard:

```cpp
// network.cpp:144
io_threads_.emplace_back([raw_ctx]() { raw_ctx->run(); });  // no try/catch
```

On Windows/MSVC with Release-build optimizations, Boost.Thread's `boost::launch::sync` continuation mechanism does not reliably contain exceptions within a continuation's promise when the underlying exception machinery is SEH-based. An `std::overflow_error` active in a parent continuation context can leak through the chain into the IO thread or user executor thread, neither of which has a handler, causing the process to crash.

The developer already observed `overflow_error` escaping from `invocation_promise_.set_exception()` and added broader catch clauses in `ClientInvocation::set_exception()` (commits `809bb5dc9`, `09ec6a5a8`), but crashes persist because additional escape paths exist through the `raw_ctx->run()` loop.

### Why this is a regression vs. master

`master` uses a single IO thread model; PR #1448 introduces multiple IO threads and the associated race with user-executor shutdown. The interaction between the new `launch::sync`-fallback path and the unguarded `raw_ctx->run()` loop is what allows the exception to terminate a thread.

### Frequency

The test generates ~4,000 batch-exhaustion events per run (400,000 IDs ÷ default prefetch batch of 100). Each event triggers the throw. With many throws per run, even a low-probability escape path becomes a near-certainty.
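To make the "near-certainty" claim concrete, here is a minimal sketch of the arithmetic. It assumes each throw escapes independently with the same probability `p`; both the independence assumption and any value chosen for `p` are hypothetical and not measured from CI.

```cpp
#include <cmath>

// Probability that at least one of `events` throws escapes, assuming each
// throw escapes independently with probability `p` (hypothetical model).
double escape_probability(double p, long events)
{
    return 1.0 - std::pow(1.0 - p, static_cast<double>(events));
}
```

Under these assumptions, `escape_probability(0.001, 4000)` is roughly 0.98, so even a one-in-a-thousand escape path would crash almost every run.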

## Proposed Fix

Two complementary changes:

**1. Eliminate `std::overflow_error` as control flow (primary fix)**

Replace the throw/catch pattern in `new_id_internal()` / `new_id()` with the sentinel return value `INT64_MIN`, identical to the pattern already used by `Block::next()`. This removes the exception from the codebase entirely, making the crash structurally impossible.
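A minimal sketch of the sentinel approach, not the actual patch: `toy_block`, `new_id_or_sentinel` and `needs_new_batch` are hypothetical names standing in for the real `Block` class and the `new_id_internal()` / `new_id()` pair.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the real Block class: hands out `count` ids
// starting at `base`, then returns INT64_MIN once the block is exhausted.
class toy_block
{
public:
    toy_block(int64_t base, int64_t count)
      : base_(base), count_(count), used_(0)
    {}

    int64_t next()
    {
        int64_t index = used_.fetch_add(1);
        return index < count_ ? base_ + index : INT64_MIN;
    }

private:
    int64_t base_;
    int64_t count_;
    std::atomic<int64_t> used_;
};

// Sentinel-returning variant of new_id_internal(): never throws.
int64_t new_id_or_sentinel(toy_block* b)
{
    if (b != nullptr) {
        return b->next(); // already INT64_MIN when the block is exhausted
    }
    return INT64_MIN;     // no block has been fetched yet
}

// The caller branches on the sentinel instead of catching std::overflow_error.
bool needs_new_batch(int64_t id)
{
    return id == INT64_MIN;
}
```

In this shape, `new_id()` would return a ready future when `needs_new_batch()` is false and start the batch fetch otherwise, so no exception is ever in flight when the continuation chain is built.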

**2. Guard IO thread loops against uncaught exceptions (defensive fix)**

Wrap `raw_ctx->run()` in a try-catch so that any future misbehaving handler cannot silently crash an IO thread.
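A minimal sketch of such a guard, assuming a hypothetical helper `run_guarded` that takes the loop body as a callable; the logging target and whether to restart the loop after a failure are design choices left open here.

```cpp
#include <exception>
#include <functional>
#include <iostream>

// Runs `run_loop` and reports whether it exited normally. Any exception is
// logged and swallowed so it cannot reach the unhandled-exception filter.
bool run_guarded(const std::function<void()>& run_loop)
{
    try {
        run_loop();
        return true;
    } catch (const std::exception& e) {
        std::cerr << "IO thread handler threw: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "IO thread handler threw a non-standard exception\n";
    }
    return false;
}
```

With this helper, the thread body in `network.cpp` could become `[raw_ctx]() { run_guarded([raw_ctx] { raw_ctx->run(); }); }`.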

A detailed design document and fix are provided in the associated PR.
