Source
- RCA log: `harness/logs/code-coach/2026-05-10-07-51-cli-block-recurrence-rca.md`
- Knowledge (newly canonicalised): `harness/knowledge/code-coach/wm-copydata-ipc-pitfalls.md`
Verdict
| Severity | Count | Status |
|---|---|---|
| Must-fix | 0 | n/a |
| Should-fix | 1 | resolved by hotfix this commit (Option A) |
| Suggestion | 2 | open (Option B, UIA STA gap) |
Findings
1. Should-fix — `SendMessage` (no-timeout) makes CLI infinitely block on busy GUI
File: `Project/AgentZeroWpf/NativeMethods.cs:384-385`, `Project/AgentZeroWpf/CliHandler.cs:99`
```csharp
[LibraryImport("user32.dll", EntryPoint = "SendMessageW")]
public static partial IntPtr SendMessageCopyData(...);
```
`SendMessage` is synchronous and has no timeout: any stall on the WPF UI thread
(LLM model warm-up, ConPTY pipe blocking, modal dialog, COM marshal) leaves
the CLI process blocked indefinitely. `_timeoutMs = 5000` only governs MMF
response polling, not the send itself. This is the recurring "CLI block phenomenon."
Rewrite (landed):
```csharp
// NativeMethods.cs
[LibraryImport("user32.dll", EntryPoint = "SendMessageTimeoutW")]
public static partial IntPtr SendMessageTimeoutCopyData(
    IntPtr hWnd, uint Msg, IntPtr wParam, ref COPYDATASTRUCT lParam,
    uint fuFlags, uint uTimeout, out IntPtr lpdwResult);

// CliHandler.SendWpfCommand
var rc = NativeMethods.SendMessageTimeoutCopyData(
    agentWnd, NativeMethods.WM_COPYDATA, IntPtr.Zero, ref cds,
    NativeMethods.SMTO_ABORTIFHUNG | NativeMethods.SMTO_NORMAL,
    uTimeout: 3000, out _);
if (rc == IntPtr.Zero) { /* print error, return false */ }
```
`SendWpfCommand` now returns `bool`, and that return value is load-bearing:
every caller (`status`, `copy`, `terminal-list`, `terminal-send`,
`terminal-key`, `terminal-read`, `bot-chat`) was updated to short-circuit
when the send fails.
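
The short-circuit can be sketched as below. This is a minimal illustration, not the repository's code: `SendOutcome`, `SendResult.Classify`, and `CliFlow.RunCommand` are hypothetical names, and the two delegates stand in for `SendWpfCommand` and the MMF poll. It relies on the documented Win32 behaviour that `SendMessageTimeout` returns 0 on failure or timeout and that the last error is `ERROR_TIMEOUT` (1460) in the timeout case. Reading that error from C# assumes the import is declared with `SetLastError = true`, which the declaration above does not show.

```csharp
using System;

// Hypothetical names for illustration; not taken from the repository.
public enum SendOutcome { Delivered, TimedOut, Failed }

public static class SendResult
{
    // Win32 ERROR_TIMEOUT, reported by GetLastError when SendMessageTimeout times out.
    public const int ErrorTimeout = 1460;

    // rc: return value of SendMessageTimeout (zero means failure or timeout).
    // lastError: Marshal.GetLastPInvokeError() read right after the call,
    //            assuming the import sets SetLastError = true.
    public static SendOutcome Classify(IntPtr rc, int lastError)
    {
        if (rc != IntPtr.Zero) return SendOutcome.Delivered;
        return lastError == ErrorTimeout ? SendOutcome.TimedOut : SendOutcome.Failed;
    }
}

public static class CliFlow
{
    // send: stands in for SendWpfCommand; pollResponse: stands in for the MMF poll.
    // Returns a process exit code; the poll is skipped when the send fails.
    public static int RunCommand(Func<bool> send, Func<string> pollResponse)
    {
        if (!send()) return 1;            // short-circuit: nothing was delivered
        var response = pollResponse();
        if (response == null) return 2;   // delivered, but no reply before the poll deadline
        Console.WriteLine(response);
        return 0;
    }
}
```

Separating "timed out" from "failed" lets the CLI print a more useful message (GUI busy versus window gone), but both cases must skip the poll.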
2. Suggestion — Heavy work inside WndProc still risks per-handler timeouts (Option B)
File: `Project/AgentZeroWpf/UI/APP/MainWindow.xaml.cs:546` (`HandleTerminalSend`)
After Option A, a hung ConPTY child no longer blocks the CLI indefinitely, but
`SendMessageTimeout` will keep hitting the 3 s timeout as long as
`session.WriteAndSubmit` runs synchronously in WndProc. Move handler bodies
to `Task.Run` and respond from a worker; the CLI's MMF poll already covers the
latency.
Apply per-handler. Start with `HandleTerminalSend` / `HandleTerminalKey`
(both write to the PTY); `HandleStatus` / `HandleTerminalList` are pure
in-process reads and don't justify the threading overhead.
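
Option B could look roughly like the following sketch. `TerminalSendHandler` and both delegates are hypothetical stand-ins: `writeAndSubmit` represents `session.WriteAndSubmit`, and `respond` represents whatever writes the reply into the MMF, neither of which is shown in this review. The point is only that the WndProc path returns as soon as the work is queued, so the sender's `SendMessageTimeout` call completes quickly.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; names and the response format are assumptions.
public static class TerminalSendHandler
{
    // Called from WndProc. Queues the potentially blocking PTY write on the
    // thread pool and returns immediately. The returned Task exists for tests
    // and logging only; WndProc should not wait on it.
    public static Task Handle(Action<string> writeAndSubmit, string text, Action<string> respond)
    {
        return Task.Run(() =>
        {
            try
            {
                writeAndSubmit(text);   // may block while the ConPTY pipe is stalled
                respond("ok");
            }
            catch (Exception ex)
            {
                // Report the failure rather than leaving the CLI to poll until its deadline.
                respond("error: " + ex.Message);
            }
        });
    }
}
```

One `Task.Run` per message does not guarantee ordering between consecutive sends to the same session. If ordering matters, a per-session queue would be needed; this sketch does not address that.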
3. Suggestion — `OsControlService.TextCapture` lacks STA marshal (latent)
File: `Project/AgentZeroWpf/OsControl/OsControlService.cs:158-181`
```csharp
public static string TextCapture(long hwnd, ...)
{
    var result = ElementTreeScanner.Scan((IntPtr)hwnd, ...); // no STA pin
    ...
}
```
The sibling `ElementTreeAsync` does marshal onto an STA thread (and the
comment there explicitly states "System.Windows.Automation requires an STA
thread"). `TextCapture` works today only because callers happen to be on
STA threads (CLI main, WPF dispatcher). A future LLM-toolbelt path on an
MTA thread will deadlock.
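
One way to pin such a call to an STA thread is sketched below. `StaRunner` is a hypothetical helper, not the code used by `ElementTreeAsync` (which this review does not show). It starts a dedicated thread, sets its apartment state to STA on Windows, and surfaces the result or exception through a `Task`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not taken from the repository.
public static class StaRunner
{
    // Runs 'work' on a dedicated thread and returns its result (or exception) via a Task.
    // On Windows the thread is placed in a single-threaded apartment before it starts.
    public static Task<T> Run<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        var thread = new Thread(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        thread.IsBackground = true;
        // Apartment state is a Windows/COM concept; the guard keeps the sketch
        // compilable and runnable on other platforms.
        if (OperatingSystem.IsWindows())
            thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        return tcs.Task;
    }
}
```

Under that assumption, `TextCapture` would wrap its `ElementTreeScanner.Scan` call in `StaRunner.Run(...)` and wait on the returned task. The thread here does not run a message pump; whether the scan needs one is not established by this review.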
Recommendation
- Apply now — Option A (this PR/commit). Eliminates the infinite-block
class entirely.
- Next mission — Option B: backgrounding for `HandleTerminalSend` and
`HandleTerminalKey` first.
- Adjacent mission — STA pin in `OsControlService.TextCapture`. Add a
reviewer-checklist line for "any new `System.Windows.Automation` call must
be STA-pinned."
Closes-when
bounded timeout