Cancel pending futures and exit listener on EOF / unexpected exceptions by rhalutz · Pull Request #44 · OnFreund/pyrisco

rhalutz · 2026-05-24T06:35:28Z

Summary

The _listen loop in RiscoSocket never exits when the panel drops the TCP
session gracefully, and pending send_command() calls wait the full 10s
asyncio.wait_for timeout instead of failing fast.

The only except branch that breaks is ConnectionResetError (RST). A
graceful EOF from the panel raises asyncio.IncompleteReadError, which hits
the bare except Exception: branch — that branch doesn't break the loop and
doesn't cancel any pending futures, so:

The listener keeps running on a dead socket.
The in-flight CLOCK keep-alive waits the full 10s timeout, then raises
OperationError('Timeout in command: CLOCK').
_keep_alive sleeps 5s and tries again — the cycle repeats forever.

Downstream effect in Home Assistant: a 15s Error in Risco library / Timeout in command: CLOCK log storm, and the integration never reconnects (see home-assistant/core#169279, home-assistant/core#167628, home-assistant/core#154062, and #36 in this repo).

Changes

Widen the fatal-error except to (ConnectionResetError, asyncio.IncompleteReadError, EOFError, OSError) so a graceful FIN exits the loop the same way an RST does.
Cancel all pending futures with OperationError('Connection lost') so send_command() callers fail in milliseconds, not after 10s.
break on unexpected exceptions too, so a bug in _handle_incoming can't silently keep the listener spinning on a half-broken socket.

Testing

Added tests/test_risco_socket.py (first tests for the local socket) covering: graceful EOF, RST, OS error, and an unexpected exception. All verify the loop exits and that any pending future is resolved with OperationError. Each test wraps _listen in asyncio.wait_for(..., 1) so a regression to the spinning-loop bug fails fast in CI instead of hanging — without this patch all four tests hang.

Verified end-to-end against a real Risco panel (pyrisco 0.6.8 + HA 2026.4 container): after the panel times out the session, the integration sees the connection drop immediately, HA reloads the entry, and the keep-alive resumes — instead of the previous indefinite CLOCK-timeout loop.

Fixes Had this error this morning where my risco local was failing #36 (root cause).
Overlaps with Handle the out of Memory error by counting subsequent timeouts - keep alive - Implement ConnectionError #22, which tries to detect the same disconnect by counting consecutive timeouts. This PR addresses the underlying cause (the listener never exits on EOF) and avoids the timeouts in the first place.
With this released, a follow-up in home-assistant/core to broaden the _error reload check beyond ConnectionResetError would let the integration self-heal cleanly — happy to file that separately.

OnFreund · 2026-05-26T22:54:00Z

@rhalutz Can you rebase on top of master? It should allow us to create a test build

Copilot

Pull request overview

This PR fixes a disconnect-handling edge case in the local TCP socket transport so the listener exits cleanly on EOF / socket-level failures and in-flight send_command() calls fail fast instead of waiting for the full timeout.

Changes:

Treat graceful EOF and other socket read failures as fatal in RiscoSocket._listen() and exit the listen loop.
Fail any pending command futures immediately when the connection is lost.
Add unit tests to ensure the listen loop terminates and errors are surfaced via the socket queue.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`pyrisco/local/risco_socket.py`	Expands disconnect exception handling and attempts to fail pending command futures when the connection drops or the listener errors.
`tests/test_risco_socket.py`	Adds regression tests covering EOF/RST/OSError/unexpected exceptions ensuring `_listen()` exits and pending futures are resolved.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

rhalutz · 2026-05-28T06:40:23Z

+        for i, fut in enumerate(self._futures):
+          if fut is not None and not fut.done():
+            fut.set_exception(OperationError('Connection lost'))
+            self._futures[i] = None


Done — the slot is now cleared for every entry regardless of state, and set_exception is only called on the still-pending ones:

for i, fut in enumerate(self._futures): if fut is not None and not fut.done(): fut.set_exception(OperationError('Connection lost')) self._futures[i] = None

rhalutz · 2026-05-28T06:40:34Z

      except Exception as error:
        await self._queue.put(error)
+        break


Addressed by changing the approach: the generic except Exception branch no longer breaks the listener (I'd added that break in the first revision, but it was too aggressive). Recoverable, per-command errors — the OperationError deliberately raised on a panel error frame at line 68, CRC mismatches, a late-response InvalidStateError — are surfaced on the queue but must not tear down the connection, which matches the original behaviour.

So there's no longer a "listener has exited but futures are still pending" state for non-connection errors: the listener keeps running. The only paths that stop the listener are real connection loss (ConnectionError / IncompleteReadError), and those now fail-and-clear all pending futures before breaking. If a write inside _handle_incoming fails on a dead socket it raises a ConnectionError, which lands in that same branch.

OnFreund · 2026-05-26T23:22:14Z

        else:
          await self._handle_incoming(cmd_id, command, crc)
-      except ConnectionResetError as error:
+      except (ConnectionResetError, asyncio.IncompleteReadError, EOFError, OSError) as error:


I think OSError is a bit too general. Can we have something more specific?

Good point — narrowed it to (ConnectionError, asyncio.IncompleteReadError):

ConnectionError is the stdlib base for the connection-death family (ConnectionResetError/ECONNRESET, ConnectionAbortedError/ECONNABORTED, ConnectionRefusedError/ECONNREFUSED, BrokenPipeError/EPIPE) — all subclasses of OSError, but only the ones that mean "the socket is gone".

asyncio.IncompleteReadError is the concrete type StreamReader.readuntil raises on a graceful FIN (and it's itself a subclass of EOFError, so the previous EOFError in the tuple was redundant and is dropped).

I also reverted the break I'd added to the generic except Exception branch — see the reply on Copilot's comment below for the reasoning.

The _listen loop never exits when the panel drops the TCP session gracefully, and pending send_command() calls wait the full 10s asyncio.wait_for timeout instead of failing fast. The only except branch that breaks is ConnectionResetError (RST). A graceful EOF from the panel raises asyncio.IncompleteReadError, which falls through to the generic handler that neither breaks the loop nor cancels pending futures, so: 1. The listener keeps running on a dead socket. 2. The in-flight CLOCK keep-alive waits the full 10s timeout, then raises OperationError('Timeout in command: CLOCK'). 3. _keep_alive sleeps 5s and tries again -- the cycle repeats forever. Downstream effect in Home Assistant: a 15s "Error in Risco library / Timeout in command: CLOCK" log storm, and the integration never reconnects. Catch the connection-loss family (ConnectionError, which covers ECONNRESET / ECONNABORTED / ECONNREFUSED / EPIPE) together with asyncio.IncompleteReadError (the graceful-EOF type readuntil raises), cancel all pending futures with OperationError('Connection lost') so send_command() callers fail in milliseconds instead of after 10s, and break out of the loop. Recoverable per-command errors continue to be surfaced on the queue without tearing down the listener, preserving the existing behaviour for panel error frames and CRC mismatches.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pyrisco/local/risco_socket.py:89

PR description says the listener should exit on “unexpected exceptions too”, but the generic except Exception branch currently only queues the error and continues the loop. Either the description should be updated to match the implemented behavior (treating these as recoverable), or this handler should break/tear down consistently on truly-fatal unexpected exceptions.

      except Exception as error:
        await self._queue.put(error)

+          self._futures[i] = None
        await self._queue.put(error)
        break


rhalutz mentioned this pull request May 24, 2026

Handle the out of Memory error by counting subsequent timeouts - keep alive - Implement ConnectionError #22

Open

OnFreund requested a review from Copilot May 26, 2026 22:54

Copilot started reviewing on behalf of OnFreund May 26, 2026 22:54 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

OnFreund reviewed May 26, 2026

View reviewed changes

rhalutz force-pushed the fix/listen-loop-exit-on-eof branch from d66ed10 to 4a35f22 Compare May 28, 2026 06:38

rhalutz force-pushed the fix/listen-loop-exit-on-eof branch from 4a35f22 to 1edd177 Compare May 28, 2026 06:40

OnFreund requested a review from Copilot May 28, 2026 17:14

Copilot started reviewing on behalf of OnFreund May 28, 2026 17:15 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

Comment thread pyrisco/local/risco_socket.py

Comment on lines +84 to 86

self._futures[i] = None

await self._queue.put(error)

break

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cancel pending futures and exit listener on EOF / unexpected exceptions#44

Cancel pending futures and exit listener on EOF / unexpected exceptions#44
rhalutz wants to merge 1 commit into
OnFreund:masterfrom
rhalutz:fix/listen-loop-exit-on-eof

rhalutz commented May 24, 2026

Uh oh!

OnFreund commented May 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

rhalutz May 28, 2026

Uh oh!

rhalutz May 28, 2026

Uh oh!

OnFreund May 26, 2026

Uh oh!

rhalutz May 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rhalutz commented May 24, 2026

Summary

Changes

Testing

Related

Uh oh!

OnFreund commented May 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

rhalutz May 28, 2026

Choose a reason for hiding this comment

Uh oh!

rhalutz May 28, 2026

Choose a reason for hiding this comment

Uh oh!

OnFreund May 26, 2026

Choose a reason for hiding this comment

Uh oh!

rhalutz May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants