Skip to content

Fix AsyncioConnection race conditions causing EBADF errors (#614)#697

Open
dkropachev wants to merge 1 commit intomasterfrom
fix/asyncio-close-race-614
Open

Fix AsyncioConnection race conditions causing EBADF errors (#614)#697
dkropachev wants to merge 1 commit intomasterfrom
fix/asyncio-close-race-614

Conversation

@dkropachev
Copy link
Collaborator

@dkropachev dkropachev commented Feb 12, 2026

Summary

  • Fix close() to wait for _close() completion from non-event-loop threads, eliminating the window where is_closed=True but the socket fd is still open (root cause of EBADF)
  • Set last_error on server EOF in handle_read() so factory() detects dead connections instead of returning them
  • Add is_closed/is_defunct guard in push() to reject writes on closed connections
  • Treat BrokenPipeError/ConnectionResetError in handle_write() as clean peer disconnections instead of defuncting, and skip defunct() in both I/O handlers if the connection is already shutting down

Test plan

  • New unit tests in tests/unit/io/test_asyncio_race_614.py cover all four race conditions using real socket pairs
  • Full unit test suite passes with no regressions
  • Integration tests with TLS and node restart scenarios

@dkropachev dkropachev force-pushed the fix/asyncio-close-race-614 branch 3 times, most recently from 46524ac to 7b815c8 Compare February 16, 2026 13:09
@dkropachev dkropachev self-assigned this Feb 16, 2026
@dkropachev dkropachev marked this pull request as ready for review February 16, 2026 13:26
Comment on lines 181 to 184
if self.is_closed or self.is_defunct:
raise ConnectionShutdown(
"Connection to %s is already closed" % (self.endpoint,))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check that in push? push only puts data in the asyncio queue, it has nothing to do with actually putting that data in the socket, and thus no way to cause the error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea was to narrow race down, removed.

Comment on lines 219 to 224
# Peer disconnected — do a clean close instead of defunct
if isinstance(err, (BrokenPipeError, ConnectionResetError)):
log.debug("Connection %s closed by peer during write: %s",
self, err)
self.close()
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how is close "cleaner" that defunct, and what is the difference between them in general? I see that defunct does call close internally, so its not obvious to me.

Fix race conditions in AsyncioConnection that cause "[Errno 9] Bad
file descriptor" errors during node restarts, especially with TLS:

1. close() now waits for _close() to complete when called from outside
   the event loop thread, eliminating the window where is_closed=True
   but the socket fd is still open.

2. handle_read() sets last_error on server EOF so factory() detects dead
   connections instead of returning them to callers.

3. handle_write() treats peer disconnections as clean close instead of
   defuncting, and both I/O handlers skip defunct() if the connection
   is already shutting down.

Peer-disconnect detection is extracted into _is_peer_disconnect() helper
covering platform-specific behaviors:

- Windows: ProactorEventLoop raises plain OSError with winerror 10054
  (WSAECONNRESET) or 10053 (WSAECONNABORTED) instead of
  ConnectionResetError. Detection uses ConnectionError base class plus
  winerror check.

- macOS: Raises OSError(57) ENOTCONN when writing to a
  peer-disconnected socket, which is not a ConnectionError subclass.
  Detection uses errno-based checks for ENOTCONN, ESHUTDOWN,
  ECONNRESET, and ECONNABORTED.

- Windows _close(): ProactorEventLoop does not support
  remove_reader/remove_writer (raises NotImplementedError). These calls
  are wrapped so the socket is always closed regardless, and try/finally
  ensures connected_event is always set even if cleanup fails.
@dkropachev dkropachev force-pushed the fix/asyncio-close-race-614 branch from 2b9d555 to 85d08e1 Compare February 17, 2026 04:52
if fd >= 0:
try:
self._loop.remove_writer(fd)
except NotImplementedError:
self.endpoint, exc_info=True)
try:
self._loop.remove_reader(fd)
except NotImplementedError:

try:
self._socket.close()
except OSError:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants