Skip to content

fix: close sockets on bind failure, fix __exit__ traceback in trainers#1032

Open
mango766 wants to merge 1 commit intoinclusionAI:mainfrom
mango766:fix/socket-leak-and-exit-protocol
Open

fix: close sockets on bind failure, fix __exit__ traceback in trainers#1032
mango766 wants to merge 1 commit intoinclusionAI:mainfrom
mango766:fix/socket-leak-and-exit-protocol

Conversation

@mango766
Copy link

Fixes #1031

Two small fixes:

1. Socket leak in is_port_free()

When sock.bind() fails, the socket was never closed. Switched to try/finally so it's always cleaned up.

2. Broken traceback in trainer __exit__

RLTrainer and SFTTrainer both had raise exc_value in __exit__, which replaces the original traceback with one pointing at the raise statement. Returning False lets Python re-raise naturally with the full traceback.

- is_port_free(): use try/finally so sockets are closed even when
  bind() raises OSError
- RLTrainer/SFTTrainer __exit__: return False instead of raise exc_value
  to preserve the original traceback
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two distinct issues to enhance the robustness and debuggability of the system. It resolves a socket leak by guaranteeing socket closure in network utility functions and improves error reporting in trainer classes by ensuring that original exception tracebacks are preserved during exit handling.

Highlights

  • Socket Resource Management: Ensured that sockets are properly closed in is_port_free() even when sock.bind() fails, preventing potential resource leaks.
  • Improved Error Tracebacks: Corrected the __exit__ methods in RLTrainer and SFTTrainer to allow Python to naturally re-raise exceptions, preserving the full traceback instead of replacing it.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • areal/trainer/rl_trainer.py
    • Modified the __exit__ method to return False when an exception occurs, allowing the exception to propagate naturally with its original traceback.
  • areal/trainer/sft_trainer.py
    • Modified the __exit__ method to return False when an exception occurs, allowing the exception to propagate naturally with its original traceback.
  • areal/utils/network.py
    • Implemented try/finally blocks in the is_port_free function for both TCP and UDP socket checks to ensure that sockets are always closed, regardless of whether sock.bind() succeeds or fails.
Activity
  • No human activity has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two important fixes. First, it correctly handles socket cleanup in is_port_free by using try...finally blocks, preventing resource leaks on bind failures. Second, it improves exception handling in the __exit__ methods of RLTrainer and SFTTrainer by returning False instead of re-raising the exception, which preserves the original traceback for better debugging. The changes are correct and improve the robustness of the code. I've added one suggestion to further refactor the socket handling logic to be more idiomatic.

Comment on lines 113 to +129
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
sock.bind(("", port))
sock.close()
except OSError:
return False
finally:
sock.close()

# Check UDP
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
sock.bind(("", port))
sock.close()
return True
except OSError:
return False
finally:
sock.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While using try...finally correctly fixes the socket leak, this can be made more idiomatic and concise by using a with statement. The socket object can be used as a context manager, which will automatically handle closing the socket, even if an exception occurs.

Suggested change
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
sock.bind(("", port))
sock.close()
except OSError:
return False
finally:
sock.close()
# Check UDP
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
sock.bind(("", port))
sock.close()
return True
except OSError:
return False
finally:
sock.close()
try:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
sock.bind(("", port))
except OSError:
return False
# Check UDP
try:
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
sock.bind(("", port))
return True
except OSError:
return False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Socket file descriptor leak in is_port_free() on bind failure

1 participant