-
Notifications
You must be signed in to change notification settings - Fork 31
fix: memory safety issues in accl.cpp #215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
C-H-A-R-L-O-T-T-E-AI-Consulting-Corp
wants to merge
6
commits into
Xilinx:main
Choose a base branch
from
C-H-A-R-L-O-T-T-E-AI-Consulting-Corp:fix/memory-safety-issues-accl-cpp
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
fix: memory safety issues in accl.cpp #215
C-H-A-R-L-O-T-T-E-AI-Consulting-Corp
wants to merge
6
commits into
Xilinx:main
from
C-H-A-R-L-O-T-T-E-AI-Consulting-Corp:fix/memory-safety-issues-accl-cpp
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Fix exception thrown as pointer (memory leak): Changed `throw new std::runtime_error(...)` to `throw std::runtime_error(...)` in configure_arithmetic() - Fix uninitialized pointer: Initialize `buf = nullptr` in setup_rendezvous_spare_buffers() to prevent undefined behavior if device type doesn't match any condition - Add bounds validation: Check utility_spares.size() before accessing elements to provide clear error messages instead of std::out_of_range - Replace magic number: Extract hardcoded `3` to named constant REQUIRED_SPARE_BUFFERS for clarity Fixes Xilinx#214 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MAX_SESSIONS, SESSION_INDEX_BITS, and SESSION_INDEX_MASK constants to eth_intf.h for centralized configuration - Fix potential buffer overflow in tcp_depacketizer.cpp: - session_id is 16-bit (0-65535) but arrays are 1024 elements - Add session index masking to ensure safe array access - Replace hardcoded 1024 with MAX_SESSIONS constant - Apply same fix to udp_depacketizer.cpp and rdma_depacketizer.cpp The bit masking approach (session_id & SESSION_INDEX_MASK) is: - Hardware efficient (simple AND operation) - Maintains II=1 pipeline performance - Prevents out-of-bounds memory access in FPGA fabric Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Convert CCLO* cclo to std::unique_ptr<CCLO> for automatic memory management - Convert std::vector<Buffer<int8_t>*> to std::vector<std::unique_ptr<Buffer<int8_t>>> for both eager_rx_buffers and utility_spares vectors - Update constructors to use std::make_unique<>() for XRTDevice and SimDevice - Remove explicit delete in destructor (unique_ptr handles cleanup) - Update deinit() to work with unique_ptr (no manual delete needed) - Update setup_eager_rx_buffers() and setup_rendezvous_spare_buffers() to create buffers as unique_ptr and use std::move() for ownership transfer - Update all template buffer creation functions in accl.hpp to use cclo.get() when casting to derived device types Benefits: - Eliminates memory leaks from early returns or exceptions - Makes ownership semantics explicit - Follows modern C++ best practices (Rule of Zero) - No manual memory management required Note: Coyote constructor still accepts raw CoyoteDevice* for backwards compatibility - ownership is transferred to the unique_ptr. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace assert() calls with std::logic_error exceptions in device request handling code. Assertions are compiled out in Release builds, making critical invariant checks no-ops in production. Changes: - xrtdevice.cpp: Replace 2 assertions in FPGARequest::start() and XRTDevice::launch_request() - simdevice.cpp: Replace 2 assertions in SimRequest::start() and SimDevice::launch_request() - coyotedevice.cpp: Replace 2 assertions in CoyoteRequest::start() and CoyoteDevice::launch_request() All assertions checked operation status invariants: - start() requires EXECUTING state - launch_request() requires QUEUED state Replace <cassert> with <stdexcept> includes. Benefits: - Checks remain active in Release builds - Proper exception with descriptive message instead of abort() - Consistent error handling throughout codebase - Easier debugging with clear error messages Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add input validation to all public functions in accl_network_utils to prevent crashes and undefined behavior from invalid parameters. Changes: - Add validate_local_rank() helper function for consistent validation - configure_vnx(): Validate local_rank bounds - configure_tcp(): Validate local_rank bounds - exchange_qp(): Validate master_rank, slave_rank, local_rank bounds and check for empty vectors - configure_cyt_rdma(): Validate local_rank and null device pointer - configure_cyt_tcp(): Validate local_rank and null device pointer - get_ips(): Validate world_size is positive and <= 254 - generate_ranks() (both overloads): - Validate local_rank is non-negative - Validate world_size is positive - Validate local_rank < world_size - Check for port number overflow (start_port + count > 65535) - initialize_accl(): - Validate local_rank bounds - Validate nbufs > 0 - Validate bufsize > 0 - Require xclbin path for non-simulator mode All validation uses descriptive exception messages with function name and parameter values to aid debugging. Exception types: - std::invalid_argument: Invalid parameter values - std::out_of_range: Index out of bounds - std::overflow_error: Port number overflow Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add configuration constants to constants.hpp:
- MAX_STREAM_ID (246) for stream_put validation
- DEFAULT_RESET_TIMEOUT_MS (100) for soft reset operations
- DEFAULT_OPERATION_TIMEOUT (1000000) for hardware operations
- FLAT_TREE_MAX_COUNT (32KB) for gather/reduce operations
- DEFAULT_GATHER_FLAT_TREE_MAX_FANIN (2)
- DEFAULT_BCAST_FLAT_TREE_MAX_RANKS (3)
- DEFAULT_REDUCE_FLAT_TREE_MAX_RANKS (4)
- Add MAX_DMA_BYTES constant to eth_intf.h (8MB-1)
- Update tcp_depacketizer.cpp, udp_depacketizer.cpp,
rdma_depacketizer.cpp to use the shared constant
This improves code maintainability and self-documentation by
giving meaningful names to configuration values.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
configure_arithmetic()setup_rendezvous_spare_buffers()utility_sparesvectorREQUIRED_SPARE_BUFFERSDetails
Issue 1: Exception Thrown as Pointer
Changed
throw new std::runtime_error(...)tothrow std::runtime_error(...)to prevent memory leak and ensure proper catch behavior.Issue 2: Uninitialized Pointer
Initialize
buf = nullptrand add null check to prevent undefined behavior if device type doesn't match any condition.Issue 3: Missing Bounds Check
Validate
utility_spares.size() >= 3before accessing elements to provide clear error messages.Test plan
Fixes #214
🤖 Generated with Claude Code