fix: thread-safety bugs and cleanup from code review #6
Merged
added 9 commits
March 10, 2026 14:50
EdgeNode and HostApplication move constructors were reading members in the initializer list before locking other.mutex_ in the body — a data race if the source is accessed concurrently. Move all member transfers into the body under the lock. Also adds missing node_states_ transfer in HostApplication and primary_host_online_ in EdgeNode.
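A minimal sketch of the pattern this commit describes, assuming a simplified stand-in class. `Node`, `id_`, and `seq_` are hypothetical names; only `mutex_` and `primary_host_online_` appear in the commit message. The point is that nothing is read from `other` until `other.mutex_` is held.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for EdgeNode. Only mutex_ and primary_host_online_
// are names taken from the commit message; the rest is illustrative.
class Node {
public:
    explicit Node(std::string id) : id_(std::move(id)) {}

    // No member-initializer list touches `other`: every transfer happens in
    // the body, after other.mutex_ has been acquired. The object under
    // construction is not yet visible to other threads, so it needs no lock.
    Node(Node&& other) {
        std::lock_guard<std::mutex> lock(other.mutex_);
        id_ = std::move(other.id_);
        seq_ = other.seq_;
        primary_host_online_ = other.primary_host_online_;
    }

    void set_primary_host_online(bool online) {
        std::lock_guard<std::mutex> lock(mutex_);
        primary_host_online_ = online;
    }

    bool primary_host_online() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return primary_host_online_;
    }

    std::string id() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return id_;
    }

private:
    mutable std::mutex mutex_;  // never moved; each object keeps its own
    std::string id_;
    unsigned seq_ = 0;
    bool primary_host_online_ = false;
};
```

The trade-off is that members are default-initialized and then assigned, which costs slightly more than initializer-list construction but keeps every read of the source object inside the critical section.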
get_node_state() returned a reference_wrapper into mutex-protected map data; get_metric_name() returned a string_view. Both were readable after the lock was released, racing with MQTT callback mutations. get_node_state() now returns a lightweight NodeStateSnapshot (scalars only). get_metric_name() returns std::string. Callers that need alias resolution use get_metric_name() separately.
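The return-by-value approach can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Registry`, the snapshot fields `online` and `last_seq`, the `std::optional` wrapper, and the empty-string result for an unknown alias are all hypothetical. The commit message only states that the snapshot holds scalars and that `get_metric_name()` returns `std::string`.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Field names are assumptions; the PR only says "scalars only".
struct NodeStateSnapshot {
    bool online = false;
    std::uint64_t last_seq = 0;
};

// Hypothetical holder for the mutex-protected maps.
class Registry {
public:
    void set_state(const std::string& node, bool online, std::uint64_t seq) {
        std::lock_guard<std::mutex> lock(mutex_);
        states_[node] = NodeStateSnapshot{online, seq};
    }

    void set_metric_name(std::uint64_t alias, std::string name) {
        std::lock_guard<std::mutex> lock(mutex_);
        names_[alias] = std::move(name);
    }

    // Returns a copy made while the lock is held, so the caller never holds
    // a reference into the map after the lock is released.
    std::optional<NodeStateSnapshot> get_node_state(const std::string& node) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = states_.find(node);
        if (it == states_.end()) return std::nullopt;
        return it->second;
    }

    // Returns an owning std::string rather than a string_view into the map.
    // An empty string for an unknown alias is an assumption of this sketch.
    std::string get_metric_name(std::uint64_t alias) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = names_.find(alias);
        return it == names_.end() ? std::string{} : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, NodeStateSnapshot> states_;
    std::unordered_map<std::uint64_t, std::string> names_;
};
```

A snapshot taken before a later mutation keeps its old values, which is the ownership property the escaping-reference version could not guarantee.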
publish_state_birth/death held mutex_ while publish_raw_message blocked on a 5s future. If the Paho callback thread needed mutex_ to deliver on_message_arrived before firing the send-success callback, deadlock. Copy topic/payload/qos under the lock, release, then publish — matching the pattern EdgeNode already uses.
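The copy-then-release pattern can be sketched roughly like this. `Host`, `OutboundMessage`, the injected `PublishFn`, and the topic and payload strings are placeholders invented for the illustration. The real code calls `publish_raw_message`, which is not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical message holder for the values copied under the lock.
struct OutboundMessage {
    std::string topic;
    std::string payload;
    int qos = 1;
};

// Hypothetical stand-in for HostApplication.
class Host {
public:
    // Stand-in for the blocking publish call; injected so the sketch
    // does not depend on an MQTT client.
    using PublishFn = std::function<bool(const OutboundMessage&)>;

    Host(std::string host_id, PublishFn publish)
        : host_id_(std::move(host_id)), publish_(std::move(publish)) {}

    bool publish_state_birth() {
        OutboundMessage msg;
        {
            // Copy everything the publish needs while holding mutex_.
            std::lock_guard<std::mutex> lock(mutex_);
            msg.topic = "spBv1.0/STATE/" + host_id_;  // placeholder topic
            msg.payload = "{\"online\":true}";        // placeholder payload
            msg.qos = 1;
        }
        // The lock is released here, so a callback that needs mutex_
        // can run while the publish blocks.
        return publish_(msg);
    }

    // Simulates a client callback that needs mutex_.
    void on_message_arrived() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++received_;
    }

    int received() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return received_;
    }

private:
    mutable std::mutex mutex_;
    std::string host_id_;
    PublishFn publish_;  // set once at construction, never reassigned
    int received_ = 0;
};
```

If `publish_state_birth()` still held `mutex_` across the publish call, a callback that locks `mutex_` during that call could never make progress.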
- Use SEQ_NUMBER_MAX constant consistently instead of literal 256
- Remove dead death_payload_data_ update in rebirth() (connect() overwrites it)
- Remove unreachable timestamp fallback in PayloadBuilder::build()
- Remove unused includes (memory, thread, algorithm, vector)
- Document fire-and-forget QoS 0 semantics on command publish methods
- Fix missing newlines at end of files
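As a small illustration of the first cleanup item: the value 256 comes from the commit message, but using the constant as a wrap-around modulus, and the helper name `next_seq`, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Value taken from the commit message (it replaces the literal 256).
constexpr std::uint64_t SEQ_NUMBER_MAX = 256;

// Hypothetical helper: advance a sequence number, wrapping to 0 once it
// reaches SEQ_NUMBER_MAX, so valid values are 0..255.
constexpr std::uint64_t next_seq(std::uint64_t seq) {
    return (seq + 1) % SEQ_NUMBER_MAX;
}

static_assert(next_seq(0) == 1, "increments normally");
static_assert(next_seq(255) == 0, "wraps at SEQ_NUMBER_MAX");
```

Naming the bound in one place means a change to the sequence range cannot leave a stray literal behind.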
Summary
Fixes from the 2026-03-10 code review. The first four are concurrency/correctness bugs; the rest are cleanup.
- Move constructors (EdgeNode, HostApplication): lock other.mutex_ before reading members, instead of reading them in the initializer list
- get_node_state() returns a lightweight NodeStateSnapshot by value; get_metric_name() returns std::string
- publish_state_birth/publish_state_death: release mutex_ before the blocking publish
- STATE JSON parsing: accept "online" : true instead of requiring exact "online":true
- Remove dead code (rebirth() death payload, PayloadBuilder::build() timestamp fallback)
- Use SEQ_NUMBER_MAX constant consistently

Test plan
- test_move_constructor_race: concurrent move with reader thread contention
- test_mutex_escaping_refs: snapshot/string ownership survives map mutation
- test_state_publish_deadlock: STATE publish under message flood
- test_state_json_parsing: 10 whitespace variations of STATE JSON
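The whitespace-tolerant STATE parsing mentioned in the summary might look roughly like the sketch below. The PR text does not show the actual parser, so `parse_online_flag` is a hypothetical helper. It is a hand-rolled scan rather than a full JSON parser, and it would also match an "online" key nested inside another object or string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: extract the boolean after the "online" key,
// allowing any whitespace around the colon.
inline std::optional<bool> parse_online_flag(std::string_view json) {
    constexpr std::string_view key = "\"online\"";
    std::size_t pos = json.find(key);
    if (pos == std::string_view::npos) return std::nullopt;
    pos += key.size();

    auto skip_ws = [&] {
        while (pos < json.size() &&
               std::isspace(static_cast<unsigned char>(json[pos]))) {
            ++pos;
        }
    };

    skip_ws();
    if (pos >= json.size() || json[pos] != ':') return std::nullopt;
    ++pos;
    skip_ws();

    // substr clamps the count, so short inputs are safe.
    std::string_view rest = json.substr(pos);
    if (rest.substr(0, 4) == "true") return true;
    if (rest.substr(0, 5) == "false") return false;
    return std::nullopt;
}
```

Both `{"online":true}` and `{ "online" : true }` yield the same result, which an exact substring match on `"online":true` would not.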