
Make sure system.peers and peers_v2 match cluster metadata #4630

Open
minal-kyada wants to merge 1 commit into apache:trunk from minal-kyada:cassandra-168065336

@minal-kyada minal-kyada commented Feb 24, 2026

Description:
system.peers and system.peers_v2 can drift out of sync with ClusterMetadata, causing clients that use older C* versions, and tools that read these legacy tables, to observe an incorrect cluster topology.

Adds SystemPeersValidator, which reconciles both peer tables against ClusterMetadata on startup: it removes stale entries for nodes that are no longer in the cluster and repairs missing entries for joined nodes. The same reconciliation is also exposed as a JMX operation via StorageServiceMBean, so operators can trigger it on demand without restarting.
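
For readers skimming the description, the reconciliation boils down to two set differences. The snippet below is only a rough sketch of that idea, not the code in this patch: `PeersReconcileSketch`, `Diff`, and the use of plain strings for endpoints are all hypothetical, and it assumes the expected set already excludes the local node.

```java
import java.util.HashSet;
import java.util.Set;

public class PeersReconcileSketch
{
    /** Hypothetical result type: what to delete from and what to re-add to a peers table. */
    public static final class Diff
    {
        public final Set<String> stale;   // present in the table, absent from metadata
        public final Set<String> missing; // present in metadata, absent from the table

        Diff(Set<String> stale, Set<String> missing)
        {
            this.stale = stale;
            this.missing = missing;
        }
    }

    /**
     * Compares the endpoints found in a peers table with the endpoints that
     * cluster metadata says are joined. Endpoints are plain strings here
     * (an assumption made to keep the sketch self-contained).
     */
    public static Diff diff(Set<String> tableEntries, Set<String> joinedInMetadata)
    {
        // Stale: rows that exist in the table but have no joined node behind them.
        Set<String> stale = new HashSet<>(tableEntries);
        stale.removeAll(joinedInMetadata);

        // Missing: joined nodes that have no row in the table.
        Set<String> missing = new HashSet<>(joinedInMetadata);
        missing.removeAll(tableEntries);

        return new Diff(stale, missing);
    }
}
```

In this sketch the caller would delete every endpoint in `stale` and rewrite a row for every endpoint in `missing`; how the actual patch performs those writes is not shown here.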

Quick diagram of what the patch is about: [image]

patch by @minal-kyada; reviewed by @ifesdjeen @krummas for CASSANDRA-21187


public static void validateAndRepair(ClusterMetadata metadata)
{
if (metadata != null)

We should probably throw if it's null.

Also, a small nit: I wouldn't nest the entire method within the if's brackets. I would early-exit on the unmatched condition and unindent the rest of the method.
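
Something along these lines is what I have in mind. It is only a sketch: `EarlyExitSketch` and the `Metadata` stand-in are hypothetical, used so the snippet compiles on its own, and `Objects.requireNonNull` is just one way to throw on null.

```java
import java.util.Objects;

public class EarlyExitSketch
{
    // Hypothetical stand-in for ClusterMetadata, present only to keep the sketch self-contained.
    public static final class Metadata
    {
    }

    public static void validateAndRepair(Metadata metadata)
    {
        // Fail fast: throws NullPointerException instead of silently skipping the work.
        Objects.requireNonNull(metadata, "metadata must not be null");

        // The rest of the method continues here at the top indentation level,
        // with no enclosing if block.
    }
}
```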

private static Set<InetAddressAndPort> getPeersV2Entries()
{
Set<InetAddressAndPort> entries = new HashSet<>();
String query = String.format("SELECT peer, peer_port FROM %s.%s", SchemaConstants.SYSTEM_KEYSPACE_NAME, PEERS_V2);

Don't we want to validate all fields of the table? In updateLegacyPeerTable, we're updating more than just these two fields:

            QueryProcessor.executeInternal(String.format(peers_v2_query, SYSTEM_KEYSPACE_NAME, PEERS_V2),
                                           addresses.broadcastAddress.getAddress(), addresses.broadcastAddress.getPort(),
                                           addresses.broadcastAddress.getAddress(), addresses.broadcastAddress.getPort(),
                                           addresses.nativeAddress.getAddress(), addresses.nativeAddress.getPort(),
                                           location.datacenter, location.rack,
                                           nodeId.toUUID(),
                                           next.directory.version(nodeId).cassandraVersion.toString(),
                                           next.schema.getVersion(),
                                           tokens);
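
As a rough illustration of per-column validation, one option is to build the expected row from ClusterMetadata and compare it column by column with what is stored. The helper below is a hypothetical sketch, not part of the patch: `PeerRowDiffSketch`, `mismatchedColumns`, and representing a row as a map from column name to value are all assumptions.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class PeerRowDiffSketch
{
    /**
     * Returns the names of the columns whose stored value differs from the
     * expected one. A column missing from the stored row counts as a mismatch
     * unless the expected value is also null.
     */
    public static Set<String> mismatchedColumns(Map<String, Object> expected, Map<String, Object> actual)
    {
        Set<String> mismatched = new TreeSet<>();
        for (Map.Entry<String, Object> column : expected.entrySet())
        {
            Object stored = actual.get(column.getKey());
            if (!Objects.equals(column.getValue(), stored))
                mismatched.add(column.getKey());
        }
        return mismatched;
    }
}
```

A non-empty result would mean the row needs to be rewritten even though its peer and peer_port already match.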
