Description
AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools intermittently fails in CI with a 20-second Awaitility timeout. The test opens 4 concurrent sessions against a 2-node ScyllaDB cluster (with --smp=3) and waits for all connection pools to be fully initialized, but pools fail to fill due to DriverTimeoutException on the protocol OPTIONS handshake.
Error
org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in
com.datastax.oss.driver.core.pool.AdvancedShardAwarenessIT was not fulfilled within 20 seconds.
at com.datastax.oss.driver.core.pool.AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools(AdvancedShardAwarenessIT.java:239)
Root Cause
Multiple channels fail to initialize because the protocol initialization request times out:
WARN c.d.o.d.i.core.pool.ChannelPool - [s1|/127.0.2.1:19042] Error while opening new channel
com.datastax.oss.driver.api.core.DriverTimeoutException: [s1|id: 0xf3999fab, L:/127.0.0.1:11669 - R:/127.0.2.1:19042]
Protocol initialization request, step 1 (OPTIONS): timed out after 5000 ms
Opening 4 sessions simultaneously with multiple channels per shard creates a burst of connection attempts. On CI runners with limited resources, the ScyllaDB node cannot respond to all OPTIONS requests within the 5-second timeout, causing channels to fail and preventing pools from reaching full capacity within 20 seconds.
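To get a rough sense of the burst size, the sketch below multiplies out the numbers from this report (4 sessions, 2 nodes, 3 shards per node). The number of channels per shard is not stated here, so the loop tries a few assumed values. The class and method names are hypothetical, and the estimate ignores control connections and any reconnection attempts.

```java
public class ConnectionBurstEstimate {

    // Rough count of pooled channels opened when every session starts at once.
    // Assumes each session opens channelsPerShard channels to every shard of every node.
    static int totalChannels(int sessions, int nodes, int shardsPerNode, int channelsPerShard) {
        return sessions * nodes * shardsPerNode * channelsPerShard;
    }

    public static void main(String[] args) {
        int sessions = 4;      // from the test description
        int nodes = 2;         // from the CCM config
        int shardsPerNode = 3; // from --smp=3

        // channelsPerShard is an assumption; the report only says "multiple".
        for (int channelsPerShard = 1; channelsPerShard <= 3; channelsPerShard++) {
            System.out.printf("channelsPerShard=%d -> %d channels%n",
                    channelsPerShard,
                    totalChannels(sessions, nodes, shardsPerNode, channelsPerShard));
        }
    }
}
```

Each of those channels sends its own OPTIONS request, so on a constrained runner even a small per-shard count produces dozens of handshakes at the same moment.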
Environment
- ScyllaDB version: 2025.4.3 (also seen on LTS versions)
- CCM config: 2 nodes, --smp=3
- CI: GitHub Actions ubuntu-latest
- Test category: IsolatedTests
CI Run
https://github.com/scylladb/java-driver/actions/runs/22542986983/job/65300844148?pr=818
Also observed in base branch (scylla-4.x) CI runs.
Possible Fixes
- Increase Awaitility timeout from 20s to 60s
- Increase per-channel protocol init timeout (advanced.connection.init-query-timeout)
- Reduce the number of concurrent sessions from 4 to 2
- Add retry/backoff logic for initial connection pool fill
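As an illustration of the retry/backoff option, here is a minimal, generic sketch of capped exponential backoff in plain Java. It is not the driver's API and not a proposed patch: BackoffRetry, delayForMillis, and retry are hypothetical names, and the action passed in stands in for whatever step would be retried (for example, opening one channel).

```java
import java.util.concurrent.Callable;

public class BackoffRetry {

    // Delay before the retry that follows failed attempt number `attempt` (0-based):
    // baseMillis * 2^attempt, capped at maxMillis.
    static long delayForMillis(int attempt, long baseMillis, long maxMillis) {
        long factor = 1L << Math.min(attempt, 20); // clamp the shift to keep the product small
        return Math.min(baseMillis * factor, maxMillis);
    }

    // Runs the action up to maxAttempts times, sleeping between failures.
    // Rethrows the last failure if every attempt fails.
    static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayForMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

Spacing the retries out this way avoids re-sending every failed handshake at once, which would recreate the original burst. Whether this belongs in the test, the pool, or existing reconnection settings is a separate design question.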
Related
- #564, "AdvancedShardAwarenessIT.should_initialize_all_channels can fail": same test class, different method (should_initialize_all_channels), different root cause (socket collision)