Flaky: AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools times out in CI #820

@dkropachev

Description

AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools intermittently fails in CI with a 20-second Awaitility timeout. The test opens 4 concurrent sessions against a 2-node ScyllaDB cluster (each node started with --smp=3) and waits for all connection pools to be fully initialized. The pools do not fill in time because new channels hit a DriverTimeoutException during the protocol OPTIONS handshake.

Error

org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in
com.datastax.oss.driver.core.pool.AdvancedShardAwarenessIT was not fulfilled within 20 seconds.
    at com.datastax.oss.driver.core.pool.AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools(AdvancedShardAwarenessIT.java:239)

Root Cause

Multiple channels fail to initialize because the protocol handshake times out:

WARN  c.d.o.d.i.core.pool.ChannelPool - [s1|/127.0.2.1:19042]  Error while opening new channel
com.datastax.oss.driver.api.core.DriverTimeoutException: [s1|id: 0xf3999fab, L:/127.0.0.1:11669 - R:/127.0.2.1:19042]
  Protocol initialization request, step 1 (OPTIONS): timed out after 5000 ms

Opening 4 sessions simultaneously with multiple channels per shard creates a burst of connection attempts. On CI runners with limited resources, the ScyllaDB node cannot respond to all OPTIONS requests within the 5-second timeout, causing channels to fail and preventing pools from reaching full capacity within 20 seconds.
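To get a feel for the size of that burst, here is a rough back-of-the-envelope sketch. The session, node, and shard counts come from this issue; `channelsPerShard` is an assumed parameter (the issue only says "multiple"), and the class and method names are hypothetical. Control connections are ignored.

```java
/** Rough estimate of the connection burst triggered when all sessions start at once. */
final class BurstEstimate {

    /**
     * Number of data channels opened if every session fills every shard pool
     * at the same time. channelsPerShard is an assumption, not a value from the test.
     */
    static int dataChannels(int sessions, int nodes, int shardsPerNode, int channelsPerShard) {
        return sessions * nodes * shardsPerNode * channelsPerShard;
    }

    public static void main(String[] args) {
        // 4 sessions, 2 nodes, and --smp=3 are from the issue; 2 channels per shard is a guess.
        int nodes = 2;
        int total = dataChannels(4, nodes, 3, 2);
        System.out.println("total=" + total + " perNode=" + (total / nodes));
    }
}
```

With the assumed 2 channels per shard this comes to 48 OPTIONS handshakes, 24 per node, all issued at roughly the same moment.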

Environment

  • ScyllaDB version: 2025.4.3 (also seen on LTS versions)
  • CCM config: 2 nodes, --smp=3
  • CI: GitHub Actions ubuntu-latest
  • Test category: IsolatedTests

CI Run

https://github.com/scylladb/java-driver/actions/runs/22542986983/job/65300844148?pr=818

Also observed in base branch (scylla-4.x) CI runs.

Possible Fixes

  • Increase Awaitility timeout from 20s to 60s
  • Increase per-channel protocol init timeout (advanced.connection.init-query-timeout)
  • Reduce the number of concurrent sessions from 4 to 2
  • Add retry/backoff logic for initial connection pool fill
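As an illustration of the last option, here is a minimal sketch of retrying an operation with capped exponential backoff. It uses only the JDK and is not driver code: `BackoffRetry`, `delayFor`, and `retry` are hypothetical names, and the action passed in stands in for whatever opens a channel.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Retries an action with capped exponential backoff between attempts. */
final class BackoffRetry {

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at max. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        long factor = 1L << Math.min(attempt, 30); // clamp the shift to keep the factor bounded
        long millis = base.toMillis() * factor;
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Runs the action up to maxAttempts times and rethrows the last failure. */
    static <T> T retry(Callable<T> action, int maxAttempts, Duration base, Duration max)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    // Sleep only if another attempt will follow.
                    Thread.sleep(delayFor(attempt, base, max).toMillis());
                }
            }
        }
        throw last;
    }
}
```

A caller might wrap a connection attempt as `BackoffRetry.retry(() -> open(), 5, Duration.ofMillis(200), Duration.ofSeconds(5))`, where `open()` is a placeholder. Spreading retries out this way avoids re-creating the same burst that caused the original timeouts.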
