support retries with smaller batch sizes #122

Open
jwijgerd wants to merge 4 commits into master from feature/batch-too-large-retries

Conversation

@jwijgerd

Retry with smaller batch sizes when receiving an InvalidQueryException while executing a BatchStatement.

Work in progress.

Copilot AI left a comment

Pull request overview

This PR implements automatic retry logic with smaller batch sizes when Cassandra rejects a batch as too large. When an InvalidQueryException is received while executing a BatchStatement, the code now catches this exception, wraps it in a BatchTooLargeException, and recursively splits the batch in half until it succeeds.

Key Changes:

  • Introduced BatchTooLargeException to wrap batch size errors for both Cassandra 2 and 4 drivers
  • Modified ExecutionUtils to detect InvalidQueryException on batch statements and convert to BatchTooLargeException
  • Updated PersistentActorUpdateEventProcessor to recursively split batches in half when too large
  • Added comprehensive unit tests for the batch splitting logic

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| main/backplane-cassandra4/src/main/java/org/elasticsoftware/elasticactors/cassandra4/state/BatchTooLargeException.java | New exception class to encapsulate batch size errors with batch metadata |
| main/backplane-cassandra2/src/main/java/org/elasticsoftware/elasticactors/cassandra2/state/BatchTooLargeException.java | New exception class for Cassandra 2 driver with same functionality |
| main/backplane-cassandra4/src/main/java/org/elasticsoftware/elasticactors/cassandra4/util/ExecutionUtils.java | Added InvalidQueryException handling to detect and convert batch size errors |
| main/backplane-cassandra2/src/main/java/org/elasticsoftware/elasticactors/cassandra2/util/ExecutionUtils.java | Added InvalidQueryException handling for Cassandra 2 driver |
| main/backplane-cassandra4/src/main/java/org/elasticsoftware/elasticactors/cassandra4/state/PersistentActorUpdateEventProcessor.java | Refactored event processing and added recursive batch splitting on BatchTooLargeException |
| main/backplane-cassandra2/src/main/java/org/elasticsoftware/elasticactors/cassandra2/state/PersistentActorUpdateEventProcessor.java | Refactored event processing with recursive batch splitting for all batch execution methods |
| main/backplane-cassandra4/src/test/java/org/elasticsoftware/elasticactors/cassandra4/state/PersistentActorUpdateEventProcessorTest.java | Comprehensive test suite for batch splitting scenarios with 2, 3, and 9 events |
| main/backplane-cassandra4/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker | Enables Mockito inline mock maker for testing final classes |
| main/backplane-cassandra4/pom.xml | Added test dependencies for mockito-core, testng, and slf4j-simple |

} catch(InvalidQueryException e) {
    logger.error("InvalidQueryException with message {} on node {} while executing statement, will retry in case of BatchStatement",
        e.getMessage(),
        Optional.of(e.getEndPoint()).map(endPoint -> endPoint.resolve().toString()).orElse("UNKNOWN"));

Copilot AI Dec 26, 2025

Using Optional.of() instead of Optional.ofNullable() will throw a NullPointerException if e.getEndPoint() returns null. This should be Optional.ofNullable() to safely handle null values, consistent with other similar code patterns in this file.

Suggested change
- Optional.of(e.getEndPoint()).map(endPoint -> endPoint.resolve().toString()).orElse("UNKNOWN"));
+ Optional.ofNullable(e.getEndPoint()).map(endPoint -> endPoint.resolve().toString()).orElse("UNKNOWN"));
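
The difference between the two factory methods can be seen in isolation with a minimal sketch. A plain String stands in for the driver's endpoint here, and the class and method names are hypothetical, not part of the PR:

```java
import java.util.Optional;

final class EndPointLabel {
    // Optional.of(null) throws NullPointerException before map/orElse ever run,
    // so the "UNKNOWN" fallback is unreachable for a null input.
    static String unsafeLabel(String endPoint) {
        return Optional.of(endPoint).map(String::trim).orElse("UNKNOWN");
    }

    // Optional.ofNullable(null) yields an empty Optional, so orElse supplies the fallback.
    static String safeLabel(String endPoint) {
        return Optional.ofNullable(endPoint).map(String::trim).orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(safeLabel("10.0.0.1:9042"));
        System.out.println(safeLabel(null));
        try {
            unsafeLabel(null);
        } catch (NullPointerException e) {
            System.out.println("Optional.of rejected the null value");
        }
    }
}
```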

Comment on lines +180 to +190
} catch(BatchTooLargeException e) {
    int half = events.size() / 2;
    // batch is too large, so we need to split it up
    logger.warn(
        "Batch of byteSize {} is too large, splitting up in 2 batches. 1 of {} events and 1 of {} events",
        e.getBatchSize(),
        half,
        events.size() - half);
    processEvents(events.subList(0, half));
    processEvents(events.subList(half, events.size()));
}

Copilot AI Dec 26, 2025

The recursive splitting logic has no depth limit or minimum batch size check. If a single event in the batch causes the "batch too large" error (which shouldn't happen but could if the logic is triggered incorrectly), this will result in infinite recursion when the batch is split down to size 1 and processEvents with size 1 still somehow triggers the error. Consider adding a guard condition or maximum recursion depth.
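
One possible shape for such a guard is sketched below. It is a simplified, self-contained stand-in rather than the PR's code: the nested BatchTooLargeException, the Consumer-based executor, and the class name are all assumptions made for illustration. With a list of one event, `size() / 2` is 0, so the second sublist would be the same single event again; rethrowing at that point stops the recursion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class GuardedBatchSplitter {
    // Hypothetical stand-in for the PR's BatchTooLargeException.
    static final class BatchTooLargeException extends RuntimeException {
        BatchTooLargeException(String message) {
            super(message);
        }
    }

    // Executes the batch and halves it on rejection. A batch of one event is
    // never split again: the exception is rethrown to the caller instead.
    static <T> void process(List<T> events, Consumer<List<T>> executor) {
        if (events.isEmpty()) {
            return;
        }
        try {
            executor.accept(events);
        } catch (BatchTooLargeException e) {
            if (events.size() == 1) {
                throw e; // nothing left to split
            }
            int half = events.size() / 2;
            process(events.subList(0, half), executor);
            process(events.subList(half, events.size()), executor);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> accepted = new ArrayList<>();
        // Simulated server that rejects any batch with more than two events.
        Consumer<List<Integer>> executor = batch -> {
            if (batch.size() > 2) {
                throw new BatchTooLargeException("too large: " + batch.size());
            }
            accepted.add(List.copyOf(batch));
        };
        process(List.of(1, 2, 3, 4, 5), executor);
        System.out.println(accepted); // prints [[1, 2], [3], [4, 5]]
    }
}
```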

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

processor.process(List.of(event, event));

verify(cqlSession, times(1)).execute(any(BatchStatement.class));
verify(insertStatement, times(4)).bind("key1", "key2", "actorId", ByteBuffer.wrap(new byte[1024]));

Copilot AI Dec 26, 2025

The test verifies that insertStatement.bind() is called 4 times for 2 events. However, this appears to be testing implementation details rather than behavior. The bind count includes both the failed batch attempt (2 binds) and the individual statement retries (2 more binds). This test may be brittle and could break if the implementation changes while still maintaining correct behavior.

import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Statement;
import com.datastax.oss.driver.api.core.servererrors.*;
import com.datastax.oss.protocol.internal.request.Batch;

Copilot AI Dec 26, 2025

The import com.datastax.oss.protocol.internal.request.Batch is unused and should be removed. This import does not appear to be referenced anywhere in the code.

Suggested change
- import com.datastax.oss.protocol.internal.request.Batch;

Comment on lines +56 to +61
} catch(InvalidQueryException e) {
    logger.error("InvalidQueryException with message {} on node {} while executing statement, will retry in case of BatchStatement",
        e.getMessage(),
        Optional.of(e.getCoordinator()).map(node -> node.getEndPoint().resolve().toString()).orElse("UNKNOWN"));
    if(statement instanceof BatchStatement batch) {
        throw new BatchTooLargeException(batch,batch.computeSizeInBytes(cassandraSession.getContext()));

Copilot AI Dec 26, 2025

The code assumes all InvalidQueryException instances when executing a BatchStatement indicate that the batch is too large. However, InvalidQueryException can be thrown for many other reasons (syntax errors, invalid table names, etc.). The code should check the exception message to confirm it's actually a "batch too large" error before converting it to BatchTooLargeException.
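
A message check along these lines might look like the sketch below. The marker text "batch too large" is an assumption about what the server reports and should be verified against the Cassandra version in use; the class and method names are hypothetical, and a plain Throwable stands in for the driver's InvalidQueryException.

```java
import java.util.Locale;

final class BatchErrorClassifier {
    // Assumption: the server's rejection message contains this text.
    // Verify against the actual Cassandra version before relying on it.
    private static final String MARKER = "batch too large";

    // Returns true only when the error message looks like a batch size rejection,
    // so other invalid-query causes (bad table name, syntax error) are not retried.
    static boolean isBatchTooLarge(Throwable error) {
        String message = error.getMessage();
        return message != null && message.toLowerCase(Locale.ROOT).contains(MARKER);
    }

    public static void main(String[] args) {
        System.out.println(isBatchTooLarge(new RuntimeException("Batch too large")));
        System.out.println(isBatchTooLarge(new RuntimeException("unconfigured table events")));
    }
}
```

Matching on message text is fragile across server versions, so a check like this is best paired with a log line that records the original message whenever the conversion is skipped.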

private final int batchSize;

public BatchTooLargeException(BatchStatement originalBatch, int batchSize) {
super("BatchStatement of size "+batchSize+" too large to execute");

Copilot AI Dec 26, 2025

Missing space in the error message. Should be: "BatchStatement of size " + batchSize + " too large to execute" (with a space before "too").

Comment on lines +118 to +128
} catch (BatchTooLargeException e) {
    int half = events.size() / 2;
    // batch is too large, so we need to split it up
    logger.warn(
        "Batch of byteSize {} is too large, splitting up in 2 batches. 1 of {} events and 1 of {} events",
        e.getBatchSize(),
        half,
        events.size() - half);
    processEvents(events.subList(0, half));
    processEvents(events.subList(half, events.size()));
}

Copilot AI Dec 26, 2025

The recursive splitting logic has no depth limit or minimum batch size check. If a single event in the batch causes the "batch too large" error (which shouldn't happen but could if the logic is triggered incorrectly), this will result in infinite recursion when the batch is split down to size 1 and processEvents with size 1 still somehow triggers the error. Consider adding a guard condition or maximum recursion depth.
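
A different way to address the recursion concern, not the one the review proposes, is to replace the recursive calls with an explicit work stack so the call depth stays constant. The sketch below assumes a Consumer-based executor and uses IllegalStateException as a stand-in for BatchTooLargeException; all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

final class WorkStackSplitter {
    // Processes events with an explicit stack instead of recursion and
    // returns the number of execution attempts that were made.
    static <T> int process(List<T> events, Consumer<List<T>> executor) {
        Deque<List<T>> pending = new ArrayDeque<>();
        pending.push(events);
        int attempts = 0;
        while (!pending.isEmpty()) {
            List<T> batch = pending.pop();
            if (batch.isEmpty()) {
                continue;
            }
            attempts++;
            try {
                executor.accept(batch);
            } catch (IllegalStateException e) { // stand-in for BatchTooLargeException
                if (batch.size() == 1) {
                    throw e; // a single event cannot be split further
                }
                int half = batch.size() / 2;
                // Push the second half first so the first half is executed next,
                // which preserves the original event order.
                pending.push(batch.subList(half, batch.size()));
                pending.push(batch.subList(0, half));
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        int attempts = process(List.of(1, 2, 3, 4, 5), batch -> {
            if (batch.size() > 2) {
                throw new IllegalStateException("Batch too large");
            }
            seen.addAll(batch);
        });
        System.out.println(seen + " in " + attempts + " attempts");
    }
}
```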

processor.process(List.of(event, event, event));

verify(cqlSession, times(2)).execute(any(BatchStatement.class));
verify(insertStatement, times(6)).bind("key1", "key2", "actorId", ByteBuffer.wrap(new byte[1024]));

Copilot AI Dec 26, 2025

The test verifies that insertStatement.bind() is called 6 times for 3 events. However, this appears to be testing implementation details rather than behavior. This test may be brittle and could break if the implementation changes while still maintaining correct behavior. Consider focusing on verifying that all events are successfully processed rather than counting internal method calls.
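
A behavior-focused check could, for instance, assert on what ends up persisted rather than on how many times bind() or execute() was called. The sketch below avoids Mockito so that it is self-contained: RecordingStore is a hypothetical fake, process is a simplified stand-in for the processor under test, and IllegalStateException stands in for BatchTooLargeException.

```java
import java.util.ArrayList;
import java.util.List;

final class BehaviourFocusedCheck {
    // Hypothetical fake: rejects batches above maxBatchSize and records the rest.
    static final class RecordingStore {
        private final int maxBatchSize;
        final List<String> persisted = new ArrayList<>();

        RecordingStore(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
        }

        void executeBatch(List<String> events) {
            if (events.size() > maxBatchSize) {
                throw new IllegalStateException("Batch too large");
            }
            persisted.addAll(events);
        }
    }

    // Simplified stand-in for the processor under test.
    static void process(List<String> events, RecordingStore store) {
        try {
            store.executeBatch(events);
        } catch (IllegalStateException e) {
            if (events.size() <= 1) {
                throw e;
            }
            int half = events.size() / 2;
            process(events.subList(0, half), store);
            process(events.subList(half, events.size()), store);
        }
    }

    public static void main(String[] args) {
        List<String> events = List.of("e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8", "e9");
        RecordingStore store = new RecordingStore(4);
        process(events, store);
        // The assertion is on the outcome: every event persisted once, in order,
        // regardless of how many batch attempts it took to get there.
        if (!store.persisted.equals(events)) {
            throw new AssertionError("expected " + events + " but got " + store.persisted);
        }
        System.out.println("all events persisted");
    }
}
```

A check of this kind keeps passing if the splitting strategy changes, for example from halving to thirds, as long as every event is still written.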

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

processor.process(List.of(event, event, event, event, event, event, event, event, event));

verify(cqlSession, times(3)).execute(any(BatchStatement.class));
verify(insertStatement, times(18)).bind("key1", "key2", "actorId", ByteBuffer.wrap(new byte[1024]));

Copilot AI Dec 26, 2025

The test verifies that insertStatement.bind() is called 18 times for 9 events. However, this appears to be testing implementation details rather than behavior. This test may be brittle and could break if the implementation changes while still maintaining correct behavior. Consider focusing on verifying that all events are successfully processed rather than counting internal method calls.

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

Copilot AI commented Dec 26, 2025

@jwijgerd I've opened a new pull request, #163, to work on those changes. Once the pull request is ready, I'll request review from you.

…asticactors/cassandra2/util/ExecutionUtils.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI commented Dec 26, 2025

@jwijgerd I've opened a new pull request, #164, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI commented Dec 26, 2025

@jwijgerd I've opened a new pull request, #165, to work on those changes. Once the pull request is ready, I'll request review from you.

)

* Initial plan

* Refactor test to focus on behavior rather than implementation details

Co-authored-by: jwijgerd <914840+jwijgerd@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jwijgerd <914840+jwijgerd@users.noreply.github.com>
3 participants