[FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations by avichaym · Pull Request #533 · apache/flink-agents

avichaym · 2026-02-10T16:27:13Z

Linked issue: #524

Depends on #534 — please merge that first.

Purpose of change

Add Amazon OpenSearch and S3 Vectors as vector store providers.

OpenSearchVectorStore: supports both OpenSearch Serverless (AOSS) and OpenSearch Service domains with IAM (SigV4) or basic auth; implements CollectionManageableVectorStore for Long-Term Memory; k-NN search with filter support; chunked bulk writes.
S3VectorsVectorStore: built on the S3 Vectors SDK; PutVectors calls are chunked at 500 vectors per request (API limit).
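
For readers unfamiliar with the chunking mentioned for PutVectors, here is a minimal sketch of splitting a batch into fixed-size chunks. `BatchChunker` and `MAX_VECTORS_PER_PUT` are hypothetical names used for illustration; they are not taken from the PR, and the actual SDK call is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the code in this PR. */
final class BatchChunker {
    /** Per-request cap as stated in the PR description (500, API limit). */
    static final int MAX_VECTORS_PER_PUT = 500;

    /** Splits items into consecutive chunks of at most maxSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxSize) {
            int end = Math.min(start + maxSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk would then be sent in its own PutVectors request, so a batch of 1201 vectors would need three requests.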

Both override add() for batch embedding optimization.

New modules: integrations/vector-stores/opensearch/, integrations/vector-stores/s3vectors/

Tests

Unit tests: OpenSearch (4), S3 Vectors (2)
Integration tests gated by env vars (OPENSEARCH_ENDPOINT, S3V_BUCKET): collection CRUD, document CRUD, filtered query
End-to-end validated with RAG and Long-Term Memory demos against real OpenSearch domain and S3 Vectors bucket

API

No public API changes. New integration modules only.

Documentation

doc-needed
doc-not-needed
doc-included

…grations

Add two new integration modules for Amazon Bedrock:
- Chat model using the Converse API with native tool calling support, SigV4 auth via DefaultCredentialsProvider, and token metrics reporting. Supports all Bedrock models accessible via Converse API.
- Embedding model using Titan Text Embeddings V2 via InvokeModel. Batch embed(List<String>) parallelizes via configurable thread pool (embed_concurrency parameter, default 4).

Includes unit tests for constructors, parameter handling, and inheritance.
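
The thread-pool fan-out described for batch embedding could look roughly like the sketch below. `ParallelEmbedSketch`, `embedAll`, and the `embedOne` function are hypothetical stand-ins; the real connection class and its InvokeModel call are not reproduced here, and only the default of 4 comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration of parallel batch embedding; not the PR's code. */
final class ParallelEmbedSketch {
    /** Default taken from the commit message (embed_concurrency, default 4). */
    static final int DEFAULT_CONCURRENCY = 4;

    /** Embeds each text on a fixed-size pool and returns results in input order. */
    static List<float[]> embedAll(
            List<String> texts, Function<String, float[]> embedOne, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<float[]>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<float[]> task = () -> embedOne.apply(text);
                futures.add(pool.submit(task));
            }
            List<float[]> vectors = new ArrayList<>(texts.size());
            for (Future<float[]> future : futures) {
                vectors.add(future.get()); // blocks until that task completes
            }
            return vectors;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while embedding", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("embedding failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This sketch creates a pool per call for brevity; a long-lived connection would more likely hold one pool and release it in close().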

…ockEmbeddingModelConnection

github-actions · 2026-02-10T16:38:19Z

@avichaym Please add the following content to your PR description and select a checkbox:

- [ ] `doc-needed` 
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-included`

…avadocs

…ences

- Add exponential backoff retry (MAX_RETRIES=5) for ThrottlingException, ServiceUnavailableException, ModelErrorException, 429, 503, consistent with BedrockEmbeddingModelConnection in this PR.
- Remove {..} JSON extraction fallback from stripMarkdownFences that could corrupt normal text responses containing braces.
- Only apply markdown fence stripping on non-tool-call responses.
- Add 5 unit tests for stripMarkdownFences covering: text with braces, clean JSON, json fences, plain fences, and null input.

…grations

Add two new vector store integration modules:
- OpenSearch: supports Serverless (AOSS) and Service domains, IAM (SigV4) or basic auth. Implements CollectionManageableVectorStore. ANN search via knn query with ef_search, min_score, and filter_query support. Bulk writes chunked by configurable max_bulk_mb.
- S3 Vectors: uses S3 Vectors SDK for PutVectors/QueryVectors/GetVectors/DeleteVectors. PutVectors chunked at 500 (API limit).

Both override add() for batch embedding via embed(List<String>). Includes unit tests and integration tests (auto-enabled via OPENSEARCH_ENDPOINT / S3V_BUCKET environment variables). Validated against real OpenSearch domain and S3 Vectors bucket.
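
Chunking bulk writes by a size budget (as opposed to a fixed count) can be sketched as follows. `BulkSizeChunker` is a hypothetical name, and the sketch assumes each entry is an already-serialized bulk line; how the PR actually measures payload size against max_bulk_mb is not shown in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of size-based bulk chunking; not the PR's code. */
final class BulkSizeChunker {
    /** Groups serialized entries so each chunk stays within maxBytes where possible. */
    static List<List<String>> chunkByBytes(List<String> entries, long maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String entry : entries) {
            // +1 accounts for the newline that terminates each bulk line.
            long size = entry.getBytes(StandardCharsets.UTF_8).length + 1;
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // An entry larger than maxBytes still gets a chunk of its own.
            current.add(entry);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```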

…docs

Bedrock (chat + embedding):
- Add close() to release AWS SDK clients and thread pools
- Wire max_tokens through BedrockChatModelSetup into InferenceConfiguration
- Add retry jitter to BedrockEmbeddingModelConnection
- Add typed getConnection() override to BedrockEmbeddingModelSetup
- Document stripMarkdownFences necessity and future work

OpenSearch vector store:
- Add close() to release SdkHttpClient
- Cache DefaultCredentialsProvider (was creating new instance per request)
- Add constructor validation for required endpoint/index params
- Add limit support in get() via extraArgs
- Add TODO for Aws4Signer deprecation and batch add() dedup

S3 Vectors vector store:
- Add close() to release S3VectorsClient
- Add constructor validation for required vector_bucket/vector_index params
- size() now throws UnsupportedOperationException instead of returning -1
- Add TODO for batch add() dedup

All files: expand wildcard imports, add usage example Javadocs

OpenSearchVectorStore:
- Add retry with exponential backoff for 429/502/503 in executeRequest()
- Only ignore 404s in getCollection/deleteCollection (not all exceptions)
- Close credentialsProvider in close()

S3VectorsVectorStore:
- Add retry with backoff for putVectors (ThrottlingException, 429, 503)

Consistent with retry patterns in BedrockChatModelConnection and BedrockEmbeddingModelConnection.
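
A generic sketch of retry with exponential backoff and jitter for the status codes listed above. `RetrySketch`, `HttpStatusException`, and the base delay are hypothetical; the retry cap of 5 is borrowed from the Bedrock commit message and is only an assumption for the vector stores.

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical illustration of backoff retry; not the PR's code. */
final class RetrySketch {
    /** Assumed cap, mirroring MAX_RETRIES=5 from the Bedrock commit message. */
    static final int MAX_RETRIES = 5;
    static final Set<Integer> RETRYABLE_STATUS = Set.of(429, 502, 503);

    /** Hypothetical exception carrying an HTTP status code. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Delay of base * 2^attempt plus random jitter of up to half that value. */
    static long backoffMillis(int attempt, long baseMillis) {
        long exponential = baseMillis << attempt;
        return exponential + ThreadLocalRandom.current().nextLong(exponential / 2 + 1);
    }

    /** Runs the call, retrying retryable statuses up to MAX_RETRIES times. */
    static <T> T withRetry(Supplier<T> call, long baseMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (HttpStatusException e) {
                if (!RETRYABLE_STATUS.contains(e.status) || attempt >= MAX_RETRIES) {
                    throw e; // not retryable, or retries exhausted
                }
                try {
                    Thread.sleep(backoffMillis(attempt, baseMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
            }
        }
    }
}
```

Jitter spreads retries from concurrent callers so they do not hit a throttled endpoint in lockstep.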

wenjin272

Hi, @avichaym, thanks for your work. LGTM, just a few minor comments.

wenjin272 · 2026-02-26T08:08:36Z

...java/org/apache/flink/agents/integrations/vectorstores/opensearch/OpenSearchVectorStore.java

+     * Batch-embeds all documents in a single call, then delegates to addEmbedding.
+     *
+     * <p>TODO: This batch embedding logic is duplicated in S3VectorsVectorStore. Consider
+     * extracting to BaseVectorStore in a follow-up (would also benefit ElasticsearchVectorStore).


+1 for implementing this batch embedding logic in BaseVectorStore directly.
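
To make the suggestion concrete, here is one possible shape for hoisting the batch embedding step into a shared base class. All names below (`BatchEmbeddingStoreSketch`, the `batchEmbedder` function, the `addEmbedding` signature) are hypothetical and simplified; the real BaseVectorStore API is not shown in this thread.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a shared base class; not the actual BaseVectorStore. */
abstract class BatchEmbeddingStoreSketch {
    private final Function<List<String>, List<float[]>> batchEmbedder;

    protected BatchEmbeddingStoreSketch(Function<List<String>, List<float[]>> batchEmbedder) {
        this.batchEmbedder = batchEmbedder;
    }

    /** Embeds all texts in one call, then delegates storage to the subclass. */
    public final List<String> add(List<String> ids, List<String> texts) {
        if (ids.size() != texts.size()) {
            throw new IllegalArgumentException("ids and texts must have the same size");
        }
        List<float[]> vectors = batchEmbedder.apply(texts);
        if (vectors.size() != texts.size()) {
            throw new IllegalStateException("embedder returned a different number of vectors");
        }
        addEmbedding(ids, texts, vectors);
        return ids;
    }

    /** Store-specific write path, implemented by each subclass. */
    protected abstract void addEmbedding(
            List<String> ids, List<String> texts, List<float[]> vectors);
}
```

With this shape, each store would only implement addEmbedding, and the duplicated add() overrides could go away.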

wenjin272 · 2026-02-26T09:02:11Z

...java/org/apache/flink/agents/integrations/vectorstores/opensearch/OpenSearchVectorStore.java

+
+        this.index = descriptor.getArgument("index");
+        if (this.index == null || this.index.isBlank()) {
+            throw new IllegalArgumentException("index is required for OpenSearchVectorStore");


Could index be allowed to be null here, with the index specified per operation instead?

wenjin272 · 2026-02-26T09:08:20Z

...n/java/org/apache/flink/agents/integrations/vectorstores/s3vectors/S3VectorsVectorStore.java

+            @Nullable List<String> ids, @Nullable String collection, Map<String, Object> extraArgs)
+            throws IOException {
+        if (ids == null || ids.isEmpty()) {
+            return;


In the current design, if ids is not provided, the vector store should get or delete all documents in the collection. The S3 Vectors behavior is inconsistent with this: we need to either throw an exception when ids is not provided, or call this out in the documentation.
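
A minimal sketch of the "throw an exception" option, failing loudly rather than returning silently. `IdGuardSketch` and `requireIds` are hypothetical names, and IllegalArgumentException is just one reasonable choice of exception type.

```java
import java.util.List;

/** Hypothetical guard for illustration; not the code in this PR. */
final class IdGuardSketch {
    /** Returns ids unchanged, or throws if the caller did not provide any. */
    static List<String> requireIds(List<String> ids, String operation) {
        if (ids == null || ids.isEmpty()) {
            // Make the unsupported "all documents" case explicit to the caller.
            throw new IllegalArgumentException(
                    operation + " without ids is not supported by this vector store");
        }
        return ids;
    }
}
```

A get or delete method would call `requireIds(ids, "delete")` first, so callers relying on the "all documents" semantics get an error instead of a silent no-op.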

wenjin272 · 2026-02-26T09:18:16Z

...java/org/apache/flink/agents/integrations/vectorstores/opensearch/OpenSearchVectorStore.java

+            body.put("size", ids.size());
+            return parseHits(executeRequest("POST", "/" + idx + "/_search", body.toString()));
+        }
+        int limit = 10000;


Maybe a static constant would be better here?

wenjin272 · 2026-02-26T09:24:34Z

...n/java/org/apache/flink/agents/integrations/vectorstores/s3vectors/S3VectorsVectorStore.java

+        this.client =
+                S3VectorsClient.builder()
+                        .region(Region.of(regionStr != null ? regionStr : "us-east-1"))
+                        .credentialsProvider(DefaultCredentialsProvider.create())


According to https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html#create(), create() is deprecated; builder().build() is preferred.

Avichay Marciano added 2 commits February 10, 2026 08:02

Add exponential backoff retry for throttling and model errors in Bedr…

481d7a9

…ockEmbeddingModelConnection

github-actions bot added the priority/major (Default priority of the PR or issue.) and fixVersion/0.3.0 (The feature or bug should be implemented/fixed in the 0.3.0 version.) labels Feb 10, 2026

avichaym changed the title ~~[FLINK-AGENTS-523] Add Amazon Bedrock chat model and embedding model integrations~~ [FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations Feb 10, 2026

avichaym force-pushed the feature/aws-vector-stores branch from 56b9836 to 9f4f768 Compare February 10, 2026 16:37

github-actions bot added the doc-label-missing (The Bot applies this label either because none or multiple labels were provided.) label Feb 10, 2026

github-actions bot added doc-not-needed (Your PR changes do not impact docs) and removed doc-label-missing (The Bot applies this label either because none or multiple labels were provided.) labels Feb 10, 2026

wenjin272 added doc-needed (Your PR changes impact docs.) and removed doc-not-needed (Your PR changes do not impact docs) labels Feb 12, 2026

Improve Bedrock integration: add close(), max_tokens, retry jitter, J…

34d38c2

…avadocs

avichaym force-pushed the feature/aws-vector-stores branch from 9f4f768 to c492b13 Compare February 13, 2026 22:25

Avichay Marciano added 4 commits February 19, 2026 09:47

Apply spotless formatting to pass CI code-style check

dd77d50

avichaym force-pushed the feature/aws-vector-stores branch from c492b13 to dd77d50 Compare February 19, 2026 09:08

xintongsong requested a review from wenjin272 February 24, 2026 09:31

wenjin272 reviewed Feb 26, 2026

avichaym wants to merge 8 commits into apache:main from avichaym:feature/aws-vector-stores

2 participants
