
S3 + ELASTIC #46

Closed
aacruzgon wants to merge 1 commit into HeliosSoftware:main from aacruzgon:persistance(feature)/s3+ElasticSearch

@aacruzgon
Contributor

Summary

This PR adds S3 + Elasticsearch as a first-class polyglot persistence mode in HFS via CompositeStorage, exposed as:

  • HFS_STORAGE_BACKEND=s3-elasticsearch
  • aliases: s3-es, objectstore-elasticsearch
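A minimal sketch of how the backend value and its aliases could be parsed and displayed. The StorageBackend enum and its variant names are hypothetical; only the string values come from this PR, and accepting s3, sqlite-elasticsearch and postgres-elasticsearch through the same variable is an assumption based on the modes named here.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical backend enum; the real HFS type and variant set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageBackend {
    S3,
    SqliteElasticsearch,
    PostgresElasticsearch,
    S3Elasticsearch,
}

impl FromStr for StorageBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Normalize case and surrounding whitespace before matching.
        match s.trim().to_ascii_lowercase().as_str() {
            "s3" => Ok(Self::S3),
            "sqlite-elasticsearch" => Ok(Self::SqliteElasticsearch),
            "postgres-elasticsearch" => Ok(Self::PostgresElasticsearch),
            // Canonical name plus the two aliases listed in this PR.
            "s3-elasticsearch" | "s3-es" | "objectstore-elasticsearch" => {
                Ok(Self::S3Elasticsearch)
            }
            other => Err(format!("unknown HFS_STORAGE_BACKEND value: {other}")),
        }
    }
}

impl fmt::Display for StorageBackend {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always display the canonical name, never an alias.
        let name = match self {
            Self::S3 => "s3",
            Self::SqliteElasticsearch => "sqlite-elasticsearch",
            Self::PostgresElasticsearch => "postgres-elasticsearch",
            Self::S3Elasticsearch => "s3-elasticsearch",
        };
        f.write_str(name)
    }
}
```

With this shape, HFS_STORAGE_BACKEND=s3-es and HFS_STORAGE_BACKEND=objectstore-elasticsearch resolve to the same variant, and logging the parsed mode always prints the canonical s3-elasticsearch.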

In this mode:

  • S3 is canonical for CRUD/read/version/history.
  • Elasticsearch is authoritative for search (_search, _text, _content, counts, includes/revincludes).
  • No S3 object-scan search fallback is used.

Why

We already supported SQL + Elasticsearch search offloading (sqlite-elasticsearch, postgres-elasticsearch) and had a standalone S3 storage backend. This PR completes the polyglot model by enabling object-storage-first persistence with dedicated search indexing, aligned with the persistence architecture goals.

What Changed

1) First-class backend wiring (s3-elasticsearch)

  • Added startup/config wiring so HFS boots with:
    • primary backend: S3
    • search backend: Elasticsearch
    • composite sync mode: synchronous write-through
  • Added bootstrapping of the shared SearchParameter registry for ES in this mode.
  • Expanded S3 environment-variable parsing to cover both the s3 and s3-elasticsearch startup paths.

2) Composite routing guarantees

  • Updated composite search routing so when primary is S3:
    • all search and count paths are delegated to Elasticsearch
    • S3 SearchProvider capability is not required
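The routing rule above can be sketched as a single match. The Operation and Backend enums are hypothetical stand-ins for the composite layer's real types; search modifiers such as _text, _content, _include and _revinclude are assumed to travel inside the Search operation.

```rust
/// Hypothetical operation kinds handled by the composite storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Create,
    Read,
    Update,
    Delete,
    VRead,
    History,
    Search,
    Count,
}

/// Hypothetical backend identifiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    S3,
    Elasticsearch,
}

/// Route an operation when S3 is the primary and ES is the search backend.
pub fn route(op: Operation) -> Backend {
    match op {
        // CRUD, version reads and history are served from S3 (canonical).
        // Writes are later mirrored to ES by write-through sync (not modeled).
        Operation::Create
        | Operation::Read
        | Operation::Update
        | Operation::Delete
        | Operation::VRead
        | Operation::History => Backend::S3,
        // Every search and count path goes to ES; there is deliberately no
        // arm that falls back to scanning S3 objects.
        Operation::Search | Operation::Count => Backend::Elasticsearch,
    }
}
```

Because the match is exhaustive, adding a new operation kind forces an explicit routing decision instead of silently defaulting to an S3 scan.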

3) Sync semantics and idempotency

  • Sync events now carry version metadata.
  • S3->ES sync is version-aware and replay-safe:
    • duplicate events are no-ops
    • stale versions are ignored
  • Delete propagation is idempotent:
    • missing ES docs are treated as already converged
  • Improved sync observability: sync logs now include tenant, resource, and backend context.
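A minimal sketch of version-aware, replay-safe sync against an in-memory stand-in for the ES index. The event shape, the u64 version type and the tombstone approach are assumptions; the PR only states that events carry version metadata, that duplicates and stale versions are ignored, and that deleting a missing document counts as converged.

```rust
use std::collections::HashMap;

/// Hypothetical sync event carrying version metadata.
#[derive(Debug, Clone)]
pub enum SyncEvent {
    Upsert { key: String, version: u64, body: String },
    Delete { key: String, version: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Applied,
    Duplicate,
    Stale,
    AlreadyConverged,
}

struct Entry {
    version: u64,
    body: Option<String>, // None = tombstone left by a delete
}

/// In-memory stand-in for the search index, keyed by e.g. "tenant/Type/id".
#[derive(Default)]
pub struct SearchIndex {
    docs: HashMap<String, Entry>,
}

impl SearchIndex {
    pub fn apply(&mut self, event: SyncEvent) -> Outcome {
        match event {
            SyncEvent::Upsert { key, version, body } => match self.docs.get(&key) {
                // Older event, or a replay of a version that was already deleted.
                Some(e) if e.version > version || (e.version == version && e.body.is_none()) => {
                    Outcome::Stale
                }
                // Same version already indexed: the replay is a no-op.
                Some(e) if e.version == version => Outcome::Duplicate,
                _ => {
                    self.docs.insert(key, Entry { version, body: Some(body) });
                    Outcome::Applied
                }
            },
            SyncEvent::Delete { key, version } => match self.docs.get_mut(&key) {
                // Missing document: record a tombstone so late upserts stay ignored.
                None => {
                    self.docs.insert(key, Entry { version, body: None });
                    Outcome::AlreadyConverged
                }
                Some(e) if e.version > version => Outcome::Stale,
                Some(e) if e.body.is_none() => {
                    // Already deleted; version >= e.version here, so this keeps the max.
                    e.version = version;
                    Outcome::AlreadyConverged
                }
                Some(e) => {
                    e.version = version;
                    e.body = None;
                    Outcome::Applied
                }
            },
        }
    }

    pub fn is_indexed(&self, key: &str) -> bool {
        self.docs.get(key).map_or(false, |e| e.body.is_some())
    }
}
```

Keeping a versioned tombstone is one way to stop an out-of-order upsert from resurrecting a deleted document; a real ES implementation might instead rely on external versioning, which this sketch does not model.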

4) Elasticsearch backend hardening

  • Added sanitization of the tenant and resource-type segments of index names so that generated names are valid Elasticsearch index names.
  • Hardened create/update/upsert behavior around meta.versionId for conflict-safe indexing.
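A minimal sketch of index-segment sanitization, assuming a hypothetical "{prefix}-{tenant}-{resource_type}" naming scheme and a hypothetical "default" fallback for segments with no valid characters. The rules it enforces (lowercase, no characters such as / or spaces, no leading - or _, at most 255 bytes) follow Elasticsearch's documented index-name restrictions.

```rust
/// Hypothetical sanitizer for one index-name segment (prefix, tenant, or type).
pub fn sanitize_segment(raw: &str) -> String {
    let mapped: String = raw
        .chars()
        .map(|c| {
            let c = c.to_ascii_lowercase();
            // Conservative whitelist; everything else (including '/', ' ',
            // '*', '#', ',', '+', '.' and non-ASCII) becomes '_'.
            if c.is_ascii_alphanumeric() || c == '-' || c == '_' {
                c
            } else {
                '_'
            }
        })
        .collect();
    // Trim leading '-' and '_' so no segment, in particular the first,
    // starts with a character Elasticsearch rejects at the start of a name.
    let trimmed = mapped.trim_start_matches(|c: char| c == '-' || c == '_');
    if trimmed.is_empty() {
        "default".to_string() // hypothetical fallback for all-invalid input
    } else {
        trimmed.to_string()
    }
}

/// Hypothetical naming scheme: "{prefix}-{tenant}-{resource_type}".
pub fn index_name(prefix: &str, tenant: &str, resource_type: &str) -> String {
    let mut name = format!(
        "{}-{}-{}",
        sanitize_segment(prefix),
        sanitize_segment(tenant),
        sanitize_segment(resource_type)
    );
    // Index names are limited to 255 bytes; the name is pure ASCII here,
    // so truncating at a byte offset cannot split a character.
    name.truncate(255);
    name
}
```

Lossy mapping like this can make two distinct tenant IDs collide (for example "acme corp" and "acme/corp"), so a production scheme may prefer an escaping or hashing step.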

5) Reindex/repair path

  • Added an S3->ES reindex utility that rebuilds the ES index from the current objects in S3.
  • Supports a configurable batch size, optionally clearing existing ES data first, and optional resource-type filters.
  • Added a startup trigger (HFS_S3_ES_REINDEX_ON_STARTUP) for operational recovery and rebuild flows.
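The rebuild flow can be sketched as filter, then batch, then bulk-index. CurrentObject, SearchSink, ReindexOptions and MemorySink are hypothetical; a real implementation would page through S3 listings rather than take a slice, and would call the ES bulk API instead of the in-memory sink used here.

```rust
use std::collections::BTreeMap;

/// Hypothetical options mirroring the reindex knobs described above.
pub struct ReindexOptions {
    pub batch_size: usize,
    pub clear_existing: bool,
    pub resource_types: Option<Vec<String>>, // None = all resource types
}

/// Hypothetical shape of a current object read from S3.
#[derive(Clone, Debug)]
pub struct CurrentObject {
    pub resource_type: String,
    pub id: String,
    pub version: u64,
    pub body: String,
}

/// Minimal search-index abstraction for the sketch.
pub trait SearchSink {
    fn clear(&mut self);
    fn bulk_index(&mut self, batch: &[CurrentObject]);
}

/// Rebuild the index from current S3 objects; returns how many were indexed.
pub fn reindex<S: SearchSink>(objects: &[CurrentObject], sink: &mut S, opts: &ReindexOptions) -> usize {
    if opts.clear_existing {
        sink.clear();
    }
    // Keep only the requested resource types, if a filter was given.
    let selected: Vec<CurrentObject> = objects
        .iter()
        .filter(|o| match &opts.resource_types {
            Some(types) => types.contains(&o.resource_type),
            None => true,
        })
        .cloned()
        .collect();
    // chunks() panics on 0, so clamp the batch size to at least 1.
    for batch in selected.chunks(opts.batch_size.max(1)) {
        sink.bulk_index(batch);
    }
    selected.len()
}

/// In-memory sink: maps "Type/id" to the indexed version.
#[derive(Default)]
pub struct MemorySink {
    pub docs: BTreeMap<String, u64>,
    pub batches: usize,
}

impl SearchSink for MemorySink {
    fn clear(&mut self) {
        self.docs.clear();
    }
    fn bulk_index(&mut self, batch: &[CurrentObject]) {
        self.batches += 1;
        for o in batch {
            self.docs.insert(format!("{}/{}", o.resource_type, o.id), o.version);
        }
    }
}
```

Clearing before the rebuild removes documents for resources that no longer exist in S3; without it, the rebuild only overwrites documents it finds.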

6) Documentation

  • Updated root and persistence READMEs with:
    • s3-elasticsearch mode usage
    • env vars
    • write/sync semantics
    • local MinIO + Elasticsearch dev recipe
  • Added milestone and implementation docs under:
    • crates/persistence/docs/s3-elasticsearch/M1.md
    • crates/persistence/docs/s3-elasticsearch/M2.md
    • crates/persistence/docs/s3-elasticsearch/M3.md
    • crates/persistence/docs/s3-elasticsearch/M4.md
    • crates/persistence/docs/s3-elasticsearch/IMPLEMENTATION.md

Config Surface Added/Used

  • HFS_STORAGE_BACKEND=s3-elasticsearch
  • S3: HFS_S3_TENANCY_MODE, HFS_S3_BUCKET, HFS_S3_TENANT_BUCKET_MAP, HFS_S3_DEFAULT_SYSTEM_BUCKET, HFS_S3_REGION, HFS_S3_PREFIX, HFS_S3_ENDPOINT_URL, HFS_S3_FORCE_PATH_STYLE, HFS_S3_ALLOW_HTTP, HFS_S3_VALIDATE_BUCKETS
  • ES: HFS_ELASTICSEARCH_NODES, HFS_ELASTICSEARCH_INDEX_PREFIX, HFS_ELASTICSEARCH_USERNAME, HFS_ELASTICSEARCH_PASSWORD
  • Reindex: HFS_S3_ES_REINDEX_ON_STARTUP, HFS_S3_ES_REINDEX_BATCH_SIZE, HFS_S3_ES_REINDEX_CLEAR_EXISTING, HFS_S3_ES_REINDEX_RESOURCE_TYPES
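A minimal sketch of parsing the reindex variables. The defaults (reindex off, batch size 500), the accepted boolean spellings and the comma-separated format for HFS_S3_ES_REINDEX_RESOURCE_TYPES are all assumptions, not documented HFS behavior. The lookup is passed in as a closure (for example |k| std::env::var(k).ok()) so the parser can be tested without mutating process state.

```rust
/// Parsed reindex settings; defaults are illustrative assumptions.
#[derive(Debug, PartialEq)]
pub struct ReindexEnv {
    pub on_startup: bool,
    pub batch_size: usize,
    pub clear_existing: bool,
    pub resource_types: Vec<String>, // empty = all resource types
}

fn parse_bool(v: &str) -> Result<bool, String> {
    match v.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Ok(true),
        "0" | "false" | "no" | "off" => Ok(false),
        other => Err(format!("invalid boolean: {other}")),
    }
}

pub fn parse_reindex_env<F: Fn(&str) -> Option<String>>(get: F) -> Result<ReindexEnv, String> {
    // Option<Result<..>> -> Result<Option<..>> so a bad value is an error,
    // while an unset variable falls back to the default.
    let on_startup = get("HFS_S3_ES_REINDEX_ON_STARTUP")
        .map(|v| parse_bool(&v))
        .transpose()?
        .unwrap_or(false);
    let clear_existing = get("HFS_S3_ES_REINDEX_CLEAR_EXISTING")
        .map(|v| parse_bool(&v))
        .transpose()?
        .unwrap_or(false);
    let batch_size = match get("HFS_S3_ES_REINDEX_BATCH_SIZE") {
        Some(v) => v
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("HFS_S3_ES_REINDEX_BATCH_SIZE: {e}"))?,
        None => 500, // hypothetical default
    };
    if batch_size == 0 {
        return Err("HFS_S3_ES_REINDEX_BATCH_SIZE must be greater than 0".to_string());
    }
    // Assumed format: comma-separated, whitespace and empty entries ignored.
    let resource_types: Vec<String> = get("HFS_S3_ES_REINDEX_RESOURCE_TYPES")
        .map(|v| {
            v.split(',')
                .map(str::trim)
                .filter(|s| !s.is_empty())
                .map(String::from)
                .collect()
        })
        .unwrap_or_default();
    Ok(ReindexEnv { on_startup, batch_size, clear_existing, resource_types })
}
```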

Testing

Added/updated coverage for:

  • backend mode parsing/display
  • S3 env parsing and startup option parsing
  • composite routing correctness for S3 primary + ES search
  • ES idempotency/stale-update/delete replay behavior
  • MinIO + Elasticsearch integration scenarios:
    • CRUD roundtrip (reads from S3)
    • search visibility via ES after write-through
    • delete propagation
    • tenant isolation
    • reindex rebuild after ES wipe

Validation run:

  • cargo fmt --all
  • cargo test -p helios-rest
  • cargo test -p helios-hfs --features s3,elasticsearch
  • cargo test -p helios-persistence --features s3,elasticsearch --test composite_s3_elasticsearch_tests
  • cargo test -p helios-persistence --features s3,elasticsearch --test elasticsearch_tests -- --skip es_integration
  • cargo test -p helios-persistence --features s3,elasticsearch --test minio_s3_elasticsearch_tests
  • cargo clippy -p helios-persistence -p helios-rest -p helios-hfs --all-targets --features s3,elasticsearch -- -D warnings ...

Behavior and Compatibility Notes

  • This is additive and preserves existing sqlite-elasticsearch / postgres-elasticsearch behavior.
  • A write that succeeds in S3 (the primary) is still reported as successful if ES sync fails; the failure is logged and the index can be repaired via reindex.
  • ES indexes only the current searchable state of each resource; history and version reads are still served from S3.
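The write-through contract described above can be sketched as: write the primary first, attempt a synchronous ES sync, and log rather than propagate a sync failure. The PrimaryStore/SearchStore traits and the in-memory doubles are hypothetical, and eprintln! stands in for whatever logging facility HFS actually uses.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug)]
pub struct StoreError(pub String);

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Hypothetical stand-ins for the primary (S3) and search (ES) backends.
pub trait PrimaryStore {
    fn put(&mut self, key: &str, version: u64, body: &str) -> Result<(), StoreError>;
}
pub trait SearchStore {
    fn index(&mut self, key: &str, version: u64, body: &str) -> Result<(), StoreError>;
}

pub fn write_through<P: PrimaryStore, S: SearchStore>(
    primary: &mut P,
    search: &mut S,
    key: &str,
    version: u64,
    body: &str,
) -> Result<(), StoreError> {
    // A primary failure aborts the write; nothing is sent to the index.
    primary.put(key, version, body)?;
    // Synchronous sync attempt. On failure the S3 write still stands; the
    // divergence is logged and can be repaired later by a reindex.
    if let Err(e) = search.index(key, version, body) {
        eprintln!("search sync failed for {key} v{version}: {e}; primary write kept");
    }
    Ok(())
}

/// In-memory doubles used to exercise the sketch.
#[derive(Default)]
pub struct MemPrimary {
    pub objects: HashMap<String, (u64, String)>,
}

impl PrimaryStore for MemPrimary {
    fn put(&mut self, key: &str, version: u64, body: &str) -> Result<(), StoreError> {
        self.objects.insert(key.to_string(), (version, body.to_string()));
        Ok(())
    }
}

pub struct FlakySearch {
    pub available: bool,
    pub indexed: Vec<String>,
}

impl SearchStore for FlakySearch {
    fn index(&mut self, key: &str, _version: u64, _body: &str) -> Result<(), StoreError> {
        if !self.available {
            return Err(StoreError("connection refused".to_string()));
        }
        self.indexed.push(key.to_string());
        Ok(())
    }
}
```

The trade-off is availability over immediate search consistency: clients never see a failed write because ES was down, at the cost of a window where a stored resource is not yet searchable.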

@codecov

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


aacruzgon closed this on Mar 6, 2026