Skip to content

[Feature]: Distributed Stream Processing Mode #199

@eneshoxha

Description

@eneshoxha

Summary

Add support for distributed stream processing across multiple nodes/processes, enabling horizontal scaling, parallel processing, and fault tolerance through partitioned streams.

Problem Statement

Currently, Cortex.Streams runs entirely within a single process:

  • No horizontal scaling: Cannot distribute load across multiple machines
  • Single point of failure: If the process crashes, all processing stops
  • Limited throughput: Bounded by single-machine resources (CPU, memory, network)
  • No partition-based parallelism: Cannot process different keys in parallel across nodes

Current Behavior

// Everything runs in a single process
var stream = StreamBuilder<Order>.CreateNewStream("OrderProcessor")
    .Stream(kafkaSource)  // Single consumer
    .Aggregate(...)       // Single aggregator
    .Sink(...)            // Single sink
    .Build();

// If this process crashes, all processing stops
// Cannot scale beyond one machine's capacity

Impact

Without distributed mode:

  • Cannot handle high-throughput production workloads (millions of events/sec)
  • No horizontal scaling for growing data volumes
  • Limited fault tolerance (checkpointing helps, but recovery takes time)
  • Cannot leverage cloud elasticity

Technical Considerations

  1. Exactly-Once with Distribution: Combine with checkpointing for exactly-once across workers.

  2. State Migration: When partitions move between workers, state must be migrated.

  3. Ordering Guarantees: Events for the same key should be processed in order (within a partition).

  4. Backpressure Across Workers: Need flow control for shuffle operations.

  5. Network Partitions: Handle split-brain scenarios gracefully.

  6. Skewed Partitions: Monitor and alert on hot partitions.

Non-Goals (v1)

  • SQL query layer across distributed state
  • Automatic partition splitting for hot keys
  • Cross-datacenter replication
  • Custom partition assignment strategies (beyond key-based)

References

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestfeatureThis label is in use for minor version increments

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions