-
Notifications
You must be signed in to change notification settings - Fork 8
Open
Labels
enhancementNew feature or requestNew feature or requestfeatureThis label is in use for minor version incrementsThis label is in use for minor version increments
Milestone
Description
Summary
Add support for distributed stream processing across multiple nodes/processes, enabling horizontal scaling, parallel processing, and fault tolerance through partitioned streams.
Problem Statement
Currently, Cortex.Streams runs entirely within a single process:
- No horizontal scaling: Cannot distribute load across multiple machines
- Single point of failure: If the process crashes, all processing stops
- Limited throughput: Bounded by single-machine resources (CPU, memory, network)
- No partition-based parallelism: Cannot process different keys in parallel across nodes
Current Behavior
// Everything runs in a single process
var stream = StreamBuilder<Order>.CreateNewStream("OrderProcessor")
.Stream(kafkaSource) // Single consumer
.Aggregate(...) // Single aggregator
.Sink(...) // Single sink
.Build();
// If this process crashes, all processing stops
// Cannot scale beyond one machine's capacityImpact
Without distributed mode:
- Cannot handle high-throughput production workloads (millions of events/sec)
- No horizontal scaling for growing data volumes
- Limited fault tolerance (checkpointing helps, but recovery takes time)
- Cannot leverage cloud elasticity
Technical Considerations
-
Exactly-Once with Distribution: Combine with checkpointing for exactly-once across workers.
-
State Migration: When partitions move between workers, state must be migrated.
-
Ordering Guarantees: Events for the same key should be processed in order (within a partition).
-
Backpressure Across Workers: Need flow control for shuffle operations.
-
Network Partitions: Handle split-brain scenarios gracefully.
-
Skewed Partitions: Monitor and alert on hot partitions.
Non-Goals (v1)
- SQL query layer across distributed state
- Automatic partition splitting for hot keys
- Cross-datacenter replication
- Custom partition assignment strategies (beyond key-based)
References
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or requestfeatureThis label is in use for minor version incrementsThis label is in use for minor version increments