Add broker-side telemetry service #1725

Open
willpote wants to merge 11 commits into main from willpote/telem-pt-2

@willpote
Contributor

Adds telemetry to the broker. The broker now emits structured events (evaluated, committed, proving completed, aggregation completed, fulfilled, failed) through a bounded mpsc channel, and a new TelemetryService correlates them per order into RequestEvaluated and RequestCompleted payloads. These payloads are batched and submitted to the order stream periodically.
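
For reviewers who want a mental model of the flow described above, here is a minimal sketch, not the code in this PR. It assumes a standard-library `sync_channel` as the bounded channel (the broker may well use an async runtime's channel instead), and the field sets, `emit`, `drain`, `take_batch`, and `stages_seen` are hypothetical simplifications. Producers never block: when the channel is full, the event is dropped. A terminal event (fulfilled or failed) closes out the order and yields one `RequestCompleted`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Simplified event type (assumption: the real events carry timing data too).
#[derive(Debug, Clone, PartialEq)]
pub enum TelemetryEvent {
    Evaluated { order_id: String },
    Committed { order_id: String },
    ProvingCompleted { order_id: String },
    AggregationCompleted { order_id: String },
    Fulfilled { order_id: String },
    Failed { order_id: String, error_reason: String },
}

impl TelemetryEvent {
    /// Every event is keyed by the order it belongs to.
    pub fn order_id(&self) -> &str {
        match self {
            Self::Evaluated { order_id }
            | Self::Committed { order_id }
            | Self::ProvingCompleted { order_id }
            | Self::AggregationCompleted { order_id }
            | Self::Fulfilled { order_id }
            | Self::Failed { order_id, .. } => order_id,
        }
    }
}

/// Trimmed-down completion payload (hypothetical field set).
#[derive(Debug, Clone, PartialEq)]
pub struct RequestCompleted {
    pub order_id: String,
    pub stages_seen: usize,
    pub error_reason: Option<String>,
}

/// Cloneable handle held by the emitting components.
#[derive(Clone)]
pub struct TelemetryHandle {
    tx: SyncSender<TelemetryEvent>,
}

impl TelemetryHandle {
    /// Non-blocking send; returns false if the event was dropped
    /// because the channel is full or the service has gone away.
    pub fn emit(&self, event: TelemetryEvent) -> bool {
        self.tx.try_send(event).is_ok()
    }
}

pub struct TelemetryService {
    rx: Receiver<TelemetryEvent>,
    in_flight: HashMap<String, Vec<TelemetryEvent>>,
    completed: Vec<RequestCompleted>,
}

impl TelemetryService {
    pub fn new(capacity: usize) -> (TelemetryHandle, TelemetryService) {
        let (tx, rx) = sync_channel(capacity);
        let service = TelemetryService {
            rx,
            in_flight: HashMap::new(),
            completed: Vec::new(),
        };
        (TelemetryHandle { tx }, service)
    }

    /// Pull all queued events and correlate them per order_id.
    pub fn drain(&mut self) {
        while let Ok(event) = self.rx.try_recv() {
            let order_id = event.order_id().to_string();
            match event {
                TelemetryEvent::Fulfilled { .. } => self.finish(order_id, None),
                TelemetryEvent::Failed { error_reason, .. } => {
                    self.finish(order_id, Some(error_reason))
                }
                other => self.in_flight.entry(order_id).or_default().push(other),
            }
        }
    }

    fn finish(&mut self, order_id: String, error_reason: Option<String>) {
        let stages_seen = self.in_flight.remove(&order_id).map_or(0, |v| v.len());
        self.completed.push(RequestCompleted { order_id, stages_seen, error_reason });
    }

    /// Hand over the accumulated payloads, as a periodic flush would.
    pub fn take_batch(&mut self) -> Vec<RequestCompleted> {
        std::mem::take(&mut self.completed)
    }
}
```

In this sketch a timer would call `drain` and then `take_batch` on each tick and submit the result. Dropping on a full channel trades telemetry completeness for never stalling the proving pipeline.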

Telemetry is enabled by default. It can be disabled via config, or run in debug mode, which only logs the payloads locally instead of submitting them.
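
As an illustration of the three modes, the sketch below models them as an enum whose default is `Enabled`. The string spellings accepted by `from_str`, the `Sink` enum, and the `sink` method are assumptions made for this example; the actual config parsing in the PR may look different.

```rust
use std::str::FromStr;

/// The three modes named in this PR; `Enabled` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TelemetryMode {
    #[default]
    Enabled,
    Debug,
    Disabled,
}

/// Where a payload ends up (hypothetical helper type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sink {
    Http,
    Log,
    Drop,
}

impl TelemetryMode {
    /// Map a mode to its destination: submit over HTTP, log locally, or drop.
    pub fn sink(self) -> Sink {
        match self {
            TelemetryMode::Enabled => Sink::Http,
            TelemetryMode::Debug => Sink::Log,
            TelemetryMode::Disabled => Sink::Drop,
        }
    }
}

impl FromStr for TelemetryMode {
    type Err = String;

    /// Case-insensitive parse of the assumed config values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "enabled" => Ok(TelemetryMode::Enabled),
            "debug" => Ok(TelemetryMode::Debug),
            "disabled" => Ok(TelemetryMode::Disabled),
            other => Err(format!("unknown telemetry mode: {other}")),
        }
    }
}
```

Rejecting unknown values with an error, rather than silently falling back to the default, keeps a typo in the config from re-enabling telemetry that an operator meant to turn off.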

Changes
New TelemetryService and TelemetryHandle in crates/broker/src/telemetry.rs — receives events, correlates per order_id, assembles heartbeat payloads, sends them on a configurable interval
TelemetryMode config (enabled / debug / disabled) controls whether events go to HTTP, get logged locally, or are dropped
Skip codes ([S-001]..[S-015], [S-OP-001]..[S-OP-010]) added to all OrderPricingOutcome::Skip paths so skipped orders have structured reasons in telemetry
OrderRequest now carries received_at_timestamp and exposes request_digest() for the composite order_id
cancel_proof_and_fail and handle_order_failure moved to errors.rs and now emit TelemetryEvent::Failed
Aggregator, proving, submitter, reaper, order monitor, and order picker all wired up to emit telemetry events with timing data (STARK proving, compression, set builder, assessor, etc.)
RequestEvaluated / RequestCompleted structs updated with new fields: order_id, request_digest, proof_type, error_reason, received_at_timestamp, per-stage timing breakdowns, concurrent job counts at start/end
Order-stream: new heartbeat endpoints, Kinesis forwarding, OrderStreamClient in boundless-market
Infra: Kinesis stream + Redshift streaming materialized views via Pulumi, migration scripts
E2E test e2e_telemetry_events verifies debug-mode telemetry fires correctly
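
To show why structured skip codes are useful downstream, here is a small sketch of pulling the code out of a skip reason. It assumes the code appears as a bracketed prefix of the reason string and always has three digits, which matches the ranges listed above but is not confirmed by the diff; `skip_code` and the sample reason text are hypothetical.

```rust
/// Extract a skip code such as "S-001" or "S-OP-010" from a reason string.
/// Assumption: the code is a bracketed prefix followed by free-form text.
pub fn skip_code(reason: &str) -> Option<&str> {
    // Take the text between a leading '[' and the first ']'.
    let rest = reason.trim_start().strip_prefix('[')?;
    let end = rest.find(']')?;
    let code = &rest[..end];

    // Accept either the "S-OP-" or the plain "S-" family; try the longer prefix first.
    let digits = code
        .strip_prefix("S-OP-")
        .or_else(|| code.strip_prefix("S-"))?;

    // Require exactly three ASCII digits after the family prefix.
    let valid = digits.len() == 3 && digits.bytes().all(|b| b.is_ascii_digit());
    valid.then_some(code)
}
```

With a helper like this, a consumer could group skipped orders by code without parsing the human-readable part of the message.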

Base automatically changed from willpote/telem to main March 12, 2026 21:13