Skip to content

Document zero-copy semantics#2

Open
griffinmilsap wants to merge 2 commits intolow-level-apifrom
zero-copy-semantics
Open

Document zero-copy semantics#2
griffinmilsap wants to merge 2 commits intolow-level-apifrom
zero-copy-semantics

Conversation

@griffinmilsap
Copy link

Adds descriptions of backend transport and notes the zero-copy behavior of messaging (see ezmsg-org/ezmsg#209), highlighting the importance of treating incoming messages as immutable.

Copy link

@KonradPilch KonradPilch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Griffin!

transport-messaging-internals
axisarray

.. important:: `ezmsg` delivers subscriber messages with zero-copy semantics in all cases. Treat incoming messages as immutable, and copy data before mutating or republishing. See :doc:`transport-messaging-internals` for details and examples.
Copy link

@KonradPilch KonradPilch Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right place for this (important) note. At some point in the near future, I think I will expand on the high-level design and move that there. I expect that I will additionally refactor that document to have a separate high-level API explanation (to mirror the low-level API document).

GraphServer, Publisher, Subscriber, Channel
===========================================

- **GraphServer**: a lightweight TCP service that tracks the topic DAG, keeps a registry of publishers/subscribers, and notifies subscribers when their upstream publishers change. It also brokers shared-memory segment creation and attachment.
Copy link

@KonradPilch KonradPilch Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of the terms here like TCP and DAG might need explanation.

My solution: link to a glossary for terms like this (otherwise it would only clutter the text). No need to do this here - for a future PR.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new documentation explaining ezmsg transport internals and clarifies that subscriber delivery is always effectively zero-copy, emphasizing immutability requirements for received messages.

Changes:

  • Adds a new “Transport and Messaging Internals” explainer page describing runtime components, transport selection, and backpressure.
  • Updates pipeline Unit guidance and overview/explainer pages to highlight always-zero-copy subscriber semantics and link to the new explainer.
  • Tweaks low-level API explainer to add context links and clarify performance expectations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
docs/source/how-tos/pipeline/unit.rst Replaces the old zero_copy bullet with an “always zero-copy” immutability warning and links to internals.
docs/source/explanations/transport-messaging-internals.rst New detailed explainer for GraphServer/Publisher/Subscriber/Channel, transport paths, backpressure, and zero-copy implications.
docs/source/explanations/low-level-api.rst Adds link to internals page and a performance clarification note.
docs/source/explanations/content-explanations.rst Adds the new internals page to the toctree and an overview warning about zero-copy semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

1. A **Publisher** connects to the **GraphServer**, allocates shared-memory buffers, and registers its topic. It starts a small TCP server so Channels can connect back to it, then reports that server's address to the GraphServer.
2. A **Subscriber** connects to the **GraphServer** and registers its topic. The GraphServer computes upstream publishers from the topic DAG and sends the Subscriber an `UPDATE` list of publisher IDs.
3. For each publisher ID, the Subscriber asks the local **ChannelManager** to register a **Channel**. If a Channel does not yet exist in this process, it is created by:
4. The **Channel** requesting a channel allocation from the GraphServer for the given publisher ID. The GraphServer returns a new channel ID plus the publisher's server address.
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 4 is missing a verb ("The Channel requesting...") and reads ungrammatically. Consider changing it to something like "The Channel requests a channel allocation..." for clarity.

Suggested change
4. The **Channel** requesting a channel allocation from the GraphServer for the given publisher ID. The GraphServer returns a new channel ID plus the publisher's server address.
4. The **Channel** requests a channel allocation from the GraphServer for the given publisher ID. The GraphServer returns a new channel ID plus the publisher's server address.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +32
`ezmsg` uses the fastest transport available per Publisher/Channel pair:

- **Local transport (same process)**: the Publisher pushes the object directly into the Channel (`put_local`), and the Channel stores it in the `MessageCache` without serialization. This is the lowest-overhead path.
- **Shared memory (different process, SHM OK)**: the Publisher serializes the object using `MessageMarshal` (pickle protocol 5 with buffer support), writes it into a ring of shared-memory buffers, and notifies the Channel with a `TX_SHM` message. The Channel reads from shared memory using the message ID and caches the deserialized object.
- **TCP (fallback or forced)**: if SHM is unavailable (attach failed, remote host) or `force_tcp=True`, the Publisher sends a `TX_TCP` payload (header + serialized buffers) directly over the channel socket. The Channel deserializes it and caches the result.
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section uses single-backtick interpreted text for protocol constants and code identifiers (e.g., put_local, MessageCache, MessageMarshal, TX_SHM, TX_TCP). In reStructuredText, inline literals should use double backticks so these render consistently as code and don't get treated as title references.

Copilot uses AI. Check for mistakes.
Zero-copy Semantics and Message Ownership
=========================================

Publishers in `ezmsg` serialize messages to shared memory, and eventually to a module-scoped **MessageCache** which is shared by several Subscribers. Subscribers receive a "zero-copy" view of this message that is:
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s an internal inconsistency about MessageCache scope: earlier it says "Every Channel maintains a fixed-size MessageCache", but later the doc describes a "module-scoped MessageCache". Please reconcile/clarify whether the cache is per-Channel or shared at module/process scope to avoid confusing readers about ownership and isolation.

Suggested change
Publishers in `ezmsg` serialize messages to shared memory, and eventually to a module-scoped **MessageCache** which is shared by several Subscribers. Subscribers receive a "zero-copy" view of this message that is:
Publishers in `ezmsg` serialize messages to shared memory, and eventually into the process-local `MessageCache` owned by each Channel. That Channel-level cache is shared by all Subscribers attached to that Channel in the same process. Subscribers receive a "zero-copy" view of this message that is:

Copilot uses AI. Check for mistakes.
- You benefit from the pipeline tooling (graph visualization, CLI integration, etc.).
- You want a structured way to scale across threads/processes without managing it yourself.

.. important:: The low-level API is not any more performant than the high-level API. There is no meaningful performance hit when using the high-level API.
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor grammar/formatting: "not any more performant" is typically written "not more performant", and there’s a double space between sentences here. Tightening the phrasing will read more professionally.

Suggested change
.. important:: The low-level API is not any more performant than the high-level API. There is no meaningful performance hit when using the high-level API.
.. important:: The low-level API is not more performant than the high-level API. There is no meaningful performance hit when using the high-level API.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments