|
| 1 | +.. _cellnet_architecture: |
| 2 | + |
| 3 | +FLARE CellNet Architecture |
| 4 | +-------------------------- |
| 5 | + |
| 6 | +.. image:: resources/cellnet.png |
| 7 | + :alt: FLARE CellNet Architecture |
| 8 | + |
| 9 | + |
| 10 | +Purpose and Scope |
| 11 | +################# |
| 12 | + |
| 13 | +CellNet is FLARE's unified communication layer that provides secure, scalable messaging between distributed federated |
| 14 | +learning components. It abstracts away network transport details and provides a consistent API for both small messages and |
| 15 | +large data transfers. |
| 16 | + |
| 17 | +Position in NVFLARE Architecture: CellNet sits between the application layer (Controllers, Executors, Admin commands) and |
| 18 | +the network transport layer (gRPC, TCP, HTTP drivers). All NVFLARE components communicate through CellNet, including: |
| 19 | + |
| 20 | +- **Server-to-client task distribution** |
| 21 | +- **Client-to-server result submission** |
| 22 | +- **Peer-to-peer communication** |
| 23 | +- **Admin command execution** |
| 24 | +- **Cross-site auxiliary communication** |
| 25 | +- **Job deployment and management** |
| 26 | + |
| 27 | + |
| 28 | +Key Design Goals |
| 29 | +################# |
| 30 | + |
| 31 | +- **Unified API**: Single interface for both small messages and large data streams |
| 32 | +- **Transport Agnostic**: Supports multiple network protocols (gRPC, TCP, HTTP) |
| 33 | +- **Hierarchical Addressing**: FQCN-based routing for multi-level cell hierarchies |
| 34 | +- **Secure Communication**: Built-in encryption and authentication |
| 35 | +- **Flow Control**: Automatic chunking and flow control for large transfers |
| 36 | + |
| 37 | +Three-Layer Architecture |
| 38 | +######################### |
| 39 | + |
| 40 | +- **CoreCell**: Basic message routing, connection management, security |
| 41 | +- **StreamCell**: Large data streaming with chunking and flow control |
| 42 | +- **Cell**: High-level request/reply patterns with automatic channel detection |
| 43 | + |
| 44 | +Layered Cell Architecture |
| 45 | +######################### |
| 46 | + |
| 47 | + |
| 48 | +Three-Layer Design |
| 49 | +^^^^^^^^^^^^^^^^^^ |
| 50 | + |
| 51 | +The CellNet architecture consists of three layers, each extending the previous: |
| 52 | + |
| 53 | +Layer 1: **CoreCell** - Basic Message Infrastructure |
| | + |
| 54 | +The CoreCell provides the fundamental messaging infrastructure: |
| 55 | + |
| 56 | +**Key Responsibilities**: |
| 57 | + |
| 58 | +- **Message Handling** - Routes messages to appropriate handlers based on channel/topic |
| 59 | +- **Connection Management** - Manages listeners (incoming) and connectors (outgoing) |
| 60 | +- **Callback Registry** - Stores message handlers in req_reg: Registry |
| 61 | +- **Agent Tracking** - Maintains agents: Dict[str, CellAgent] for remote cells |
| 62 | +- **Request Tracking** - Tracks pending requests in waiters: Dict[str, _Waiter] |
| 63 | +- **Security** - Delegates to credential_manager: CredentialManager for encryption |
| 64 | + |
| 65 | +**Core Methods**: |
| 66 | + |
| 67 | +- **send_request**(channel, target, topic, request, timeout, ...) - Send message and wait for reply |
| 68 | +- **fire_and_forget**(channel, topic, targets, message, ...) - Send without waiting |
| 69 | +- **broadcast_request**(channel, topic, targets, request, ...) - Send to multiple targets |
| 70 | +- **register_request_cb**(channel, topic, cb, ...) - Register callback for channel/topic |
| 71 | +
|
| 72 | +Layer 2: **StreamCell** - Large Data Transfer |
| 73 | + |
| 74 | +The StreamCell adds large data transfer capabilities on top of CoreCell: |
| 75 | + |
| 76 | +**Key Components**: |
| 77 | + |
| 78 | +- **cell**: CoreCell - Wrapped CoreCell for basic messaging |
| 79 | +- **byte_streamer**: ByteStreamer - Sends data as chunked streams |
| 80 | +- **byte_receiver**: ByteReceiver - Receives and reassembles chunks |
| 81 | +- **blob_streamer**: BlobStreamer - Optimized for in-memory BLOBs |
| 82 | + |
| 83 | +**Streaming Methods**: |
| 84 | + |
| 85 | +- **send_stream**(channel, topic, target, message, ...) - Send byte stream with flow control |
| 86 | +- **send_blob**(channel, topic, target, message, ...) - Send BLOB (fits in memory) |
| 87 | +- **register_stream_cb**(channel, topic, stream_cb, ...) - Register stream receiver |
| 88 | +- **register_blob_cb**(channel, topic, blob_cb, ...) - Register BLOB receiver |
| 89 | + |
| 90 | +**Streaming Protocol**: |
| 91 | + |
| 92 | +- Automatic chunking with a configurable chunk size (default 1 MB) |
| 93 | +- Flow control with sliding window and ACKs |
| 94 | +- Progress tracking via StreamFuture |
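
The chunking and windowing behavior listed above can be illustrated with a small, self-contained sketch. It does not use the NVFLARE API: ``split_into_chunks``, ``send_with_window``, and the window size are hypothetical names chosen for illustration, and the ACKs are simulated in-process rather than exchanged over a network.

```python
from collections import deque
from typing import Iterator, List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB, matching the default stated above


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_size bytes."""
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]


def send_with_window(chunks: List[bytes], window_size: int) -> Tuple[bytes, int]:
    """Simulate a sender that keeps at most window_size chunks un-ACKed.

    Returns the reassembled payload and the peak number of in-flight chunks.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    in_flight = deque()  # chunks sent but not yet acknowledged
    received = []        # chunks the simulated receiver has ACKed, in order
    peak = 0
    for chunk in chunks:
        if len(in_flight) >= window_size:
            # Window is full: wait for the oldest chunk to be ACKed.
            received.append(in_flight.popleft())
        in_flight.append(chunk)
        peak = max(peak, len(in_flight))
    # Drain the chunks that are still waiting for their ACK.
    received.extend(in_flight)
    return b"".join(received), peak
```

In this sketch the window bounds how much unacknowledged data the sender holds at once. The real streamer is asynchronous, so it should be read only as an illustration of the idea.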
| 95 | + |
| 96 | +Layer 3: **Cell** - Intelligent Request/Reply |
| 97 | + |
| 98 | +The **Cell** class provides a unified interface for streaming and non-streaming messages: |
| 99 | + |
| 100 | +**Key Features**: |
| 101 | + |
| 102 | +1. **Dynamic Method Dispatch**: |
| 103 | + |
| 104 | +- Intercepts method calls and checks whether the channel requires streaming via _is_stream_channel() |
| 105 | +- Routes to the appropriate implementation: |
| 106 | +- Stream channels → _broadcast_request(), _send_request(), etc. |
| 107 | +- Non-stream channels → core_cell.broadcast_request(), etc. |
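
The dispatch rule described above can be sketched with Python's ``__getattr__`` hook. This is a minimal illustration, not the actual Cell implementation: ``FakeCoreCell``, ``DispatchingCell``, ``EXCLUDED_CHANNELS``, and ``STREAM_CAPABLE`` are hypothetical names, and the channel values are taken from the predefined channel table in this document.

```python
class FakeCoreCell:
    """Stand-in for the wrapped core cell (hypothetical, for illustration)."""

    def send_request(self, channel, target, topic, request):
        return f"core:{channel}/{topic}->{target}"

    def get_fqcn(self):
        return "server"


class DispatchingCell:
    # Assumed set of channels that never use streaming.
    EXCLUDED_CHANNELS = {"admin", "task", "return_only"}
    # Assumed set of methods that have a streaming variant.
    STREAM_CAPABLE = {"send_request"}

    def __init__(self, core_cell):
        self.core_cell = core_cell

    def _is_stream_channel(self, channel):
        return channel not in self.EXCLUDED_CHANNELS

    def _send_request(self, channel, target, topic, request):
        # Streaming variant; a real one would chunk the request payload.
        return f"stream:{channel}/{topic}->{target}"

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        if name not in self.STREAM_CAPABLE:
            # No streaming variant: pass straight through to the core cell.
            return getattr(self.core_cell, name)

        def dispatch(channel, *args, **kwargs):
            if self._is_stream_channel(channel):
                return getattr(self, f"_{name}")(channel, *args, **kwargs)
            return getattr(self.core_cell, name)(channel, *args, **kwargs)

        return dispatch
```

With this sketch, ``DispatchingCell(FakeCoreCell()).send_request("task", ...)`` goes to the core cell, while the same call on ``"aux_communication"`` takes the streaming path.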
| 108 | + |
| 109 | +2. **Channel Classification**: |
| 110 | + |
| 111 | +**Excluded Channels** (excluded from streaming): |
| 112 | + |
| 113 | + - CellChannel.CLIENT_MAIN - Admin commands |
| 114 | + - CellChannel.SERVER_MAIN - Task distribution |
| 115 | + - CellChannel.RETURN_ONLY - Internal replies |
| 116 | + - CellChannel.CLIENT_COMMAND - Client commands |
| 117 | + - Other internal channels |
| 118 | + |
| 119 | +3. **Request Tracking**: |
| 120 | + |
| 121 | +- Maintains requests_dict: Dict[str, SimpleWaiter] for pending requests |
| 122 | +- SimpleWaiter tracks request state and receiving progress |
| 123 | +- Reply handling via _process_reply() |
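
The pending-request bookkeeping described above can be sketched with a dictionary of waiters keyed by request ID. The class and method names below (``WaiterSketch``, ``RequestTracker``, ``start``, ``process_reply``, ``wait``) are hypothetical and only illustrate the pattern. They are not the actual SimpleWaiter or _process_reply() code.

```python
import threading
import uuid
from typing import Any, Dict, Optional, Tuple


class WaiterSketch(threading.Event):
    """An event that also carries the reply once it arrives."""

    def __init__(self):
        super().__init__()
        self.reply: Any = None


class RequestTracker:
    def __init__(self):
        self.requests: Dict[str, WaiterSketch] = {}
        self._lock = threading.Lock()

    def start(self) -> Tuple[str, WaiterSketch]:
        """Register a new pending request and return its ID and waiter."""
        req_id = str(uuid.uuid4())
        waiter = WaiterSketch()
        with self._lock:
            self.requests[req_id] = waiter
        return req_id, waiter

    def process_reply(self, req_id: str, reply: Any) -> bool:
        """Deliver a reply; returns False for unknown or late replies."""
        with self._lock:
            waiter = self.requests.pop(req_id, None)
        if waiter is None:
            return False
        waiter.reply = reply
        waiter.set()
        return True

    def wait(self, req_id: str, waiter: WaiterSketch, timeout: float) -> Optional[Any]:
        """Block until the reply arrives or the timeout expires."""
        if not waiter.wait(timeout):
            # Timed out: forget the request so a late reply is ignored.
            with self._lock:
                self.requests.pop(req_id, None)
            return None
        return waiter.reply
```

Removing the entry on timeout means a reply that arrives afterwards is simply reported as unknown and is not delivered to a caller that has already given up.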
| 124 | + |
| 125 | +4. **Callback Adaptation**: |
| 126 | + |
| 127 | +- Adapter class wraps application callbacks for streaming |
| 128 | +- Handles encoding/decoding of stream payloads |
| 129 | +- Sends replies back via RETURN_ONLY channel |
| 130 | + |
| 131 | +5. **FQCN: Fully Qualified Cell Name**: |
| 132 | + |
| 133 | +Every cell is identified by a Fully Qualified Cell Name (FQCN), which is a dot-separated hierarchical name: |
| 134 | + |
| 135 | +``<site_name>[.<job_id>[.<rank>]]`` |
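
As a rough illustration of how a dot-separated hierarchical name can be handled, the sketch below splits an FQCN into its parts and derives parent and ancestor relationships. The helper names (``split_fqcn``, ``parent_fqcn``, ``is_ancestor``) and the example names are assumptions for this example, not NVFLARE functions.

```python
from typing import List

FQCN_SEPARATOR = "."


def split_fqcn(fqcn: str) -> List[str]:
    """Split an FQCN into its hierarchical parts, rejecting empty parts."""
    parts = fqcn.split(FQCN_SEPARATOR)
    if not all(parts):
        raise ValueError(f"invalid FQCN: {fqcn!r}")
    return parts


def parent_fqcn(fqcn: str) -> str:
    """Return the FQCN one level up, or '' for a top-level cell."""
    return FQCN_SEPARATOR.join(split_fqcn(fqcn)[:-1])


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """True if 'ancestor' is a strict prefix of 'descendant' in the hierarchy."""
    a, d = split_fqcn(ancestor), split_fqcn(descendant)
    return len(a) < len(d) and d[:len(a)] == a
```

For a hypothetical name such as ``site-1.job42.0``, the parts are the site name, the job ID, and the rank, and ``site-1`` is an ancestor of it.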
| 136 | + |
| 137 | +6. **End-to-end encryption** |
| 138 | + |
| 139 | +Message Structure and Addressing |
| 140 | +################################ |
| 141 | + |
| 142 | +Channel and Topic Addressing |
| 143 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 144 | + |
| 145 | +F3 CellNet routes messages using a two-level addressing scheme: channel and topic. |
| 146 | +Both values are stored in the message headers. |
| 147 | + |
| 148 | +[Figure: Channel/Topic Routing in CoreCell] |
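
Two-level routing of this kind can be sketched as a lookup keyed by the (channel, topic) pair read from the message headers. The sketch below is self-contained and hypothetical: ``CallbackRegistry``, ``dispatch``, and the header keys ``"channel"`` and ``"topic"`` are illustrative assumptions, not the actual CoreCell registry or header names.

```python
from typing import Any, Callable, Dict, Tuple

Handler = Callable[[Dict[str, Any]], Any]


class CallbackRegistry:
    """Maps (channel, topic) pairs to handler callbacks."""

    def __init__(self):
        self._handlers: Dict[Tuple[str, str], Handler] = {}

    def register(self, channel: str, topic: str, cb: Handler) -> None:
        self._handlers[(channel, topic)] = cb

    def dispatch(self, message: Dict[str, Any]) -> Any:
        """Route a message to its handler using the header values."""
        headers = message["headers"]
        key = (headers["channel"], headers["topic"])
        handler = self._handlers.get(key)
        if handler is None:
            raise LookupError(f"no handler registered for {key}")
        return handler(message)


registry = CallbackRegistry()
registry.register("task", "get_task", lambda msg: f"task for {msg['payload']}")

reply = registry.dispatch(
    {"headers": {"channel": "task", "topic": "get_task"}, "payload": "site-1"}
)
```

Keying on the pair means the same topic name can be reused on different channels without the handlers colliding.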
| 149 | + |
| 150 | + |
| 151 | +Predefined Channels (CellChannel constants): |
| 152 | + |
| 153 | +.. list-table:: **Predefined Channels** |
| 154 | + :header-rows: 1 |
| 155 | + :widths: 35 25 40 |
| 156 | + |
| 157 | + * - Constant |
| 158 | + - Value |
| 159 | + - Purpose |
| 160 | + * - CellChannel.CLIENT_MAIN |
| 161 | + - "admin" |
| 162 | + - Admin commands |
| 163 | + * - CellChannel.SERVER_MAIN |
| 164 | + - "task" |
| 165 | + - Task distribution |
| 166 | + * - CellChannel.AUX_COMMUNICATION |
| 167 | + - "aux_communication" |
| 168 | + - Application-defined |
| 169 | + * - CellChannel.RETURN_ONLY |
| 170 | + - "return_only" |
| 171 | + - Internal reply routing |
| 172 | + * - CellChannel.SERVER_COMMAND |
| 173 | + - "server_command" |
| 174 | + - Server commands |
| 175 | + |
| 176 | + |
| 177 | +Communication Patterns |
| 178 | +^^^^^^^^^^^^^^^^^^^^^^ |
| 179 | +- **Request-Reply Pattern**: send a request and wait for the reply |
| 180 | +- **Fire-and-Forget Pattern**: send a message without waiting for a reply |
| 181 | +- **Broadcast Pattern**: send to multiple targets |
| 182 | + |
| 183 | +Streaming Components Overview |
| 184 | +############################# |
| 185 | + |
| 186 | + |
| 187 | +The streaming system is organized into sender components, receiver components, and stream abstractions: |
| 188 | + |
| 189 | +Key Streaming Classes: |
| 190 | + |
| 191 | +.. list-table:: **Key Streaming Classes** |
| 192 | + :header-rows: 1 |
| 193 | + :widths: 25 40 |
| 194 | + |
| 195 | + * - Class |
| 196 | + - Purpose |
| 197 | + * - ByteStreamer |
| 198 | + - Sends byte streams as chunks |
| 199 | + * - ByteReceiver |
| 200 | + - Receives and reassembles chunks |
| 201 | + * - BlobStreamer |
| 202 | + - Wraps blobs for streaming |
| 203 | + * - TxTask |
| 204 | + - Per-stream sending task |
| 205 | + * - RxTask |
| 206 | + - Per-stream receiving task |
| 207 | + |
| 208 | + |
| 209 | +Performance and Statistics |
| 210 | +########################## |
| 211 | + |
| 212 | +**Statistics Collection**: CellNet collects statistics for monitoring and debugging. |
| 213 | +Statistics are gathered via StatsPoolManager, with categories for different operation types and cell FQCNs. |
| 214 | + |
| 215 | + |