Skip to content

Commit 7dedbc5

Browse files
Add System architecture and Security Architecture documentations [skip ci] (#3794)
### Description 1. Add system architecture 2. Add cellnet architecture 3. Add Security Architecture ### Types of changes <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated.
1 parent 2281c4e commit 7dedbc5

File tree

8 files changed

+882
-10
lines changed

8 files changed

+882
-10
lines changed

docs/cellnet_architecture.rst

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
.. _cellnet_architecture:
2+
3+
FLARE CellNet Architecture
4+
--------------------------
5+
6+
.. image:: resources/cellnet.png
7+
:alt: FLARE CellNet Architecture
8+
9+
10+
Purpose and Scope
11+
#################
12+
13+
CellNet is FLARE's unified communication layer that provides secure, scalable messaging between distributed federated
14+
learning components. It abstracts away network transport details and provides a consistent API for both small messages and
15+
large data transfers.
16+
17+
Position in NVFLARE Architecture: CellNet sits between the application layer (Controllers, Executors, Admin commands) and
18+
the network transport layer (gRPC, TCP, HTTP drivers). All NVFLARE components communicate through CellNet, including:
19+
20+
- **Server-to-client task distribution**
21+
- **Client-to-server result submission**
22+
- **peer-to-peer communication**
23+
- **Admin command execution**
24+
- **Cross-site auxiliary communication**
25+
- **Job deployment and management**
26+
27+
28+
Key Design Goals:
29+
#################
30+
31+
- **Unified API**: Single interface for both small messages and large data streams
32+
- **Transport Agnostic**: Supports multiple network protocols (gRPC, TCP, HTTP)
33+
- **Hierarchical Addressing**: FQCN-based routing for multi-level cell hierarchies
34+
- **Secure Communication**: Built-in encryption and authentication
35+
- **Flow Control**: Automatic chunking and flow control for large transfers
36+
37+
Three-Layer Architecture:
38+
#########################
39+
40+
- **CoreCell**: Basic message routing, connection management, security
41+
- **StreamCell**: Large data streaming with chunking and flow control
42+
- **Cell**: High-level request/reply patterns with automatic channel detection
43+
44+
Layered Cell Architecture
45+
#########################
46+
47+
48+
Three-Layer Design
49+
^^^^^^^^^^^^^^^^^^
50+
51+
The CellNet architecture consists of three layers, each extending the previous:
52+
53+
Layer 1: **CoreCell** - Basic Message Infrastructure
54+
provides fundamental messaging infrastructure:
55+
56+
**Key Responsibilities**:
57+
58+
- **Message Handling** - Routes messages to appropriate handlers based on channel/topic
59+
- **Connection Management** - Manages listeners (incoming) and connectors (outgoing)
60+
- **Callback Registry** - Stores message handlers in req_reg: Registry
61+
- **Agent Tracking** - Maintains agents: Dict[str, CellAgent] for remote cells
62+
- **Request Tracking** - Tracks pending requests in waiters: Dict[str, _Waiter]
63+
- **Security** - Delegates to credential_manager: CredentialManager for encryption
64+
65+
**Core Methods**:
66+
67+
- **send_request**(channel, target, topic, request, timeout, ...) - Send message and wait for reply
68+
- **fire_and_forget**(channel, topic, targets, message, ...) - Send without waiting
69+
- **broadcast_request**(channel, topic, targets, request, ...) - Send to multiple targets
70+
- **register_request_cb**(channel, topic, cb, ...) - Register callback for channel/topic
71+
72+
Layer 2: **StreamCell** - Large Data Transfer
73+
74+
The StreamCell adds large data transfer capabilities on top of CoreCell:
75+
76+
**Key Components**:
77+
78+
- **cell**: CoreCell - Wrapped CoreCell for basic messaging
79+
- **byte_streamer**: ByteStreamer - Sends data as chunked streams
80+
- **byte_receiver**: ByteReceiver - Receives and reassembles chunks
81+
- **blob_streamer**: BlobStreamer - Optimized for in-memory BLOBs
82+
83+
**Streaming Methods**:
84+
85+
- **send_stream** (channel, topic, target, message, ...) - Send byte stream with flow control
86+
- **send_blob** (channel, topic, target, message, ...) - Send BLOB (fits in memory)
87+
- **register_stream_cb** (channel, topic, stream_cb, ...) - Register stream receiver
88+
- **register_blob_cb** (channel, topic, blob_cb, ...) - Register BLOB receiver
89+
90+
**Streaming Protocol**:
91+
92+
- Automatic chunking into configurable chunk sizes (default 1MB)
93+
- Flow control with sliding window and ACKs
94+
- Progress tracking via StreamFuture
95+
96+
Layer 3: **Cell** - Intelligent Request/Reply
97+
98+
The **Cell** class provides unified interface for streaming and non-streaming messages:
99+
100+
**Key Features**:
101+
102+
1. **Dynamic Method Dispatch**:
103+
104+
- Intercept method calls Checks if channel requires streaming via _is_stream_channel()
105+
- Routes to appropriate implementation:
106+
- Stream channels → _broadcast_request(), _send_request(), etc.
107+
- Non-stream channels → core_cell.broadcast_request(), etc.
108+
109+
2. **Channel Classification**:
110+
111+
**Excluded Channels**:
112+
113+
- CellChannel.CLIENT_MAIN - Admin commands
114+
- CellChannel.SERVER_MAIN** - Task distribution
115+
- CellChannel.RETURN_ONLY** - Internal replies
116+
- CellChannel.CLIENT_COMMAND** - Client commands
117+
- Other internal channels
118+
119+
3. **Request Tracking**:
120+
121+
- Maintains requests_dict: Dict[str, SimpleWaiter] for pending requests
122+
- SimpleWaiter tracks request state and receiving progress
123+
- Reply handling via _process_reply()
124+
125+
4. **Callback Adaptation**:
126+
127+
- Adapter class wraps application callbacks for streaming
128+
- Handles encoding/decoding of stream payloads
129+
- Sends replies back via RETURN_ONLY channel
130+
131+
5. **FQCN: Fully Qualified Cell Name**:
132+
133+
Every cell is identified by a Fully Qualified Cell Name (FQCN), which is a dot-separated hierarchical name:
134+
135+
<site_name>[.<job_id>[.<rank>]]
136+
137+
6. **End-to-end encryption**
138+
139+
Message Structure and Addressing
140+
###############################
141+
142+
Channel and Topic Addressing
143+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
144+
145+
F3 CellNet routes messages using a two-level addressing scheme: channel and topic.
146+
This is stored in message headers:
147+
148+
Title: Channel/Topic Routing in CoreCell
149+
150+
151+
Predefined Channels (CellChannel constants):
152+
153+
.. list-table:: **Predefined Channels**
154+
:header-rows: 1
155+
:widths: 35 25 40
156+
157+
* - Constant
158+
- Value
159+
- Purpose
160+
* - CellChannel.CLIENT_MAIN
161+
- "admin"
162+
- Admin commands
163+
* - CellChannel.SERVER_MAIN
164+
- "task"
165+
- Task distribution
166+
* - CellChannel.AUX_COMMUNICATION
167+
- "aux_communication"
168+
- Application-defined
169+
* - CellChannel.RETURN_ONLY
170+
- "return_only"
171+
- Internal reply routing
172+
* - CellChannel.SERVER_COMMAND
173+
- "server_command"
174+
- Server commands
175+
176+
177+
Communication Patterns
178+
^^^^^^^^^^^^^^^^^^^^^^
179+
Request-Reply Pattern -- send request and wait for reply
180+
Fire-and-Forget Pattern -- send message without waiting for reply
181+
Broadcast Pattern -- send to multiple targets
182+
183+
Streaming Components Overview
184+
#############################
185+
186+
187+
The streaming system is organized into sender components, receiver components,and stream abstractions:
188+
189+
Key Streaming Classes:
190+
191+
.. list-table:: **Key Streaming Classes**
192+
:header-rows: 1
193+
:widths: 25 40
194+
195+
* - Class
196+
- Purpose
197+
* - ByteStreamer
198+
- Sends byte streams as chunks
199+
* - ByteReceiver
200+
- Receives and reassembles chunks
201+
* - BlobStreamer
202+
- Wraps blobs for streaming
203+
* - TxTask
204+
- Per-stream sending task
205+
* - RxTask
206+
- Per-stream receiving task
207+
208+
209+
Performance and Statistics
210+
##########################
211+
Statistics Collection
212+
CellNet includes comprehensive statistics collection for monitoring and debugging:
213+
Statistics are collected via StatsPoolManager with categories for different operation types and cell FQCNs.
214+
215+

docs/conf.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -44,12 +44,12 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
4444
# -- Project information -----------------------------------------------------
4545

4646
project = "NVIDIA FLARE"
47-
copyright = "2024, NVIDIA"
47+
copyright = "2025 NVIDIA"
4848
author = "NVIDIA"
4949

5050
# The full version, including alpha/beta/rc tags
51-
release = "2.4.0"
52-
version = "2.4.0"
51+
release = "2.7.0"
52+
version = "2.7.0"
5353

5454
readthedocs_version_name = os.environ.get("READTHEDOCS_VERSION_NAME")
5555
build_version = readthedocs_version_name if readthedocs_version_name not in (None, "latest", "stable") else "main"

docs/flare_overview.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -112,6 +112,11 @@ Adjacent to this central stack are tools facilitating experimentation and simula
112112
.. image:: resources/flare_overview.png
113113
:height: 500px
114114

115+
For detailed information on the architecture, please refer to the :ref:`flare_system_architecture` section.
116+
For detailed information on the security overview, please refer to the :ref:`flare_security_overview` section.
117+
118+
119+
115120
Design Principles
116121
=================
117122

0 commit comments

Comments
 (0)