
Commit 79d11b9

committed
issue: 4480494 Add project-specific guidelines
Introduce three cursor rule files to standardize development practices across the libxlio codebase:

1. coding.mdc - C++ coding standards tailored for performance-critical networking code. Includes naming conventions (m_, s_, g_ prefixes), nuanced auto keyword guidance, pointer placement rules, and performance-critical programming patterns. Adapted from existing docs/coding-style.md and contrib/jenkins_tests/style.conf.
2. testing.mdc - Google Test guidelines for libxlio's test suites. Documents test naming (ti_N/tu_N), Doxygen format requirements, fork-based testing patterns, thread safety testing, and LD_PRELOAD testing requirements. Critical for maintaining test consistency across integration and unit tests.
3. architecture.mdc - High-level project architecture and domain knowledge. Covers the socket interception layer, memory management patterns, hardware abstraction, and key terminology (Ring, CQ, QP, RFS, etc.). Provides essential context for developers new to RDMA and InfiniBand concepts.

Each rule uses glob patterns to apply only to relevant files, reducing overhead and improving specificity.

The coding rules emphasize performance considerations appropriate for a kernel-bypass networking library, including raw pointer usage in hot paths, custom memory pools, and cache alignment strategies.

Testing rules document libxlio-specific patterns like the critical exit(testing::Test::HasFailure()) requirement in forked child processes to prevent test duplication, and the dual OS/XLIO testing modes via LD_PRELOAD.

Architecture rules consolidate tribal knowledge about execution modes (R2C vs Worker Threads), buffer pool management, and InfiniBand hardware integration, making it easier for new contributors to understand the codebase's unique characteristics.

Signed-off-by: Tomer Cabouly <[email protected]>
1 parent 8eeccc7 commit 79d11b9


3 files changed: +1210 -0 lines changed


.cursor/rules/architecture.mdc

Lines changed: 271 additions & 0 deletions
---
globs: **/*.cpp,**/*.h,**/*.cc,**/*.c
alwaysApply: false
---
# libxlio Architecture Guide

Accelerated IO SW library (XLIO) - High-performance network acceleration library

## Project Overview

libxlio is a high-performance network acceleration library that provides socket API acceleration through InfiniBand/RDMA hardware. The library intercepts standard socket calls and redirects them to accelerated network hardware for improved performance.

**Key Features:**
- Kernel-bypass architecture for high bandwidth and low CPU usage
- Hardware-based direct copy between application memory and network interface
- Support for TLS encryption/decryption offload on crypto-enabled NVIDIA ConnectX adapters
- Hardware features: LRO/TSO, Striding-RQ for increased TCP performance
- Compatible with both POSIX socket and XLIO Ultra APIs
- No application code changes required when using the POSIX socket API

**Technology Stack:**
- **Language**: C++11 with C compatibility layer
- **Build System**: Autotools (autoconf/automake) with libtool
- **Key Dependencies**: OFED, InfiniBand verbs, DPCP, JSON-C, Netlink libraries
- **Testing**: Google Test framework, unit tests, performance tests
- **Supported Platforms**: x86_64, ARM
- **Supported Protocols**: IPv4/IPv6, TCP, UDP

## Core Architecture

### Socket Interception Layer
- **LD_PRELOAD mechanism**: Library intercepts standard socket calls transparently
- **Socket abstraction hierarchy**: `sockinfo` → `sockinfo_tcp`/`sockinfo_udp`/`sockinfo_ulp`
- **File descriptor management**: `fd_collection` manages socket lifecycle
- **Event-driven I/O**: All operations are event-driven through polling groups
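
The interception mechanism can be illustrated with a generic symbol-interposition sketch. This is not XLIO's implementation: the counter and the `intercepted_socket_calls()` helper are hypothetical, and a real accelerator would route the call into its own socket objects (for example through `fd_collection`) rather than only forwarding it. The sketch assumes glibc and a GNU-compatible compiler; on glibc older than 2.34, link with `-ldl`.

```cpp
// Generic LD_PRELOAD-style interposition sketch (hypothetical, not XLIO code).
// Build as a shared object, e.g.: g++ -shared -fPIC shim.cpp -o libshim.so
#include <atomic>
#include <cassert>
#include <dlfcn.h>      // dlsym, RTLD_NEXT (needs _GNU_SOURCE, which g++ defines)
#include <sys/socket.h> // socket()

// Counts calls that went through the wrapper below.
static std::atomic<int> g_intercepted_socket_calls(0);

int intercepted_socket_calls() { return g_intercepted_socket_calls.load(); }

// Same name and C linkage as libc's socket(), so the dynamic linker resolves
// the application's calls here first when this object is preloaded.
// noexcept matches glibc's C++ declaration of socket().
extern "C" int socket(int domain, int type, int protocol) noexcept
{
    using socket_fn = int (*)(int, int, int);
    // Look up the next definition in search order (normally glibc's).
    static socket_fn real_socket = reinterpret_cast<socket_fn>(dlsym(RTLD_NEXT, "socket"));
    assert(real_socket != nullptr);

    ++g_intercepted_socket_calls;
    // An accelerated library would decide here whether to create its own
    // socket object for the new fd; this sketch just forwards to the OS.
    return real_socket(domain, type, protocol);
}
```

When preloaded (`LD_PRELOAD=./libshim.so ./app`), every `socket()` call the application makes passes through the wrapper before reaching libc, which is the hook point an accelerator needs.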

### Memory Management Patterns
- **Buffer pools**: Pre-allocated memory pools for zero-copy operations
- **Ring buffers**: Circular buffers for data transmission (simple, bonded, slave types)
- **Huge page optimization**: Automatic huge page allocation for better TLB performance
- **Zero-copy operations**: Direct memory access between application and network interface
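
The buffer-pool idea can be sketched as a fixed-size free list that is filled once at startup. The class name and interface are hypothetical and far simpler than XLIO's `buffer_pool`. The sketch assumes single-threaded use and does not register the memory with the NIC.

```cpp
// Hypothetical fixed-size buffer pool: memory is allocated once up front, and
// get()/put() only pop/push a free list, so the data path never calls malloc.
#include <cassert>
#include <cstddef>
#include <vector>

class simple_buffer_pool {
public:
    simple_buffer_pool(std::size_t count, std::size_t buf_size)
        : m_storage(count * buf_size), m_buf_size(buf_size)
    {
        m_free.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            m_free.push_back(&m_storage[i * buf_size]);
        }
    }

    // Returns nullptr when the pool is exhausted (no fallback allocation).
    char *get()
    {
        if (m_free.empty()) {
            return nullptr;
        }
        char *buf = m_free.back();
        m_free.pop_back();
        return buf;
    }

    // Returns a buffer previously obtained from get().
    void put(char *buf)
    {
        assert(buf >= m_storage.data() && buf < m_storage.data() + m_storage.size());
        m_free.push_back(buf);
    }

    std::size_t available() const { return m_free.size(); }
    std::size_t buffer_size() const { return m_buf_size; }

private:
    std::vector<char> m_storage; // one contiguous allocation backing every buffer
    std::vector<char *> m_free;  // LIFO free list: recently freed buffers stay cache-warm
    std::size_t m_buf_size;
};
```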

### Hardware Abstraction Layer
- **InfiniBand verbs**: Low-level RDMA operations through `ib_ctx_handler`
- **Device management**: `net_device_entry` and `net_device_table_mgr` for hardware abstraction
- **Queue management**: CQ (Completion Queue) and QP (Queue Pair) management
- **Offloading**: TSO, LRO, and RFS in hardware, plus GRO (software aggregation), for performance optimization

### Data Flow
1. Application makes a standard socket call (`socket`, `bind`, `connect`, `send`, `recv`, etc.)
2. XLIO intercepts the call via LD_PRELOAD and redirects it to the accelerated path
3. Data moves through InfiniBand/RDMA hardware using zero-copy operations
4. Direct memory transfer between application buffers and network interface
5. Event-driven completion handling through polling groups
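
Step 5 can be modeled with a software stand-in for a completion queue that is drained in batches. `fake_cq`, `completion`, and `drain_cq()` are hypothetical. Real code would poll a hardware CQ, for example through `ibv_poll_cq()` or a direct mlx5 path, and the batch size would be a tuning parameter.

```cpp
// Hypothetical software model of completion polling: drain a CQ in batches and
// dispatch each completion, instead of blocking on an interrupt.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

struct completion {
    std::uint64_t wr_id; // identifies the posted send/receive request
    bool success;
};

class fake_cq {
public:
    void push(const completion &c) { m_pending.push_back(c); }

    // Copies up to max_entries completions into out and returns how many were
    // copied (loosely shaped like a verbs-style CQ poll).
    std::size_t poll(completion *out, std::size_t max_entries)
    {
        std::size_t n = 0;
        while (n < max_entries && !m_pending.empty()) {
            out[n++] = m_pending.front();
            m_pending.pop_front();
        }
        return n;
    }

private:
    std::deque<completion> m_pending;
};

// Polls until the CQ is empty; returns the total number of completions handled.
std::size_t drain_cq(fake_cq &cq, const std::function<void(const completion &)> &on_completion)
{
    const std::size_t batch_size = 4;
    completion batch[batch_size];
    std::size_t total = 0;
    std::size_t n;
    while ((n = cq.poll(batch, batch_size)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            on_completion(batch[i]);
        }
        total += n;
    }
    return total;
}
```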

### Execution Modes
- **R2C (Run-to-Completion)**: Single-threaded processing for maximum performance
- **Worker Threads**: Multi-threaded processing with dedicated worker threads
- **Polling Groups**: Event management and callback registration for concurrent operations
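
The difference between the two execution modes can be sketched in plain C++11. Nothing here is XLIO's API. `process_r2c()` and `worker` are hypothetical, and XLIO's worker-thread mode involves much more, including socket ownership, per-thread rings, and polling.

```cpp
// Hypothetical contrast of the execution modes.
// R2C: the calling thread does the work itself, inline.
// Worker: the calling thread only enqueues; a dedicated thread does the work.
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

using task = std::function<void()>;

// Run-to-completion: no hand-off, no queue, no extra thread.
inline void process_r2c(const task &t) { t(); }

class worker {
public:
    worker() : m_stop(false), m_thread(&worker::run, this) {}

    ~worker()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        m_thread.join(); // pending tasks are drained before the thread exits
    }

    void submit(task t)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push_back(std::move(t));
        }
        m_cv.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
            if (m_tasks.empty()) {
                return; // stop requested and nothing left to do
            }
            task t = std::move(m_tasks.front());
            m_tasks.pop_front();
            lock.unlock();
            t(); // run the task without holding the lock
            lock.lock();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<task> m_tasks;
    bool m_stop;
    std::thread m_thread; // declared last so it starts after the other members exist
};
```

The R2C path has no queueing or synchronization cost. The worker path adds a hand-off but frees the application thread from doing network processing.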

## File Organization

```
src/
├── core/              # Core library functionality
│   ├── config/        # Configuration management (JSON schema, registry)
│   ├── dev/           # Device and hardware abstraction (rings, CQs, buffers)
│   ├── event/         # Event handling and threading (poll groups, workers)
│   ├── ib/            # InfiniBand specific code (verbs, MLX5)
│   ├── iomux/         # I/O multiplexing (select, poll, epoll)
│   ├── lwip/          # Lightweight IP stack (TCP implementation)
│   ├── netlink/       # Netlink communication (routing, neighbors)
│   ├── proto/         # Protocol handling (TCP, UDP, routing, ARP)
│   ├── sock/          # Socket abstraction layer (sockinfo classes)
│   └── util/          # Utility functions (memory, sys vars, instrumentation)
├── stats/             # Statistics and monitoring
├── state_machine/     # State machine implementations
├── utils/             # Common utilities
└── vlogger/           # Logging system

tests/
├── unit_tests/        # Unit tests using Google Test
├── gtest/             # Integration tests
└── extra_api/         # Extra API tests

docs/
├── configuration.md   # Build configuration options
├── coding-style.md    # Code style guidelines
└── contributing.md    # Contribution guidelines
```

## Key Terminology

### Core Components
- **Ring**: Circular buffer for data transmission, can be simple, bonded, or slave
- **CQ (Completion Queue)**: Hardware queue for completion notifications
- **QP (Queue Pair)**: InfiniBand communication endpoint
- **RFS (Receive Flow Steering)**: Hardware-based packet steering
- **GRO (Generic Receive Offload)**: Software aggregation of received packets into larger segments
- **TSO (TCP Segmentation Offload)**: Hardware TCP segmentation
- **LRO (Large Receive Offload)**: Hardware aggregation of received TCP segments
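
The effect of RFS can be modeled in software as an exact-match table from a flow's 5-tuple to a ring. `flow_tuple` and `flow_steering_table` are hypothetical types for illustration. Per the definition above, XLIO relies on the hardware to do this matching instead of a CPU lookup.

```cpp
// Hypothetical software model of receive flow steering: map a flow's 5-tuple
// to the ring that should receive it. Hardware RFS performs the equivalent
// match in the NIC so packets land in the right queue without CPU work.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

struct flow_tuple {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint8_t protocol; // e.g. 6 = TCP, 17 = UDP

    bool operator==(const flow_tuple &o) const
    {
        return src_ip == o.src_ip && dst_ip == o.dst_ip && src_port == o.src_port &&
            dst_port == o.dst_port && protocol == o.protocol;
    }
};

struct flow_tuple_hash {
    std::size_t operator()(const flow_tuple &t) const
    {
        // Simple multiplicative hash combine; adequate for a sketch.
        std::size_t h = std::hash<std::uint32_t>()(t.src_ip);
        h = h * 31 + std::hash<std::uint32_t>()(t.dst_ip);
        h = h * 31 + t.src_port;
        h = h * 31 + t.dst_port;
        h = h * 31 + t.protocol;
        return h;
    }
};

class flow_steering_table {
public:
    void add_rule(const flow_tuple &flow, int ring_id) { m_rules[flow] = ring_id; }

    // Returns the ring id for this flow, or -1 if no rule matches.
    int lookup(const flow_tuple &flow) const
    {
        auto it = m_rules.find(flow);
        return it == m_rules.end() ? -1 : it->second;
    }

private:
    std::unordered_map<flow_tuple, int, flow_tuple_hash> m_rules;
};
```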

### Execution Modes
- **R2C (Run-to-Completion)**: Single-threaded processing
- **Worker Threads**: Multi-threaded processing with dedicated worker threads

### Memory Management
- **Buffer Pools**: Pre-allocated memory pools for network buffers
- **Huge Pages**: Large memory pages for better TLB performance
- **Zero-Copy**: Direct memory access without copying
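
Huge-page allocation with a fallback to regular pages might look like the following sketch. It assumes Linux and uses only `mmap()`/`munmap()`. `region`, `alloc_region()`, and `free_region()` are hypothetical helpers, not XLIO functions.

```cpp
// Sketch: try a huge-page-backed anonymous mapping first, and fall back to
// regular pages when no huge pages are reserved (Linux only).
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

struct region {
    void *addr;
    std::size_t size;
    bool huge; // true if backed by huge pages
};

// Pass a multiple of the huge page size (commonly 2 MiB on x86_64) so that
// munmap() can release exactly the mapped length.
region alloc_region(std::size_t size)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    void *p = mmap(nullptr, size, prot, flags | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        return region{p, size, true};
    }
    // Typically fails when the huge page pool is empty; use normal pages instead.
    p = mmap(nullptr, size, prot, flags, -1, 0);
    if (p == MAP_FAILED) {
        return region{nullptr, 0, false};
    }
    return region{p, size, false};
}

void free_region(const region &r)
{
    if (r.addr != nullptr) {
        munmap(r.addr, r.size);
    }
}
```

Fewer, larger pages mean fewer TLB entries cover the same buffer memory, which is the TLB benefit the list refers to.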

### Network Stack
- **lwIP**: Lightweight IP stack for TCP implementation
- **Netlink**: Kernel-user space communication for routing/neighbors
- **ARP**: Address Resolution Protocol implementation
- **Routing**: Route table management and packet forwarding
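
The core of route table lookup is longest-prefix match. The sketch below is hypothetical: `route_entry` and `lookup_route()` are not XLIO names. It scans a small in-memory table linearly. Per the list above, XLIO learns its routing information from the kernel over Netlink.

```cpp
// Hypothetical longest-prefix-match lookup over a small IPv4 route table.
// Linear scan for clarity; production tables use tries or similar structures.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct route_entry {
    std::uint32_t prefix; // network address, host byte order
    int prefix_len;       // 0..32
    std::string dev;      // outgoing interface name
};

inline std::uint32_t prefix_mask(int len)
{
    // Shifting a 32-bit value by 32 is undefined, so /0 is handled separately.
    return len == 0 ? 0u : ~std::uint32_t(0) << (32 - len);
}

// Returns the matching entry with the longest prefix, or nullptr if none matches.
const route_entry *lookup_route(const std::vector<route_entry> &table, std::uint32_t dst)
{
    const route_entry *best = nullptr;
    for (const route_entry &r : table) {
        const std::uint32_t mask = prefix_mask(r.prefix_len);
        if ((dst & mask) == (r.prefix & mask) && (!best || r.prefix_len > best->prefix_len)) {
            best = &r;
        }
    }
    return best;
}
```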

### Hardware Abstraction
- **InfiniBand Verbs**: Low-level InfiniBand operations
- **MLX5**: Optimizations specific to mlx5-based NVIDIA (Mellanox) ConnectX adapters (ConnectX-4 and later)
- **DPCP**: Direct Packet Control Plane for advanced features
- **OFED**: OpenFabrics Enterprise Distribution for InfiniBand/RDMA support

## Common Code Patterns

### Socket Implementation Pattern
```cpp
// Outline only: verify the virtual methods and their exact signatures
// against src/core/sock/sockinfo.h before overriding.
class sockinfo_new_protocol : public sockinfo {
public:
    sockinfo_new_protocol(int fd, int domain);

    // Override required virtual methods
    virtual int bind(const struct sockaddr *addr, socklen_t addrlen) override;
    virtual int connect(const struct sockaddr *addr, socklen_t addrlen) override;
    virtual ssize_t send(const void *buf, size_t len, int flags) override;
    virtual ssize_t recv(void *buf, size_t len, int flags) override;

    // Protocol-specific methods
    void handle_protocol_specific_event();
};
```

### Event Handler Pattern
```cpp
// Outline only: verify the base-class names and virtual signatures
// against the headers in src/core/event/.
class new_event_handler : public event_handler {
public:
    virtual void handle_event() override;
    virtual void handle_timer_expired(void *user_data) override;

    // Register with poll group
    void register_with_poll_group(poll_group *pg);
};
```
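
The register-then-dispatch flow behind this pattern can be shown without any XLIO types. `mini_event_handler`, `mini_poll_group`, and `counting_handler` are hypothetical stand-ins. The real `poll_group` and handler interfaces differ, and they also cover timers and hardware events.

```cpp
// Hypothetical miniature of the pattern: handlers register with a poll group,
// and one non-blocking poll() pass invokes every handler that is ready.
#include <cassert>
#include <cstddef>
#include <vector>

class mini_event_handler {
public:
    virtual ~mini_event_handler() {}
    virtual bool is_ready() const = 0; // checked by polling, never blocks
    virtual void handle_event() = 0;
};

class mini_poll_group {
public:
    void add(mini_event_handler *h) { m_handlers.push_back(h); }

    // Single pass over registered handlers; returns how many were dispatched.
    std::size_t poll()
    {
        std::size_t handled = 0;
        for (mini_event_handler *h : m_handlers) {
            if (h->is_ready()) {
                h->handle_event();
                ++handled;
            }
        }
        return handled;
    }

private:
    std::vector<mini_event_handler *> m_handlers; // not owned
};

class counting_handler : public mini_event_handler {
public:
    counting_handler() : m_pending(0), m_handled(0) {}
    void signal() { ++m_pending; }
    bool is_ready() const override { return m_pending > 0; }
    void handle_event() override
    {
        --m_pending;
        ++m_handled;
    }
    int handled() const { return m_handled; }

private:
    int m_pending;
    int m_handled;
};
```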

### Configuration Access Pattern
```cpp
// Access configuration through the registry.
// Parameter keys below are illustrative; confirm them in xlio_config_schema.json.
config_registry &registry = config_registry::get_instance();

// Type-safe parameter access with validation
auto memory_limit = registry.get_parameter<int>("core.resources.memory_limit");
auto tcp_nodelay = registry.get_parameter<bool>("network.protocols.tcp.nodelay.enable");
```
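
Typed access over a keyed store can be sketched with a hypothetical `mini_registry`. It is not the real `config_registry`, whose API is only outlined in the snippet above, and it skips JSON Schema validation. It shows only string-to-type conversion with error reporting.

```cpp
// Hypothetical miniature of typed parameter access over string values keyed
// by dotted paths. Not the real config_registry API.
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

class mini_registry {
public:
    void set(const std::string &key, const std::string &value) { m_values[key] = value; }

    // Parses the stored string as T. Throws std::out_of_range for an unknown
    // key and std::invalid_argument if the value does not parse cleanly.
    template <typename T> T get_parameter(const std::string &key) const
    {
        std::map<std::string, std::string>::const_iterator it = m_values.find(key);
        if (it == m_values.end()) {
            throw std::out_of_range("unknown parameter: " + key);
        }
        std::istringstream in(it->second);
        T value = T();
        in >> std::boolalpha >> value; // bools are spelled "true"/"false"
        std::string trailing;
        // Reject failed parses and leftovers such as "12abc".
        if (in.fail() || (in >> trailing)) {
            throw std::invalid_argument("bad value for " + key + ": " + it->second);
        }
        return value;
    }

private:
    std::map<std::string, std::string> m_values;
};
```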

## Key Data Structures

```cpp
class sockinfo {
    // Base socket information class
    // Inherited by sockinfo_tcp, sockinfo_udp, sockinfo_ulp
};

class net_device_entry {
    // Network device abstraction
    // Handles InfiniBand context and queue pairs
    // Inherits from event_handler_ibverbs and timer_handler
};

class ring {
    // Ring buffer for data transmission
    // Types: ring_simple, ring_bond, ring_slave
};

class buffer_pool {
    // Memory pool for network buffers
    // Manages allocation and deallocation
};

class poll_group {
    // Manages I/O multiplexing and event handling
    // Handles epoll, select, poll operations
};

class worker_thread {
    // Worker thread for processing events
    // Part of worker_thread_manager
};
```

## Configuration System

### Architecture
- **Registry**: `config_registry` class manages all configuration parameters
- **Schema**: JSON Schema validation in `xlio_config_schema.json`
- **Loaders**: JSON file, inline config, environment variables (deprecated)
- **Priority**: JSON file → Inline config → Environment variables → Defaults
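
Resolving one key across several sources can be sketched as a first-match search over layers. The sketch assumes the **Priority** list runs from highest to lowest precedence. Because the order is simply the sequence of layers passed in, it can be adjusted if the real precedence differs. `config_layer`, `layer_from_env()`, and `resolve()` are hypothetical.

```cpp
// Hypothetical layered configuration lookup: layers are consulted in order,
// and the first one that defines the key wins; defaults go last.
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> config_layer;

// Builds a layer from environment variables, given a key -> variable name map.
config_layer layer_from_env(const std::map<std::string, std::string> &key_to_env)
{
    config_layer layer;
    for (const auto &kv : key_to_env) {
        const char *v = std::getenv(kv.second.c_str());
        if (v != nullptr) {
            layer[kv.first] = v;
        }
    }
    return layer;
}

// Returns the value from the first layer that defines key, or "" if none does.
std::string resolve(const std::vector<const config_layer *> &layers, const std::string &key)
{
    for (const config_layer *layer : layers) {
        config_layer::const_iterator it = layer->find(key);
        if (it != layer->end()) {
            return it->second;
        }
    }
    return std::string();
}
```

Storing pointers to layers means a later change to any layer, such as reloading the JSON file, is visible on the next `resolve()` call without rebuilding the order.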

### Key Configuration Sections
- **core**: Essential configuration (memory, resources, initialization)
- **network**: Protocol-specific settings (TCP, UDP, routing)
- **hardware**: Hardware-specific configurations (InfiniBand, rings, offloads)
- **performance**: Performance tuning parameters (polling, batching, allocation)
- **monitor**: Logging, statistics, and monitoring settings

## Quick Reference

### Essential Commands
```bash
# Build and install (make install may require root, depending on --prefix)
./autogen.sh && ./configure --with-dpcp --enable-utls && make -j"$(nproc)" && make install

# Run with XLIO
LD_PRELOAD=libxlio.so ./your_application

# Monitor performance
xlio_stats --all

# Check hardware
ibstat && ibv_devinfo -l
```

### Key Files
- `src/core/sock/sockinfo.h` - Socket abstraction base class
- `src/core/util/sys_vars.h` - Configuration parameters
- `src/core/config/` - Configuration system
- `src/core/dev/ring.h` - Ring buffer abstraction
- `src/core/event/poll_group.h` - Event management
- `tests/unit_tests/` - Unit tests
- `contrib/jenkins_tests/style.conf` - Code formatting rules

### Key Configuration Parameters
- **Memory**: `core.resources.memory_limit`, `core.resources.hugepages.enable`
- **Performance**: `performance.polling.*`, `performance.ring_allocation_logic`
- **Logging**: `monitor.log.level`, `monitor.log.file`
- **Network**: `network.protocols.tcp.nodelay`, `network.protocols.udp.*`
- **Hardware**: `hardware.ib.*`, `hardware.ring.*`

### Development Tools
- **xlio_stats**: Runtime statistics and performance monitoring
- **ethtool**: Hardware capabilities verification
- **ibstat/ibv_devinfo**: InfiniBand device information
- **valgrind**: Memory debugging (configure with `--with-valgrind`)
- **gdb**: Debugging (enable with `--enable-debug`)

## Performance Considerations

### Key Optimization Areas
- Minimize memory allocations in the data path
- Use zero-copy techniques where possible
- Leverage offloads (hardware TSO and LRO, software GRO)
- Consider NUMA topology on multi-socket systems
- Use huge pages for better TLB performance
- Choose an appropriate ring allocation strategy (per-core, per-interface, per-IP)
- Tune polling intervals and batch sizes
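
The effect of the ring allocation strategy on ring sharing can be sketched as a key function. `ring_logic`, `socket_ctx`, and `ring_allocator` are hypothetical, and only the three strategies named above are modeled. The actual `performance.ring_allocation_logic` values and semantics should come from the configuration schema.

```cpp
// Hypothetical illustration of ring allocation logic: the strategy decides
// which key sockets are grouped by, and sockets with the same key share a ring.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

enum class ring_logic { per_interface, per_ip, per_core };

struct socket_ctx {
    std::string ifname;     // interface the socket's traffic uses
    std::uint32_t local_ip; // bound local address
    int cpu;                // core the owning thread runs on
};

class ring_allocator {
public:
    explicit ring_allocator(ring_logic logic) : m_logic(logic), m_next_id(0) {}

    // Returns the ring id for this socket, creating a new ring for an unseen key.
    int ring_for(const socket_ctx &s)
    {
        const std::string key = make_key(s);
        std::map<std::string, int>::iterator it = m_rings.find(key);
        if (it != m_rings.end()) {
            return it->second;
        }
        const int id = m_next_id++;
        m_rings[key] = id;
        return id;
    }

    std::size_t ring_count() const { return m_rings.size(); }

private:
    // Rings belong to a device, so the interface is always part of the key.
    std::string make_key(const socket_ctx &s) const
    {
        const std::string dev = "if:" + s.ifname;
        switch (m_logic) {
        case ring_logic::per_interface:
            return dev;
        case ring_logic::per_ip:
            return dev + "/ip:" + std::to_string(s.local_ip);
        case ring_logic::per_core:
            return dev + "/core:" + std::to_string(s.cpu);
        }
        return dev; // not reached for valid enum values
    }

    ring_logic m_logic;
    int m_next_id;
    std::map<std::string, int> m_rings;
};
```

Coarser keys mean fewer rings and more sharing, which saves resources. Finer keys, such as per-core, reduce contention at the cost of more hardware queues.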

### Common Bottlenecks
- Insufficient buffer pool sizes
- Suboptimal ring allocation strategy
- Disabled hardware offloading features
- Incorrect polling parameters
- Memory allocation in hot paths
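
One common way to keep allocation out of the hot path is to reserve per-batch storage once and reuse it. `tx_batch` and `packet_desc` are hypothetical names. The sketch relies on `std::vector::clear()` keeping the capacity already reserved.

```cpp
// Hypothetical hot-path helper: storage is reserved once in the constructor
// and reused for every batch, so steady-state operation does not allocate.
#include <cassert>
#include <cstddef>
#include <vector>

struct packet_desc {
    const char *data;
    std::size_t len;
};

class tx_batch {
public:
    explicit tx_batch(std::size_t max_batch) : m_max(max_batch) { m_descs.reserve(max_batch); }

    // Returns false when the batch is full and must be flushed first.
    bool add(const char *data, std::size_t len)
    {
        if (m_descs.size() == m_max) {
            return false;
        }
        m_descs.push_back(packet_desc{data, len}); // within reserved capacity: no allocation
        return true;
    }

    // Hands each descriptor to send_fn, then empties the batch (capacity is kept).
    template <typename Fn> std::size_t flush(Fn &&send_fn)
    {
        for (const packet_desc &d : m_descs) {
            send_fn(d);
        }
        const std::size_t n = m_descs.size();
        m_descs.clear();
        return n;
    }

    std::size_t capacity() const { return m_descs.capacity(); }

private:
    std::vector<packet_desc> m_descs;
    std::size_t m_max;
};
```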
