Commit 593de54

NCCL 2.27.7-1
Prevent initialization failures in certain configurations when attempting to load fp8-specific symmetric multicast kernels on GPUs older than Blackwell.
1 parent 0d1ece2 commit 593de54

5 files changed: +382 -7 lines changed

ext-tuner/README.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
# NCCL Tuner Plugin Development

This directory contains resources and examples for developing NCCL tuner plugins. Tuner plugins allow you to customize NCCL's algorithm and protocol selection behavior to optimize performance for specific workloads and hardware configurations.

## Overview

NCCL tuner plugins provide a way to influence NCCL's automatic algorithm and protocol selection by modifying the cost tables that NCCL uses to make decisions. This allows you to:

- Override default algorithm/protocol combinations for specific collective operations
- Customize tuning based on message size, topology, and other parameters
- Implement sophisticated tuning strategies without recompiling NCCL
- Optimize performance for specific hardware configurations or workloads

## Tuner Plugin Interface

NCCL tuner plugins must implement the `ncclTuner_t` interface defined in `nccl_tuner.h` within `nccl/src/include/plugin`. Each example plugin carries its own forked copy of these definitions in `tuner.h`, and plugin implementors are expected to fork the internal NCCL definitions in the same way. The current interface includes:

```c
// Initialize the tuner plugin
ncclResult_t (*init)(size_t nRanks, size_t nNodes, ncclDebugLogger_t logFunction, void **context);

// Get and modify collective operation cost information
ncclResult_t (*getCollInfo)(void* context, ncclFunc_t collType, size_t nBytes,
                            int numPipeOps, float** collCostTable, int numAlgo, int numProto,
                            int regBuff, int* nChannels);

// Clean up plugin resources
ncclResult_t (*destroy)(void* context);
```

## Development Guidelines

### 1. Plugin Structure

A typical tuner plugin should:
- Include the necessary forked NCCL headers (`tuner.h`)
- Implement all required interface functions
- Export the plugin structure with appropriate version
- Handle all input parameters gracefully

### 2. Cost Table Modification

The `getCollInfo` function receives a cost table that maps algorithm/protocol combinations to performance costs. Lower costs indicate preferred combinations. You can:

- Set costs to `0.0` to make combinations highly preferred
- Set costs to `NCCL_ALGO_PROTO_IGNORE` to disable combinations
- Use relative costs to create preferences between options

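The three options above can be sketched as follows. This is a minimal illustration, not code taken from the example plugins: `setCost` and `applyPreferences` are hypothetical helper names, and the `#define` values are stand-ins for what the forked `tuner.h` provides, included only so the snippet compiles by itself. The sketch leaves alone any entry that already holds `NCCL_ALGO_PROTO_IGNORE`, on the assumption that NCCL has marked that combination as unavailable.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for definitions normally provided by the forked tuner.h.
 * The values are assumptions so that this sketch compiles by itself. */
#define NCCL_ALGO_TREE 0
#define NCCL_ALGO_RING 1
#define NCCL_PROTO_LL 0
#define NCCL_PROTO_LL128 1
#define NCCL_PROTO_SIMPLE 2
#define NCCL_NUM_PROTOCOLS 3
#define NCCL_ALGO_PROTO_IGNORE (-1.0f)

/* Hypothetical helper: store `cost` for one algorithm/protocol pair.
 * Returns 1 if the entry was written, 0 if it was skipped because the
 * indices are out of range or the combination is already disabled. */
static int setCost(float (*table)[NCCL_NUM_PROTOCOLS], int numAlgo, int numProto,
                   int algo, int proto, float cost) {
  assert(table != NULL); /* internal invariant: callers pass the cast table */
  if (algo < 0 || algo >= numAlgo || proto < 0 || proto >= numProto) return 0;
  if (table[algo][proto] == NCCL_ALGO_PROTO_IGNORE) return 0;
  table[algo][proto] = cost;
  return 1;
}

/* Hypothetical policy: prefer RING+SIMPLE, give RING+LL128 a small relative
 * cost, and disable TREE+LL. */
void applyPreferences(float **collCostTable, int numAlgo, int numProto) {
  if (collCostTable == NULL || numProto > NCCL_NUM_PROTOCOLS) return;
  float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;
  setCost(table, numAlgo, numProto, NCCL_ALGO_RING, NCCL_PROTO_SIMPLE, 0.0f);
  setCost(table, numAlgo, numProto, NCCL_ALGO_RING, NCCL_PROTO_LL128, 1.0f);
  setCost(table, numAlgo, numProto, NCCL_ALGO_TREE, NCCL_PROTO_LL, NCCL_ALGO_PROTO_IGNORE);
}
```

A relative cost such as `1.0` only matters in comparison with the other entries, so it is worth inspecting the values already in the table before choosing one.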
### 3. Channel Management

The `nChannels` parameter allows you to:
- Set a specific number of channels to use
- Return the original value to preserve NCCL's default behavior
- Implement dynamic channel selection based on message size or topology

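As a rough sketch of dynamic channel selection, the helper below writes a channel count only when its policy has an opinion and otherwise leaves `*nChannels` exactly as NCCL passed it in. The function names, the size thresholds, the channel counts, and the `maxChannels` cap are all hypothetical; none of them come from NCCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: thresholds and channel counts are placeholders to be
 * replaced with values measured on your own system. Returns 0 when the
 * policy has no opinion. */
static int channelsForSize(size_t nBytes) {
  if (nBytes < (size_t)64 * 1024) return 1;        /* small messages */
  if (nBytes < (size_t)4 * 1024 * 1024) return 4;  /* medium messages */
  return 0;                                        /* large: no opinion */
}

/* Write a channel count only when the policy has an opinion; otherwise leave
 * *nChannels untouched. `maxChannels` is a cap chosen by the plugin itself,
 * not an NCCL parameter. */
void selectChannels(size_t nBytes, int maxChannels, int *nChannels) {
  assert(maxChannels >= 1); /* internal invariant of this sketch */
  if (nChannels == NULL) return;
  int wanted = channelsForSize(nBytes);
  if (wanted == 0) return;
  *nChannels = wanted < maxChannels ? wanted : maxChannels;
}
```

Inside `getCollInfo`, such a helper would be called with the `nBytes` and `nChannels` arguments that NCCL supplies.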
### 4. Error Handling

Always return appropriate `ncclResult_t` values:
- `ncclSuccess` for successful or ignored operations
- `ncclInternalError` for plugin-specific errors. Returning an error is advisable only during plugin initialization and destruction, because the overhead of a failed plugin call elsewhere can be very costly for users.
- Other NCCL error codes as appropriate

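One way to follow this advice is to treat unexpected input in `getCollInfo` as an ignored operation. The sketch below is an assumption-laden illustration: `sketchGetCollInfo` is a hypothetical name, and the typedefs and `#define` values are stand-ins for the forked `tuner.h` so that the snippet compiles by itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for types and constants from the forked tuner.h (assumed values) */
typedef enum { ncclSuccess = 0, ncclInternalError = 3 } ncclResult_t;
typedef int ncclFunc_t;
#define NCCL_ALGO_RING 1
#define NCCL_PROTO_SIMPLE 2
#define NCCL_NUM_PROTOCOLS 3
#define NCCL_ALGO_PROTO_IGNORE (-1.0f)

/* Hypothetical getCollInfo: on any unexpected input it changes nothing and
 * still reports ncclSuccess, leaving the cost estimates as they were. */
ncclResult_t sketchGetCollInfo(void *context, ncclFunc_t collType, size_t nBytes,
                               int numPipeOps, float **collCostTable, int numAlgo,
                               int numProto, int regBuff, int *nChannels) {
  (void)context; (void)collType; (void)nBytes;
  (void)numPipeOps; (void)regBuff; (void)nChannels;

  /* Nothing to tune: treat as an ignored operation rather than an error */
  if (collCostTable == NULL) return ncclSuccess;
  /* Table shape does not match what this sketch was written for */
  if (numAlgo <= NCCL_ALGO_RING || numProto != NCCL_NUM_PROTOCOLS) return ncclSuccess;

  float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;
  if (table[NCCL_ALGO_RING][NCCL_PROTO_SIMPLE] != NCCL_ALGO_PROTO_IGNORE) {
    table[NCCL_ALGO_RING][NCCL_PROTO_SIMPLE] = 0.0f;
  }
  return ncclSuccess;
}
```

Errors that make the plugin unusable, such as a failed allocation, are better reported from `init`, where the guidance above says returning an error is acceptable.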
## Getting Started

### Option 1: Start with the Example Plugin

If you're new to tuner plugin development, start with the `example/` directory:

```bash
cd example/
make
```

This provides a CSV-based configuration system that you can customize or use as a template.

### Option 2: Use the Basic Plugin

For more customized tuning needs, you might want to start with a clean baseline. In that case, base your work on the basic plugin in the `basic/` directory:

```bash
cd basic/
make
```

## Building and Testing

### Build Requirements

- GCC or compatible C compiler
- NCCL headers (included in `nccl/` subdirectories)
- Make

### Build Process

Each plugin directory contains a Makefile:

```bash
cd basic/ # or example/
make
```

This generates a shared library (`.so` file) that can be loaded by NCCL.

### Loading the Plugin

Add your plugin directory to `LD_LIBRARY_PATH`:

```bash
export LD_LIBRARY_PATH=/path/to/your/plugin:$LD_LIBRARY_PATH
```

Set `NCCL_TUNER_PLUGIN` to either the plugin name or the absolute path to the plugin file. Any one of the following forms works:

```bash
export NCCL_TUNER_PLUGIN=example
export NCCL_TUNER_PLUGIN=libnccl-tuner-example.so
export NCCL_TUNER_PLUGIN=/path/to/your/plugin/libnccl-tuner-example.so
```

NCCL will automatically discover and load the plugin based on the exported symbol names.

## Advanced Topics

### Plugin Versioning

NCCL supports multiple plugin interface versions. Make sure your plugin exports the correct version:

```c
const ncclTuner_v4_t ncclTunerPlugin_v4 = {
  .name = "YourPluginName",
  .init = yourInitFunction,
  .getCollInfo = yourGetCollInfoFunction,
  .destroy = yourDestroyFunction
};
```

### Multi-GPU and Multi-Node Considerations

Your plugin receives topology information (`nRanks`, `nNodes`) during initialization. Use this to:
- Implement topology-aware tuning strategies
- Handle single-node vs. multi-node optimizations differently
- Scale channel counts based on available hardware

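The sketch below shows one possible shape for topology-aware tuning. It assumes that `init` has stored `nRanks` and `nNodes` in a context structure; `topoContext`, `ranksPerNode`, and `tuneForTopology` are hypothetical names, the `#define` values are stand-ins for the forked `tuner.h`, and the preferences themselves are placeholders rather than performance recommendations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for definitions from the forked tuner.h (assumed values) */
#define NCCL_ALGO_TREE 0
#define NCCL_ALGO_RING 1
#define NCCL_PROTO_SIMPLE 2
#define NCCL_NUM_PROTOCOLS 3
#define NCCL_ALGO_PROTO_IGNORE (-1.0f)

/* Hypothetical context, filled in by init from its nRanks/nNodes arguments */
struct topoContext {
  size_t nRanks;
  size_t nNodes;
};

/* Ranks per node, assuming ranks are spread evenly across nodes */
static size_t ranksPerNode(const struct topoContext *ctx) {
  assert(ctx != NULL); /* internal invariant: caller has checked ctx */
  return ctx->nNodes > 0 ? ctx->nRanks / ctx->nNodes : ctx->nRanks;
}

/* Placeholder policy: favor RING+SIMPLE on a single node and TREE+SIMPLE
 * across nodes, and use one channel per local rank, capped at 4. */
void tuneForTopology(const struct topoContext *ctx, float **collCostTable,
                     int numAlgo, int numProto, int *nChannels) {
  if (ctx == NULL || collCostTable == NULL || nChannels == NULL) return;
  if (numAlgo <= NCCL_ALGO_RING || numProto != NCCL_NUM_PROTOCOLS) return;

  float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;
  int algo = (ctx->nNodes <= 1) ? NCCL_ALGO_RING : NCCL_ALGO_TREE;
  if (table[algo][NCCL_PROTO_SIMPLE] != NCCL_ALGO_PROTO_IGNORE) {
    table[algo][NCCL_PROTO_SIMPLE] = 0.0f;
  }

  size_t perNode = ranksPerNode(ctx);
  if (perNode >= 1) *nChannels = perNode < 4 ? (int)perNode : 4;
}
```

Keeping the topology decision in a small function like this makes it straightforward to swap in preferences derived from your own measurements.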
### Performance Optimization

- Keep plugin logic lightweight to avoid impacting NCCL performance
- Cache expensive computations when possible
- Use the logging system for debugging but avoid excessive output in production

## Debugging and Logging

Use NCCL's debug logging system:

```bash
export NCCL_DEBUG=INFO # General information
export NCCL_DEBUG_SUBSYS=TUNING
```

Within your plugin, use the provided `ncclDebugLogger_t` function for consistent logging.

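A possible way to use the logger is to keep the function pointer received in `init` and wrap it in a small macro. Everything in this sketch is an assumption to be checked against your forked headers: the `ncclDebugLogger_t` signature, the `NCCL_LOG_INFO` and `NCCL_TUNING` values, and the hypothetical names `TUNER_INFO`, `rememberLogger`, `reportChoice`, and `captureLogger`. The last of these is only a stand-in logger so that the snippet can be exercised without NCCL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins modelled on the forked headers; the enum values and the logger
 * signature are assumptions and should be checked against your tuner.h. */
typedef enum { NCCL_LOG_WARN = 2, NCCL_LOG_INFO = 3 } ncclDebugLogLevel;
#define NCCL_TUNING 0x40UL
typedef void (*ncclDebugLogger_t)(ncclDebugLogLevel level, unsigned long flags,
                                  const char *file, int line, const char *fmt, ...);

/* Logger handed over by NCCL in init; stays NULL until then */
static ncclDebugLogger_t pluginLog = NULL;

/* Hypothetical convenience macro: tags messages with file, line, and the
 * tuning subsystem flag, and does nothing if no logger was provided. */
#define TUNER_INFO(...)                                                          \
  do {                                                                           \
    if (pluginLog)                                                               \
      pluginLog(NCCL_LOG_INFO, NCCL_TUNING, __FILE__, __LINE__, __VA_ARGS__);    \
  } while (0)

/* Call this from init with the logFunction argument */
void rememberLogger(ncclDebugLogger_t logFunction) { pluginLog = logFunction; }

/* Example call site, e.g. at the end of getCollInfo */
void reportChoice(size_t nBytes, int nChannels) {
  TUNER_INFO("tuner sketch: nBytes=%zu nChannels=%d", nBytes, nChannels);
}

/* Stand-in logger so the sketch can run without NCCL: it formats the message
 * into a buffer and counts how often it was called. */
static char lastMessage[256];
static int capturedCalls = 0;
void captureLogger(ncclDebugLogLevel level, unsigned long flags,
                   const char *file, int line, const char *fmt, ...) {
  (void)level; (void)flags; (void)file; (void)line;
  va_list args;
  va_start(args, fmt);
  vsnprintf(lastMessage, sizeof lastMessage, fmt, args);
  va_end(args);
  capturedCalls++;
}
```

Logging on every `getCollInfo` call can be noisy, so a real plugin would likely reserve such messages for debugging runs.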
## Best Practices

1. **Test thoroughly**: Verify your plugin works with various message sizes and topologies
2. **Handle edge cases**: Ensure your plugin behaves correctly with unusual input parameters
3. **Document your approach**: Clearly document your tuning strategy and configuration options
4. **Version your plugin**: Use meaningful version numbers and maintain backward compatibility
5. **Performance validation**: Measure the impact of your tuning decisions on real workloads

## Contributing

When developing new tuner plugins:
- Follow the existing code style and structure
- Include comprehensive documentation
- Add example configurations and test cases
- Consider contributing useful plugins back to the community

## Resources

- [NCCL Documentation](https://docs.nvidia.com/deeplearning/nccl/)
- Example plugin implementations in this directory

For questions and support, refer to the NCCL community resources and documentation.

ext-tuner/basic/README.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
# Basic NCCL Tuner Plugin

This directory contains a minimal placeholder implementation of an NCCL tuner plugin. It serves as a starting point for developing custom tuner plugins by providing the essential function stubs and interface structure required by NCCL.

## Purpose

This basic plugin is designed to:
- Provide a minimal working example of the NCCL tuner plugin interface
- Serve as a template for developing custom tuner plugins
- Demonstrate the required function signatures and structure
- Implement placeholder functionality that can be extended

## Implementation Details

The plugin implements the following functions:

### `pluginInit`
```c
ncclResult_t pluginInit(size_t nRanks, size_t nNodes, ncclDebugLogger_t logFunction, void **context)
```
- **Purpose**: Initialize the plugin with communicator information
- **Current Implementation**: Simple placeholder that returns success
- **Parameters**:
  - `nRanks`: Total number of ranks in the communicator
  - `nNodes`: Total number of nodes in the communicator
  - `logFunction`: NCCL debug logging function
  - `context`: Plugin context pointer (output)

### `pluginGetCollInfo`
```c
ncclResult_t pluginGetCollInfo(void* context, ncclFunc_t collType, size_t nBytes,
                               int numPipeOps, float** collCostTable, int numAlgo, int numProto,
                               int regBuff, int* nChannels)
```
- **Purpose**: Modify cost tables for collective operations
- **Current Implementation**:
  - Sets the RING+SIMPLE algorithm/protocol combination to cost 0.0 (highest preference)
  - Sets channel count to 1
- **Parameters**:
  - `context`: Plugin context from init
  - `collType`: Type of collective operation
  - `nBytes`: Message size in bytes
  - `numPipeOps`: Number of pipeline operations
  - `collCostTable`: Cost table to modify
  - `numAlgo`: Number of algorithms
  - `numProto`: Number of protocols
  - `regBuff`: Whether the buffer can be registered
  - `nChannels`: Number of channels to use (output)

### `pluginDestroy`
```c
ncclResult_t pluginDestroy(void* context)
```
- **Purpose**: Clean up plugin resources
- **Current Implementation**: Simple placeholder that returns success

## Cost Table Structure

The plugin demonstrates how to modify NCCL's cost tables:

```c
float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;
```

The cost table is a 2D array where:
- First dimension: Algorithm index (e.g., `NCCL_ALGO_RING`)
- Second dimension: Protocol index (e.g., `NCCL_PROTO_SIMPLE`)
- Values: Cost for that algorithm/protocol combination

### Cost Values
- **0.0**: Highest preference (lowest cost)
- **Positive values**: Relative costs (lower is better)
- **`NCCL_ALGO_PROTO_IGNORE`**: Disable this combination

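To make the indexing concrete, the following sketch walks the table and reports the enabled entry with the lowest cost. It is a debugging aid under stated assumptions, not a description of NCCL's own selection logic: `cheapestCombination` is a hypothetical name, and the two `#define` values are stand-ins for the forked `tuner.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for definitions from the forked tuner.h (assumed values) */
#define NCCL_NUM_PROTOCOLS 3
#define NCCL_ALGO_PROTO_IGNORE (-1.0f)

/* Hypothetical debugging helper: find the enabled entry with the lowest cost.
 * Returns 1 and fills *bestAlgo / *bestProto, or returns 0 if the table is
 * missing, has an unexpected shape, or every entry is disabled. */
int cheapestCombination(float **collCostTable, int numAlgo, int numProto,
                        int *bestAlgo, int *bestProto) {
  assert(bestAlgo != NULL && bestProto != NULL); /* outputs are mandatory */
  if (collCostTable == NULL || numProto > NCCL_NUM_PROTOCOLS) return 0;

  /* Same cast as above: rows are algorithms, columns are protocols */
  float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;

  int found = 0;
  float best = 0.0f;
  for (int a = 0; a < numAlgo; a++) {
    for (int p = 0; p < numProto; p++) {
      float cost = table[a][p];
      if (cost == NCCL_ALGO_PROTO_IGNORE) continue; /* disabled combination */
      if (!found || cost < best) {
        best = cost;
        *bestAlgo = a;
        *bestProto = p;
        found = 1;
      }
    }
  }
  return found;
}
```

Calling such a helper before and after your modifications is a simple way to check that a change to the table has the effect you intended.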
## Building

```bash
make
```

This creates `libnccl-tuner-basic.so`, which can be loaded by NCCL.

## Usage

### Loading the Plugin

Set `NCCL_TUNER_PLUGIN` to the plugin name, the library file name, or the absolute path to the library. Any one of the following forms works:

```bash
export NCCL_TUNER_PLUGIN=basic
export NCCL_TUNER_PLUGIN=libnccl-tuner-basic.so
export NCCL_TUNER_PLUGIN=/path/to/your/plugin/libnccl-tuner-basic.so
```

Then add the plugin directory to `LD_LIBRARY_PATH` and launch your application:

```bash
export LD_LIBRARY_PATH=/path/to/basic:$LD_LIBRARY_PATH
mpirun -np 4 your_nccl_application
```

### Verifying Plugin Loading

Enable NCCL debug output to see if the plugin is loaded:

```bash
export NCCL_DEBUG=INFO
```

You should see messages indicating the tuner plugin is being used.

## Extending the Plugin

This basic plugin provides a foundation that you can extend:

### 1. Add Configuration Logic

Modify `pluginGetCollInfo` to implement your tuning strategy:

```c
__hidden ncclResult_t pluginGetCollInfo(void* context, ncclFunc_t collType, size_t nBytes,
                                        int numPipeOps, float** collCostTable, int numAlgo, int numProto,
                                        int regBuff, int* nChannels) {
  // View the cost table as a 2D array indexed by [algorithm][protocol]
  float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;

  // Your custom tuning logic here
  if (nBytes < 1024) {
    // Small message optimization
    table[NCCL_ALGO_TREE][NCCL_PROTO_SIMPLE] = 0.0;
  } else {
    // Large message optimization
    table[NCCL_ALGO_RING][NCCL_PROTO_LL128] = 0.0;
  }

  // Dynamic channel selection
  *nChannels = (nBytes > 1024*1024) ? 4 : 1;

  return ncclSuccess;
}
```

### 2. Add Context Management

Use the context pointer to store plugin state:

```c
struct pluginContext {
  int initialized;
  size_t nRanks;
  size_t nNodes;
  // Add your plugin-specific data here
};
```

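A context structure like this is typically allocated in `init` and released in `destroy`. The following is a minimal sketch of that life cycle, assuming plain heap allocation: `sketchInit` and `sketchDestroy` are hypothetical names, and the `ncclResult_t` and `ncclDebugLogger_t` typedefs are simplified stand-ins for the definitions in the forked `tuner.h`.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for definitions from the forked tuner.h (assumed) */
typedef enum { ncclSuccess = 0, ncclInternalError = 3 } ncclResult_t;
typedef void (*ncclDebugLogger_t)(int level, unsigned long flags,
                                  const char *file, int line, const char *fmt, ...);

struct pluginContext {
  int initialized;
  size_t nRanks;
  size_t nNodes;
};

/* Allocate the context and hand it back to NCCL through *context.
 * Reporting an error here is reasonable because init runs only once. */
ncclResult_t sketchInit(size_t nRanks, size_t nNodes,
                        ncclDebugLogger_t logFunction, void **context) {
  (void)logFunction; /* a real plugin would usually keep this for logging */
  if (context == NULL) return ncclInternalError;

  struct pluginContext *ctx = calloc(1, sizeof *ctx);
  if (ctx == NULL) return ncclInternalError;

  ctx->initialized = 1;
  ctx->nRanks = nRanks;
  ctx->nNodes = nNodes;
  *context = ctx;
  return ncclSuccess;
}

/* Release whatever init allocated; free(NULL) is a harmless no-op */
ncclResult_t sketchDestroy(void *context) {
  free(context);
  return ncclSuccess;
}
```

In `getCollInfo`, the same pointer arrives as the `context` argument and can be cast back to `struct pluginContext *`.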
### 3. Add File-Based Configuration

Read configuration from files, environment variables, or other sources.

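As a small illustration of the environment-variable route, the sketch below reads a message-size threshold once and falls back to a default when the value is missing or malformed. The variable name `MY_TUNER_SMALL_MSG_BYTES` and both function names are hypothetical; the default of 1024 bytes simply mirrors the threshold used in the configuration logic example above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a non-negative decimal byte count; return `fallback` for NULL, empty,
 * non-numeric, or out-of-range input. Assumes size_t is wide enough for the
 * values you expect to configure. */
size_t parseByteCount(const char *text, size_t fallback) {
  if (text == NULL || *text == '\0') return fallback;
  for (const char *c = text; *c != '\0'; c++) {
    if (!isdigit((unsigned char)*c)) return fallback; /* rejects signs, spaces */
  }
  errno = 0;
  unsigned long long value = strtoull(text, NULL, 10);
  if (errno == ERANGE) return fallback;
  return (size_t)value;
}

/* Hypothetical variable name; read it once (for example in init) and cache
 * the result in the plugin context instead of calling getenv per collective. */
size_t smallMessageThreshold(void) {
  return parseByteCount(getenv("MY_TUNER_SMALL_MSG_BYTES"), 1024);
}
```

Reading the value during initialization keeps `getCollInfo` lightweight, in line with the guidance to keep per-call plugin logic cheap.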
### 4. Add Topology Awareness

Use the `nRanks` and `nNodes` parameters to implement topology-specific tuning.

## File Structure

```
basic/
├── README.md # This file
├── plugin.c # Plugin implementation
├── Makefile # Build configuration
└── nccl/ # NCCL header files
    └── tuner.h # Tuner plugin interface definitions
```

## Next Steps

1. **Understand the Interface**: Study the function signatures and parameters
2. **Implement Your Logic**: Add your tuning strategy to `pluginGetCollInfo`
3. **Test Thoroughly**: Verify your plugin works with different message sizes and topologies
4. **Add Error Handling**: Implement proper error checking and resource management
5. **Document Your Changes**: Update this README with your specific implementation details

## Comparison with Example Plugin

- **Basic Plugin**: Minimal implementation, good for learning and simple use cases
- **Example Plugin**: Full-featured CSV-based configuration system, good for production use

Choose the basic plugin if you want to:
- Learn the tuner plugin interface
- Implement simple, hardcoded tuning strategies
- Build a custom plugin from scratch

Choose the example plugin if you want:
- File-based configuration
- Complex tuning strategies
- Production-ready features

## Resources

- [Parent Directory README](../README.md) - General tuner plugin development guide
- [Example Plugin](../example/README.md) - Fully featured implementation

This basic plugin provides the foundation you need to start developing custom NCCL tuner plugins. Extend it with your specific tuning logic and requirements.

ext-tuner/example/README.md

Lines changed: 1 addition & 2 deletions
@@ -104,7 +104,6 @@ Set the `NCCL_TUNER_CONFIG_FILE` environment variable to specify the config file
 
 ```bash
 export NCCL_TUNER_CONFIG_FILE=/path/to/your/tuner.conf
-export LD_LIBRARY_PATH=/path/to/plugin:$LD_LIBRARY_PATH
 mpirun -np 4 your_nccl_application
 ```
 
@@ -158,7 +157,7 @@ When channels is set to `-1`, NCCL's default channel selection logic is preserve
 
 1. **Config file not found**: Check the file path and permissions
 2. **Configurations not applied**: Verify the collective type, size ranges, algorithm/protocol names, and topology parameters
-3. **Plugin not loaded**: Ensure `LD_LIBRARY_PATH` includes the plugin directory
+3. **Plugin not loaded**: Ensure `LD_LIBRARY_PATH` includes the plugin directory and that `NCCL_TUNER_PLUGIN` specifies either the plugin name or an absolute path to the plugin shared library.
 4. **No effect on performance**: Check that NCCL is actually using the tuner plugin with `NCCL_DEBUG=INFO`
 5. **Topology mismatch**: Verify that nNodes and nRanks match your actual setup, or use -1 for wildcards
 6. **CSV parsing errors**: Ensure no spaces after commas, or quote fields containing spaces
