
Releases: NVIDIA/NVFlare

2.7.0rc2: Feature enhancements and bug fixes

03 Sep 16:02
3d1de76


What's Changed

Full Changelog: 2.7.0rc1...2.7.0rc2

2.7.0rc1: Release candidate of 2.7.0

13 Aug 15:45
9c75f55


Pre-release

What's Changed


2.6.2: Updating Flower Integration and examples

06 Aug 16:14
8e83701


What's Changed

Full Changelog: 2.6.1...2.6.2

2.6.1: Bug fixes and feature enhancements

11 Jul 22:19
e60800e


What's Changed

Full Changelog: 2.6.0...2.6.1

2.6.0: Major release

22 Apr 18:07
29171be


Special thanks to all the contributors for this release (in git shortlog order):
67 @YuanTingHsieh(謝沅廷)
38 @yanchengnv
37 @ZiyueXu77
36 @SYangster
27 @chesterxgchen
25 @nvkevlu
24 @holgerroth
23 @nvidianz
22 @yhwen
14 @IsaacYangSLA
9 @zhijinl
4 @francescofarina
1 @agiusa
1 @NAEV95
1 @pxLi
1 @taleinat
1 @falibabaei

What's New?

Message Quantization: Reducing Communication Overhead

One of the major bottlenecks in FL is the exchange of model updates among remote participants and servers. The size of these messages can be prohibitively large, leading to increased latency and bandwidth consumption. Furthermore, given that recent LLMs are trained with reduced precision, the default fp32 message precision can even artificially inflate the message size. Message quantization offers a solution by reducing the precision of transmitted updates, thereby compressing the message size.

We implemented quantization and dequantization with our filter mechanism: quantization is performed on the outgoing model weights before transmission, and dequantization recovers the original precision upon receiving the message at the other end. There are two benefits to this implementation: first, no code change is needed on the user side - the same training script can be used with and without message quantization via a simple config setting; second, both training and aggregation are performed at the original precision rather than on quantized data, so the potential impact of message quantization on the training process is minimized.

We use direct cropping and casting to convert fp32 to fp16, and use bitsandbytes to perform 8- and 4-bit quantizations. With this new functionality, we support both NumPy arrays (the previous default) and torch Tensors directly for training LLMs.
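As a rough illustration of the idea (plain NumPy, not the actual FLARE filter classes), the sketch below casts fp32 weights to fp16 before transmission and restores fp32 on the receiving side, so training and aggregation still run at the original precision:

    import numpy as np

    def quantize_fp16(weights: dict) -> dict:
        # Cast each fp32 array to fp16 before transmission (halves the payload size).
        return {name: w.astype(np.float16) for name, w in weights.items()}

    def dequantize_fp32(weights: dict) -> dict:
        # Restore fp32 on the receiving end so training/aggregation see the original precision.
        return {name: w.astype(np.float32) for name, w in weights.items()}

    # Hypothetical round trip: the sender quantizes, the receiver dequantizes.
    model_update = {"layer.weight": np.random.randn(1024, 1024).astype(np.float32)}
    received = dequantize_fp32(quantize_fp16(model_update))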


Table 1 illustrates the message size in MB for a 1B parameter LLM under different precisions. You can find more details regarding training loss curve alignments in our LLM example.

[Table 1. Message size in MB of a 1B-parameter LLM under different precisions]

By applying message quantization, FL can achieve significant bandwidth savings. In our experiments training an LLM with Supervised Fine-Tuning (SFT), as shown in Figure 2, message quantization does not sacrifice model convergence quality with regard to the training loss.

Figure 2. Federated SFT comparison: FL under message quantization.

Native Tensor Transfer

In 2.6, we further introduce support for native tensor transfer. This feature allows PyTorch tensors to be sent directly, reducing serialization and communication overhead. In previous versions, a tensor was converted to a NumPy array for serialization and communication. In this release, tensors can be transferred natively: no tensor-to-NumPy conversion is needed, so the original data format is preserved. Only PyTorch is supported for now.
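A minimal sketch of why this matters, assuming a PyTorch state dict trained in reduced precision (the serialization shown is illustrative, not the FLARE wire format): converting to NumPy forces a dtype change because NumPy has no bfloat16, while sending the tensors natively preserves the original format.

    import io
    import torch

    state_dict = {"layer.weight": torch.randn(4, 4, dtype=torch.bfloat16)}

    # Pre-2.6 style: tensors became NumPy arrays first, which requires an
    # explicit dtype change since NumPy has no bfloat16 type.
    as_numpy = {k: v.to(torch.float32).numpy() for k, v in state_dict.items()}

    # 2.6 native tensor transfer: tensors are serialized directly, so the
    # original dtype and format are preserved end to end.
    buffer = io.BytesIO()
    torch.save(state_dict, buffer)  # illustrative serialization only
    restored = torch.load(io.BytesIO(buffer.getvalue()))
    assert restored["layer.weight"].dtype == torch.bfloat16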

Model Streaming Enhancements

Reduce Local Memory Usage

One critical challenge in FL is the memory overhead of sending and receiving messages. Under the default setting, enough memory must be allocated to hold the entire message. This can be affordable with decent system capabilities and a moderate model size, but for a 70B-parameter or larger model it can quickly drain the available system memory.
Sending the model as a whole also requires additional memory to prepare the outgoing message, which effectively doubles the requirement, i.e. a 70 GB model requires 140 GB of memory. To address this issue, we are introducing:

  • Object container streaming: Processes and transmits the model incrementally, rather than requiring the entire dictionary of gradients to be stored in memory at once. Container streaming serializes one item of the parameter dictionary at a time. For the example above of a 70 GB model with a 1 GB max item size, sending the message as a whole needs an additional 70 GB of memory (70+70=140 GB), whereas ContainerStreamer only needs 1 GB of additional memory (70+1=71 GB). A rough sketch of the idea follows below.
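The following is a simplified sketch of the container-streaming idea (plain Python, not the ContainerStreamer API): serialize and yield one dictionary item at a time, so the extra memory at any moment is bounded by the largest single item rather than the whole model.

    import pickle
    from typing import Dict, Iterator, Tuple

    import numpy as np

    def stream_container(params: Dict[str, np.ndarray]) -> Iterator[Tuple[str, bytes]]:
        # Serialize one item at a time; peak extra memory is roughly the
        # largest item, not the whole parameter dictionary.
        for name, value in params.items():
            yield name, pickle.dumps(value)

    def receive_container(chunks: Iterator[Tuple[str, bytes]]) -> Dict[str, np.ndarray]:
        # Rebuild the dictionary incrementally on the receiving side.
        return {name: pickle.loads(blob) for name, blob in chunks}

    params = {f"layer_{i}.weight": np.zeros((256, 256), dtype=np.float32) for i in range(4)}
    restored = receive_container(stream_container(params))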

Support Unlimited Memory Streaming

Large-model streaming is currently bound by the CPU/GPU memory size, i.e. the model must fit into memory before it can be streamed to the remote server. What if the model is bigger than the available memory? In this release we also introduce file-based streaming to address this concern:

  • File Streaming: File streaming reads the file chunk-by-chunk and therefore only consumes the memory required to hold one chunk of the data. Thus, the additional memory needed by FileStreamer is independent of the model size / max item size and depends only on the file I/O settings, which can be a very small memory overhead. A minimal sketch of the idea follows below.
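A minimal sketch of the file-streaming idea (plain Python, not the FileStreamer API), assuming the model checkpoint has already been saved to disk: read and forward fixed-size chunks, so memory use depends only on the chunk size, not the model size.

    from typing import Iterator

    CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk; memory use is bounded by this size

    def stream_file(path: str) -> Iterator[bytes]:
        # Read the checkpoint chunk-by-chunk instead of loading it all into memory.
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    def receive_file(chunks: Iterator[bytes], dest: str) -> None:
        # Write chunks to disk as they arrive; no full-model buffer is needed.
        with open(dest, "wb") as f:
            for chunk in chunks:
                f.write(chunk)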


The table below illustrates the memory comparison with a local simulation of a one-time send of a 1B-parameter model. We record the system memory footprint and compare the peak memory usage of the three settings: regular, container streaming, and file streaming. Memory usage is significantly reduced by streaming, especially file streaming; however, file streaming can take longer to finish the job due to file I/O efficiency.

[Table: peak memory usage and job time for regular, container streaming, and file streaming when sending a 1B-parameter model]

Note: Streaming enhancements are not yet integrated into the high-level APIs or existing FL algorithm controllers/executors. However, users can build custom controllers or executors to leverage this feature. Full support will be included in a future release.

Structured Logging

Structured logging is a long-standing customer request. Together with other logging-related requests, we are addressing the following concerns:

  • Logging observability - can the logs be formatted as JSON so they can be consumed by data observability tools?
  • Can we make it easier for data scientists to focus on the training logs rather than the communication logs?
  • Can the log level be changed dynamically for easy debugging in production?
  • Can the log level be changed for a module so that all classes in the module and its sub-modules follow, instead of having to change each class individually?

This feature addresses these concerns.

  • We changed the Python logging configuration from fileConfig to dictConfig.
    The new FLARE loggers are designed to follow the package-level hierarchy, using dot-separated logger names to facilitate granular control at different levels (a sketch of such a configuration follows after this list).

  • We provide a default logging configuration file, log_config.json.default, for all NVFLARE sub-systems, with pre-configured handlers for colored console output, plain logs, error logs, structured JSON logs, and FL training logs.

  • We support dynamic logging configuration commands that allow the logging configuration to be changed without restarting the FL system.

  • To support various needs and backward compatibility, we now provide the following default log files:

    • log.txt – the default log file, same as in previous versions
    • log.json – JSON-format log
    • log_error.txt – ERROR-level logs, for quick error lookup
    • log_fl.txt – strips the system and communication related logs and clearly shows logs related to FL tasks (such as training)
  • Considering that many researchers mostly use the Simulator for quick experiments, we further define a few predefined logging modes for the simulator:

    • log config mode ('concise', 'full', 'verbose'), defaulting to 'concise'
      • concise – shows only the FL task logs
      • full – the same logging configuration as the previous release
      • verbose – debug-level logging
    For details, please refer to the logging tutorials and logging documentation.
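Below is a hedged sketch of what a dictConfig-based, hierarchy-aware setup with a JSON formatter can look like in plain Python logging; the actual log_config.json.default ships more handlers and different names.

    import json
    import logging
    import logging.config

    class JsonFormatter(logging.Formatter):
        # Emit each record as a JSON object for observability tools.
        def format(self, record: logging.LogRecord) -> str:
            return json.dumps({
                "time": self.formatTime(record),
                "name": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
            })

    logging.config.dictConfig({
        "version": 1,
        "formatters": {"json": {"()": JsonFormatter}},
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "json"},
        },
        # Dot-separated logger names follow the package hierarchy, so setting a
        # level here applies to every class in the module and its sub-modules.
        "loggers": {"nvflare.app_common": {"level": "DEBUG", "handlers": ["console"]}},
    })

    logging.getLogger("nvflare.app_common.workflows").info("hello from a sub-module")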

Federated Statistics Extension

Quantiles Support: Expands statistical capabilities by introducing quantile computation for federated statistics.
Quantile statistics are statistical measures that divide a probability distribution or dataset into intervals with equal probabilities or proportions. Quantiles help summarize the distribution of data by providing key points that indicate how values are spread.
Please refer to the Federated Statistics for tabular data example.
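For intuition only (plain NumPy on one site's local data, not the federated statistics workflow itself), the quartiles below split a dataset into four equally populated intervals:

    import numpy as np

    # Each site computes statistics over its own tabular data; the federated
    # workflow then combines them. Here we only show what quantiles mean.
    local_data = np.random.default_rng(0).normal(loc=50, scale=10, size=10_000)

    # The 25th, 50th, and 75th percentiles split the data into four equal-probability intervals.
    q1, median, q3 = np.quantile(local_data, [0.25, 0.5, 0.75])
    print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")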

System Monitoring

FLARE Monitoring provides an initial solution for tracking system metrics of your federated learning jobs. Different from machine learning experiment tracking, which focuses on training metrics, the moni...


2.6.0rc5: Minor bug fixes

19 Apr 00:09
75517c8


Pre-release

What's Changed

Full Changelog: 2.6.0rc4...2.6.0rc5

2.6.0rc4: Minor bug fixes and documentation updates

17 Apr 01:10
29bf5ba


What's Changed

Full Changelog: 2.6.0rc3...2.6.0rc4

2.6.0rc3

16 Apr 00:48
29a14b6


Pre-release

What's Changed

Full Changelog: 2.6.0rc2...2.6.0rc3

2.6.0rc2: Enhancements and bug fixes

27 Mar 02:46
ca33ad9


Pre-release

What's Changed

Full Changelog: 2.6.0rc1...2.6.0rc2

2.6.0rc1: Release candidate 1 of nvflare 2.6.0

11 Mar 02:46
e4da46b


What's Changed
