Commit 5fd8602

[RcclReplayer] JSON <-> BIN log format conversion tool (#2056)
* Add replay log format converter
* Add Log Sanitizer
* Add no timestamp option (nts) to sanitizer
1 parent db52690 commit 5fd8602

2 files changed: +1350 -0 lines changed

tools/RcclReplayer/README.md

Lines changed: 34 additions & 0 deletions
@@ -85,3 +85,37 @@ Replayer is a separate tool which aims to re-run the same set of RCCL calls as r
Each rank will print out its progress as it goes through every line of calls, including its rank, line number, RCCL API name, and status (INFO/WARNING/ERROR).
It will also report time and bandwidth (if the line is a communication call) for that call. At the end, it will report the total time taken by all communication calls.
Replayer is still under development and experimentation, so the logging formats and the contents of the replayer output are subject to change.

## Log Converter
`replay_log_converter.py` is a utility to convert between binary and JSON log formats, standardize JSON logs for easier parsing, and sanitize logs for comparison.

**Usage:**
* **Binary to JSON:** `python3 replay_log_converter.py <basename> tojson`
* **JSON to Binary:** `python3 replay_log_converter.py <basename> tobin`
* **Standardize JSON:** `python3 replay_log_converter.py <basename> --standardize`
* **Sanitize JSON:** `python3 replay_log_converter.py <basename> --sanitize`

An optional output basename can be provided after the mode (`tojson`/`tobin`) to customize the output filename:
* `python3 replay_log_converter.py <basename> <mode> <output_basename>`

The converter automatically finds all log files matching the pattern `basename.PID.hostname` and processes them.
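
The file discovery described above can be pictured with a small sketch. This is not the converter's actual code: the function name `find_log_files`, the `directory` parameter, and the assumption that the PID field is purely numeric are all illustrative. It only shows one way to match names of the form `basename.PID.hostname`.

```python
import glob
import os
import re


def find_log_files(basename, directory="."):
    """Return paths named <basename>.<PID>.<hostname>, sorted by PID.

    Hypothetical helper: assumes the PID field is numeric and that
    everything after it (hostname plus any extension) is non-empty.
    """
    name_re = re.compile(r"^" + re.escape(basename) + r"\.(\d+)\.(.+)$")
    # Escape glob metacharacters so the basename is matched literally.
    candidates = glob.glob(
        os.path.join(glob.escape(directory), glob.escape(basename) + ".*")
    )
    matches = []
    for path in candidates:
        m = name_re.match(os.path.basename(path))
        if m:
            matches.append((int(m.group(1)), path))
    return [path for _, path in sorted(matches)]
```

With logs such as `replayer_log.1270.quanta-cx77-11` in the current directory, `find_log_files("replayer_log")` would return them ordered by PID, while unrelated files like `replayer_log.txt` are skipped.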

**Output Files:**
* Standardized JSON output is saved with `.standard.json` extension and can be parsed with standard JSON libraries.
* Sanitized files are modified in-place (original files are overwritten with sanitized versions).
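
As a minimal sketch of reading a standardized log with a standard JSON library: the helper name `load_standardized_log` is hypothetical, and the layout of the data inside the file is not described here, so the function simply returns whatever top-level value the file contains.

```python
import json


def load_standardized_log(path):
    """Parse a `.standard.json` file with the standard-library JSON parser.

    The top-level structure (list, object, ...) is not assumed; inspect the
    returned value before relying on any particular layout.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

The caller supplies the path of a file produced by `--standardize`; the exact output filename depends on the basename, PID, and hostname of the original log.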

**Examples:**
* `python3 replay_log_converter.py replayer_log tojson` produces `replayer_log.{1270-1278}.quanta-cx77-11.json`
* `python3 replay_log_converter.py replayer_log tojson converted_log` produces `converted_log.{1270-1278}.quanta-cx77-11.json`
* `python3 replay_log_converter.py replayer_log --sanitize` sanitizes existing JSON files in-place
* `python3 replay_log_converter.py replayer_log tojson --sanitize` converts to JSON and sanitizes in one step
* `python3 replay_log_converter.py replayer_log --sanitize --no-timestamp` (or `--nts`) sets all timestamps to 0.0

**Sanitization:**
The `--sanitize` option normalizes logs for easier comparison by:
116+
* Remapping pointers to readable identifiers (e.g., `comm : 0x7fb680328010``comm : comm_001`)
117+
* Normalizing timestamps relative to the first call (e.g., `time : 1762969171532.248535``time : 0.000000`)
118+
* Use `--no-timestamp` (or `--nts`) to set all timestamps to 0.0 instead
119+
* Preserving relationships: same pointer values get the same sanitized identifier
120+
* Sanitized fields: communicators (`comm`), unique IDs (`uniqueID`), streams (`stream`), buffer addresses (`addr`/`base`/`ptr`/`acc`), handles (`handle`), thread IDs (`thread`), and process IDs (`pid`)
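
The sanitization rules above can be illustrated with a short sketch. It is not the tool's implementation: it assumes each log entry is a flat dict, that identifiers are numbered per field in order of first appearance, and that every field follows the `<field>_NNN` naming seen in the `comm_001` example. The names `sanitize`, `POINTER_FIELDS`, and `no_timestamp` are hypothetical.

```python
from collections import defaultdict

# Field names taken from the list above; treating them all the same way
# is an assumption of this sketch.
POINTER_FIELDS = ("comm", "uniqueID", "stream", "addr", "base",
                  "ptr", "acc", "handle", "thread", "pid")


def sanitize(records, no_timestamp=False):
    """Return sanitized copies of `records` (a list of flat dicts)."""
    ids = {}                     # (field, original value) -> identifier
    counters = defaultdict(int)  # field -> distinct values seen so far
    first_time = None
    out = []
    for rec in records:
        clean = dict(rec)  # shallow copy; the input is left untouched
        for field in POINTER_FIELDS:
            if field in clean:
                key = (field, clean[field])
                if key not in ids:
                    counters[field] += 1
                    ids[key] = f"{field}_{counters[field]:03d}"
                # The same original value always maps to the same identifier.
                clean[field] = ids[key]
        if "time" in clean:
            if first_time is None:
                first_time = clean["time"]
            # Either zero every timestamp or make it relative to the first call.
            clean["time"] = 0.0 if no_timestamp else clean["time"] - first_time
        out.append(clean)
    return out
```

Because the mapping is keyed on the original value, two runs that make the same calls in the same order yield identical sanitized logs even when the raw pointers and PIDs differ, which is what makes a plain diff between them meaningful.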