fix readme

dimapihtar · dimapihtar · commit e17dc5bddea7 · 2025-11-18T11:26:11.000-08:00
Signed-off-by: dimapihtar <dpihtar@gmail.com>
diff --git a/megatron/training/datasets/README.md b/megatron/training/datasets/README.md
@@ -11,17 +11,11 @@ It probabilistically converts samples into FIM format using configurable rates,
 **Attributes**
 
 - `rate`: Probability of converting a sample into a FIM example.
-
 - `spm_rate`: Probability of using the SPM FIM pattern (vs PSM).
-
 - `extra_tokens`: Dictionary containing the FIM special tokens: {"prefix", "middle", "suffix", "pad", "eod"}.
-
 - `split_sample`: Optional token around which samples are split before applying FIM.
-
 - `fragment_rate`: Probability of applying FIM to each fragment when split_sample is used.
-
 - `no_prefix`: If the decoded sequence starts with this prefix, FIM is skipped.
-
 `GPTFIMDataset` dataset class that loads token sequences from an `IndexedDataset` and applies FIM transformations before returning each sample.
 
 **PSM Format**
@@ -37,5 +31,4 @@ It probabilistically converts samples into FIM format using configurable rates,
 **Special cases:**
 
 - If the sequence starts with no_prefix, FIM is skipped.
-
 - If FIM is not applied, the sample is returned unchanged.
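
Note for readers of this diff: the attributes listed in the README hunks above (`rate`, `spm_rate`, `extra_tokens`) can be illustrated with a minimal sketch. This is not the `GPTFIMDataset` implementation. The function name `apply_fim`, the token-level random split, and the exact PSM/SPM token layouts are assumptions chosen for illustration, following the commonly used fill-in-the-middle convention. `split_sample`, `fragment_rate`, and `no_prefix` are left out.

```python
import random


def apply_fim(tokens, rate, spm_rate, extra_tokens, rng):
    """Hypothetical helper: return tokens, possibly rearranged into a FIM layout."""
    # With probability (1 - rate) the sample is returned unchanged.
    if rng.random() >= rate:
        return list(tokens)

    # Pick two cut points to split the sample into prefix / middle / suffix.
    lo, hi = sorted(rng.randint(0, len(tokens)) for _ in range(2))
    prefix, middle, suffix = tokens[:lo], tokens[lo:hi], tokens[hi:]

    pre, mid, suf = (extra_tokens[k] for k in ("prefix", "middle", "suffix"))
    if rng.random() < spm_rate:
        # SPM layout (assumed): suffix comes first, then prefix and middle.
        return [pre, suf, *suffix, mid, *prefix, *middle]
    # PSM layout (assumed): prefix, then suffix, then middle.
    return [pre, *prefix, suf, *suffix, mid, *middle]


if __name__ == "__main__":
    # Made-up token ids; real ids come from the tokenizer.
    extra = {"prefix": 1, "middle": 2, "suffix": 3, "pad": 4, "eod": 5}
    print(apply_fim([10, 11, 12, 13, 14], 1.0, 0.5, extra, random.Random(0)))
```

With `rate=0.0` the sample always comes back unchanged, and with `rate=1.0` it is always rearranged. `spm_rate` then selects between the two layouts.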