1 parent c1ab42b commit f60f9c8
megatron/post_training/docs/distillation.md
@@ -75,7 +75,7 @@ Model Optimizer modifies the model using the loss criterion present in the disti
defines a loss function between two module attribute names of the teacher and student model, respectively.

Default loss function used between logits is a KL-Divergence Loss and loss used among intermediate tensors is Cosine-Similarity,
-both defined in `megatron/inference/algos/distillation.py`.
+both defined in `modelopt.torch.distill.plugins.megatron`.

## Restrictions
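For readers skimming this change, a minimal PyTorch sketch of what the two default criteria compute is below. It is illustrative only, not the library code: the actual implementations live in `modelopt.torch.distill.plugins.megatron` and may differ in details such as temperature scaling, reduction, or sequence masking. The helper names here are hypothetical.

```python
import torch
import torch.nn.functional as F


def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL divergence from teacher to student over the vocabulary dimension.

    Hypothetical stand-in for the default logit criterion; the real module
    in modelopt.torch.distill.plugins.megatron may differ.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" averages the per-sample summed KL over the batch, matching
    # the mathematical definition of KL divergence.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def intermediate_cosine_loss(student_hidden: torch.Tensor,
                             teacher_hidden: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity between intermediate tensors, averaged."""
    cos = F.cosine_similarity(student_hidden, teacher_hidden, dim=-1)
    return (1.0 - cos).mean()


# Example usage with dummy shapes (batch, seq, vocab) and (batch, seq, hidden):
if __name__ == "__main__":
    s_logits, t_logits = torch.randn(2, 8, 512), torch.randn(2, 8, 512)
    s_hidden, t_hidden = torch.randn(2, 8, 64), torch.randn(2, 8, 64)
    print(logits_kl_loss(s_logits, t_logits).item())
    print(intermediate_cosine_loss(s_hidden, t_hidden).item())
```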