Trying out logprobs and top logprobs for testing rather than logits. #745

Manan17 · 2025-06-05T00:29:07Z

Summary

Just testing out logprobs, as mentioned in #742.
It works for the models where the logits-based test was failing.
I also tried setting the tolerance for Qwen to 1e-1 (previously 1), and it passed.

Testing Done

Hardware Type:
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

Tcc0403 · 2025-06-05T01:05:06Z

test/convergence/bf16/test_mini_models.py

        assert_verbose_allclose(
-            expected_output["logits"],
-            actual_output["logits"],
+            expected_logprobs,
+            actual_logprobs,
            atol=logits_atol,
-            rtol=logits_rtol,
+            rtol=logits_rtol
        )


We don't have to compare all logprobs

Tcc0403 · 2025-06-05T01:05:35Z

test/convergence/bf16/test_mini_models.py

+        k = 5
+        exp_topk_vals, _ = torch.topk(expected_logprobs, k, dim=-1)
+        act_topk_vals, _ = torch.topk(actual_logprobs, k, dim=-1)

+        # Compare top-k logprobs with tolerance (ignoring order)
+        max_diff = torch.max(torch.abs(exp_topk_vals - act_topk_vals)).item()
+        print(f"Top-{k} logprobs max diff: {max_diff:.6f}")
+        assert torch.all(torch.abs(exp_topk_vals - act_topk_vals) < (logits_atol + logits_rtol * torch.abs(exp_topk_vals))), (
+            f"Top-{k} logprobs are not all close (atol={logits_atol}). "
+            f"Max diff: {max_diff:.6f}"
+        )


Make it a test util
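A minimal sketch of such a util, lifting the inline top-k check above into a function. The name `assert_topk_logprobs_close` and its default tolerances are hypothetical, not the repo's actual helper. The check uses `<=` in the style of `torch.allclose` rather than the strict `<` in the diff.

```python
import torch


def assert_topk_logprobs_close(
    expected_logprobs: torch.Tensor,
    actual_logprobs: torch.Tensor,
    k: int = 5,
    atol: float = 1e-1,
    rtol: float = 1e-1,
) -> None:
    """Hypothetical test util: compare only the k largest logprobs per position.

    torch.topk returns values sorted in descending order, so the i-th largest
    expected logprob is compared with the i-th largest actual logprob,
    regardless of which token ids they belong to.
    """
    exp_topk_vals, _ = torch.topk(expected_logprobs, k, dim=-1)
    act_topk_vals, _ = torch.topk(actual_logprobs, k, dim=-1)

    diff = torch.abs(exp_topk_vals - act_topk_vals)
    # Elementwise tolerance in the style of torch.allclose: |a - e| <= atol + rtol * |e|
    allowed = atol + rtol * torch.abs(exp_topk_vals)
    max_diff = diff.max().item()
    assert torch.all(diff <= allowed), (
        f"Top-{k} logprobs are not all close (atol={atol}, rtol={rtol}). "
        f"Max diff: {max_diff:.6f}"
    )
```

In the test, the two lines computing top-k values plus the assertion would then collapse into a single call such as `assert_topk_logprobs_close(expected_logprobs, actual_logprobs, k=5, atol=logits_atol, rtol=logits_rtol)`.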

Tcc0403 · 2025-06-05T01:45:48Z

test/convergence/bf16/test_mini_models.py

+            1e-1,  # 1e-1
            1e-1,  # 1e-2


After removing the comparison of all logprobs, we can try setting it lower.
sglang only uses atol and sets it to 5e-2 (decode_tolerance).
verl sets (atol, rtol) = (1e-2, 1e-5), but it compares the mean of all logprobs, not the top-k.

It does not pass with a lower tolerance.
For gemma3, it passes with atol=1e-1 and rtol=1.

I also tested this with fp32: it fails for most of the models where the old logits-based check passes.

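Since we are comparing values in log space, the tolerance here effectively acts as a relative tolerance on the underlying probabilities: |log p - log q| <= atol means p/q lies within [e^(-atol), e^(atol)].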
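A quick numeric sketch of this point, assuming natural-log probabilities as produced by `log_softmax`. The helper `prob_ratio_bounds` is hypothetical and only illustrates the arithmetic: for atol = 1e-1, p/q may range over roughly [0.905, 1.105], whatever the magnitude of p.

```python
import math


def prob_ratio_bounds(logprob_atol: float) -> tuple[float, float]:
    """Range of p/q allowed when |log p - log q| <= logprob_atol (natural log)."""
    return math.exp(-logprob_atol), math.exp(logprob_atol)


for atol in (5e-2, 1e-1, 1.0):
    lo, hi = prob_ratio_bounds(atol)
    print(f"atol={atol:g}: p/q in [{lo:.4f}, {hi:.4f}]")

# The same log-space tolerance permits very different absolute errors in
# probability space: a large and a tiny probability get the same *relative* slack.
for p in (0.9, 1e-6):
    q = p * math.exp(0.1)  # exactly at the edge of atol = 0.1 in log space
    print(f"p={p:g}: |p - q| = {abs(q - p):.3g}, "
          f"|log p - log q| = {abs(math.log(p) - math.log(q)):.3f}")
```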

Tcc0403 · 2025-06-07T00:48:07Z

test/convergence/bf16/test_mini_models.py

+        actual_logprobs = torch.nn.functional.log_softmax(actual_output["logits"], dim=-1)
+        expected_logprobs = torch.nn.functional.log_softmax(expected_output["logits"], dim=-1)


Make log_softmax() and topk() a util function so we can call it in run_mini_model() and avoid storing all the logits.
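A minimal sketch of such a util, assuming it receives logits of shape (..., vocab). The name `get_topk_logprobs` and the default `k=5` (taken from the top-k diff above) are hypothetical. The idea is that run_mini_model() would keep only this (..., k) tensor per step instead of the full logits.

```python
import torch


def get_topk_logprobs(logits: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Hypothetical util: reduce (..., vocab) logits to the k largest logprobs.

    Returns values sorted in descending order with shape (..., k).
    """
    # Drop autograd history: these values are only kept for comparison.
    logprobs = torch.nn.functional.log_softmax(logits.detach(), dim=-1)
    topk_vals, _ = torch.topk(logprobs, k, dim=-1)
    return topk_vals


# Illustrative shapes only (vocab size chosen arbitrarily): per-step logits vs. what gets stored.
batch, seq_len, vocab = 2, 16, 32000
logits = torch.randn(batch, seq_len, vocab)
topk_logprobs = get_topk_logprobs(logits)
print(logits.shape, "->", topk_logprobs.shape)
# torch.Size([2, 16, 32000]) -> torch.Size([2, 16, 5])
```

Storage per step then scales with k instead of the vocabulary size, which is what makes it cheap to keep the values for every step of the convergence run.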

Tcc0403 · 2025-06-07T01:05:26Z

test/convergence/bf16/test_mini_models.py

-
+        actual_logprobs = torch.nn.functional.log_softmax(actual_output["logits"], dim=-1)
+        expected_logprobs = torch.nn.functional.log_softmax(expected_output["logits"], dim=-1)
+        check_logprobs(actual_logprobs, expected_logprobs, atol=logits_atol, rtol=logits_rtol)


Assuming we have top logprobs calculated and stored, we only need to call
assert_verbose_allclose(actual_top_logprobs, expected_top_logprobs, atol=logprob_atol, rtol=logprob_rtol).

Manan17 added 2 commits June 3, 2025 21:07

Working with logprob for testing

370cd34

Checking top k prologs

465465b

Tcc0403 reviewed Jun 5, 2025


created a util function

12003ea

Tcc0403 reviewed Jun 7, 2025


Merge branch 'main' into logprobs

242bf55
