Conversation

@pluesclues
Contributor

@pluesclues pluesclues commented Nov 21, 2025

Introduced chunked_hidden_states_selective_log_softmax for memory efficiency and updated related functions to use it. Removed deprecated sampling-parameter updates and adjusted logit handling in grpo_compute_loss.
Commented out sections of code related to importance sampling and logit processing.
Added a new function grpo_update_SamplingParams to update sampling parameters based on provided arguments. Refactored logit processing to handle chunked inputs and clarified the computation of log probabilities.
Refactored input handling for pixel values and the image grid, adding pre-calculated padding and chunking logic.
Added max_left_pad parameter handling and adjusted the related padding logic in the model's input processing.
Removed unnecessary breakpoints and cleaned up comments.
@gemini-code-assist
Contributor

Summary of Changes

Hello @pluesclues, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant architectural changes to how log probabilities are calculated within the GRPO framework, primarily focusing on memory optimization. By implementing a chunked approach for logprob computation and refactoring the GRPO loss function to accept these pre-computed values, the system can now handle larger batch sizes or longer sequence lengths more efficiently. This change centralizes logit manipulation parameters and improves the overall robustness of the GRPO training process.

Highlights

  • Memory-Efficient Logprob Calculation: Introduced a new function, chunked_hidden_states_selective_log_softmax, to compute log probabilities in a memory-efficient manner by processing hidden states in chunks along the batch and sequence length dimensions. This is crucial for handling large models or long sequences.
  • Refactored GRPO Loss Function: The grpo_compute_loss function has been updated to directly receive pre-calculated log probabilities, streamlining its interface and removing redundant logit processing steps.
  • Batch-wise Logprob Generation: The grpo_accumulated_loss function now processes inputs by iterating through individual samples (batch size of 1) to generate log probabilities using the new chunked function, improving memory management during the forward pass.
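
As a rough illustration of the chunking idea behind the first highlight, here is a minimal sketch (names and shapes are assumptions for illustration; the PR's actual function additionally folds in logit scaling, softcapping, and temperature, and is wrapped in torch.compile):

```python
import torch

def chunked_selective_log_softmax(hidden_states, lm_head, index, chunks=4):
    """Gather log-probs of `index` tokens without materializing all logits at once.

    hidden_states: (N, H) flattened over batch and sequence length
    lm_head:       (V, H) output projection weight
    index:         (N,)   token ids whose log-probs we want
    """
    out = []
    for hs, idx in zip(torch.chunk(hidden_states, chunks, dim=0),
                       torch.chunk(index, chunks, dim=0)):
        logits = hs @ lm_head.T                       # (n, V) exists for this chunk only
        logps = torch.log_softmax(logits.float(), dim=-1)
        out.append(logps.gather(-1, idx.unsqueeze(-1)).squeeze(-1))
    return torch.cat(out, dim=0)                      # (N,)
```

Peak memory for the logits drops roughly by a factor of `chunks`, since only an `(N/chunks, V)` slice is live at any time, while the result is numerically equivalent to the unchunked computation.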

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the GRPO loss calculation to be more memory-efficient by chunking log-probability computations. A new function chunked_hidden_states_selective_log_softmax is introduced for this purpose, and the loss calculation is split across the batch dimension.

While the approach is sound, I've found a critical bug in the implementation: grpo_compute_loss is called with a mix of log-probabilities and hidden states, which will lead to incorrect loss values. I've also identified several other issues related to code clarity, duplication, and potential bugs in handling keyword arguments. Please see my detailed comments for suggestions on how to address these points.

prev_max_left_pad = kwargs.get("max_left_pad", None)

#delete this from kwargs so less issues
sampling_per_token_logps = kwargs.pop("sampling_per_token_logps", None)
Contributor


high

This line unconditionally overwrites the sampling_per_token_logps variable set on line 548, which makes the conditional logic on that line ineffective. This is likely a bug. This line should be removed, and line 548 should be modified to use .pop() to correctly handle the value based on the vllm_importance_sampling_correction flag.

Contributor Author


I just changed the popped kwarg to be assigned to _, since this would be a bug; I do not fully recall why that was there and will investigate further.

Comment on lines +65 to +77
# More memory efficient by chunking on (bsz+qlen) dimension
# Exactly equivalent to the above
@torch.compile(dynamic = True, fullgraph = True, options = torch_compile_options,)
def chunked_hidden_states_selective_log_softmax(
hidden_states,
lm_head,
index,
chunks=4,
logit_scale_multiply=0.0,
logit_scale_divide=0.0,
logit_softcapping=0.0,
temperature=1.0
):
Contributor


medium

This new function is missing type hints for its parameters and return value, as well as a proper docstring. Adding them would improve readability and maintainability, following Python best practices (PEP 484). The existing comments can be converted into a formal docstring.

Suggested change
# More memory efficient by chunking on (bsz+qlen) dimension
# Exactly equivalent to the above
@torch.compile(dynamic = True, fullgraph = True, options = torch_compile_options,)
def chunked_hidden_states_selective_log_softmax(
hidden_states,
lm_head,
index,
chunks=4,
logit_scale_multiply=0.0,
logit_scale_divide=0.0,
logit_softcapping=0.0,
temperature=1.0
):
@torch.compile(dynamic = True, fullgraph = True, options = torch_compile_options,)
def chunked_hidden_states_selective_log_softmax(
hidden_states: torch.Tensor,
lm_head: torch.Tensor,
index: torch.Tensor,
chunks: int = 4,
logit_scale_multiply: float = 0.0,
logit_scale_divide: float = 0.0,
logit_softcapping: float = 0.0,
temperature: float = 1.0,
) -> torch.Tensor:
"""
More memory efficient by chunking on (bsz+qlen) dimension.
Exactly equivalent to the non-chunked version.
"""

pass
RL_REPLACEMENTS["left_pack_padding"] = left_pack_padding

import torch
Contributor


medium

torch is already imported at the top of this file (line 21). This import is redundant and should be removed to keep the code clean and avoid potential confusion.

Contributor Author


I removed this for code cleanliness.

Comment on lines +243 to +245
ref,
new,
old,
Contributor


medium

The parameters ref, new, and old are now expected to be log-probabilities. The previous names (ref_logits, etc.) were also not perfectly accurate, but the new names are even more ambiguous. Consider renaming them to something more descriptive like ref_logps, new_logps, and old_logps to improve code clarity and maintainability.

Suggested change
ref,
new,
old,
ref_logps,
new_logps,
old_logps,

Contributor Author


Changed naming conventions for these just now.

# mark_dynamic(input_ids_j)
# mark_dynamic(mask_j)

#breakpoint()
Contributor


medium

This commented-out breakpoint() seems to be a leftover from a debugging session. It should be removed to keep the codebase clean.

Contributor Author


I removed this breakpoint just now.

Comment on lines +615 to +619
def chunk_optional(tensor, chunks):
if tensor is None:
return [None] * chunks
return torch.chunk(tensor, chunks=chunks, dim=0)

Contributor


medium

The helper function chunk_optional is defined here again. It's a duplicate of the function defined on lines 606-609. This redundant definition should be removed to avoid code duplication.

Contributor Author


I removed one of these chunk_optionals.

@danielhanchen danielhanchen changed the base branch from main to nightly November 27, 2025 03:37
@danielhanchen
Contributor

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 573 to +576
max_left_pad = max(left_pad_tokens_per_prompt).item()

if max_left_pad < max(prev_max_left_pad).item() and (logits_to_keep +max_left_pad) != old_logps.shape[1]:
max_left_pad = max(prev_max_left_pad).item()


P1: Guard prev_max_left_pad before max()

When max_left_pad is not supplied in kwargs it defaults to None, but the new padding adjustment uses max(prev_max_left_pad).item() unconditionally. On the first call (or any call without that kwarg) this hits max(None) and raises a TypeError before any loss is computed, breaking training even for valid inputs. Add a None check or default to an empty tensor before calling max().
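
One possible shape for such a guard, as a hedged sketch (the function and variable names here are illustrative, not the PR's actual code):

```python
import torch

def resolve_max_left_pad(left_pad_tokens_per_prompt, prev_max_left_pad,
                         logits_to_keep, old_logps_len):
    # prev_max_left_pad comes from kwargs and may be None on the first call.
    max_left_pad = int(left_pad_tokens_per_prompt.max().item())
    if prev_max_left_pad is not None:
        prev = int(prev_max_left_pad.max().item())
        # Only fall back to the previous pad when it is larger and shapes disagree.
        if max_left_pad < prev and (logits_to_keep + max_left_pad) != old_logps_len:
            max_left_pad = prev
    return max_left_pad
```

With the None check in place, the first call (or any call without the kwarg) simply uses the current batch's padding instead of raising a TypeError.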


)
pass
pass
# pass
Contributor


Remove


prev_max_left_pad = kwargs.get("max_left_pad", None)

#delete this from kwargs so less issues
Contributor


Spacing and capitalize and mention we enable by default


max_left_pad = max(left_pad_tokens_per_prompt).item()

if max_left_pad < max(prev_max_left_pad).item() and (logits_to_keep +max_left_pad) != old_logps.shape[1]:
Contributor


Can you use torch ops?
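
A torch-native variant of the comparison above might look like this (a sketch under assumed 1-D tensor inputs; names are illustrative, and it keeps the result on-device instead of calling .item()):

```python
import torch

def adjust_max_left_pad(left_pad_tokens_per_prompt, prev_max_left_pad,
                        logits_to_keep, old_len):
    cur = left_pad_tokens_per_prompt.max()
    prev = prev_max_left_pad.max()
    # Fall back to the previous pad only when it is larger and shapes disagree.
    use_prev = (cur < prev) & ((logits_to_keep + cur) != old_len)
    return torch.where(use_prev, prev, cur)
```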

new_hidden_states_chunk,
lm_head,
completion_ids,
chunks = 8,
Contributor


Does increasing this reduce VRAM more but makes it slower?
