
@RexBearIU

Description

This pull request introduces a new GRPO (Group Relative Policy Optimization) training DAG for the MaxText reinforcement learning pipeline, targeting the Llama 3.1 70B model. It also updates the workflow triggers and adds support for a new nightly GRPO Docker image.

Airflow/Composer

  • GCP Composer name: jackyf-test (under GCP project: cloud-ml-auto-solutions)
  • GCP Composer version: 2.13.1

List links for your tests (use go/shortn-gen for any internal link):

jacky-test (Success) https://06ba93284e31466eb62067a2f46710a7-dot-us-east1.composer.googleusercontent.com/dags/maxtext_grpo_rl/grid?dag_run_id=manual__2025-10-03T08%3A27%3A25.167435%2B00%3A00

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Feat: Add MaxText GRPO training DAG for RL pipeline validation