Feat: Add MaxText GRPO training DAG for RL pipeline validation #999

RexBearIU · 2025-11-04T08:11:15Z

Description

This pull request introduces a new GRPO (Group Relative Policy Optimization) training DAG for the MaxText reinforcement learning pipeline, specifically targeting the Llama3.1 70B model. It also updates workflow triggers and adds support for a new nightly GRPO Docker image.
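For reviewers less familiar with GRPO, the core idea behind "group relative" can be sketched in a few lines: each sampled completion's reward is normalized against the other completions drawn for the same prompt. This is a minimal illustration of the general technique, not the MaxText implementation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical and do not come from this PR.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration only; MaxText's actual GRPO code
    may differ in normalization details.
    """
    mean = fmean(rewards)
    # Population standard deviation over the group; eps avoids division by
    # zero when every completion in the group gets the same reward.
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four completions for one prompt, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.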

Airflow/Composer

GCP Composer name: jackyf-test (under GCP project: cloud-ml-auto-solutions)
GCP Composer version: 2.13.1

List links for your tests (use go/shortn-gen for any internal link):

jacky-test (Success) https://06ba93284e31466eb62067a2f46710a7-dot-us-east1.composer.googleusercontent.com/dags/maxtext_grpo_rl/grid?dag_run_id=manual__2025-10-03T08%3A27%3A25.167435%2B00%3A00

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run one-shot tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

Feat: Add MaxText GRPO training DAG for RL pipeline validation

f0a2bae

RexBearIU requested review from RissyRan, abhinavclemson, andrewyct, bhavya01, crankshaw-google, gobbleturk, hyeygit, jiangjy1982, ortibazar, parambole, polydier1, richardsliu, severus-ho, shralex, vipannalla, xibinliu, xuefgu and yixinshi as code owners November 4, 2025 08:11

RexBearIU marked this pull request as draft November 4, 2025 08:11
