Conversation

wconstab (Contributor):

So far just tested locally:

`LOG_RANK=4 CONFIG_FILE=././torchtitan/models/deepseek_v3/train_configs/debug_model.toml ./run_train.sh --model.name llama3_auto_parallel --parallelism.pipeline_parallel_degree 2 --training.steps 100`

Runs and loss converges.

Left one TODO about global batch size and gradient accumulation.

@meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Aug 28, 2025.
@wconstab requested review from fmassa, ezyang, sanketpurandare, bdhirsh and xmfan, and removed review requests for fegin, wwwjn and tianyu-l on August 28, 2025 23:29.

pp_degree = job_config.parallelism.pipeline_parallel_degree
Member:

Unused pp degree config; should probably raise an error when it's not the local world size.
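
A minimal sketch of the fail-fast check being suggested; the helper name and error message are hypothetical, and only `pp_degree` comes from the snippet above:

```python
# Hypothetical check: until the config value is actually consumed, fail
# fast when it disagrees with the local world size. Illustrative only,
# not code from this PR.
def validate_pp_degree(pp_degree: int, local_world_size: int) -> None:
    if pp_degree != local_world_size:
        raise ValueError(
            f"pipeline_parallel_degree={pp_degree} must match the local "
            f"world size ({local_world_size}) for now"
        )
```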

spmd_dims.append("tp")
spmd_mesh = world_mesh[spmd_dims]

dp_degree = 1
Member:

Same here; the config could specify dp_degree.
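
One way the config could drive this instead of hard-coding 1, sketched with a hypothetical `data_parallel_degree` field (the real torchtitan config may name it differently):

```python
# Hypothetical sketch: read dp_degree from the parallelism config rather
# than hard-coding it. The data_parallel_degree field name is an
# assumption for illustration.
def resolve_dp_degree(job_config, world_size: int, pp_degree: int) -> int:
    configured = getattr(job_config.parallelism, "data_parallel_degree", None)
    if configured is not None:
        return configured
    # Fall back to the ranks left over after pipeline parallelism.
    return world_size // pp_degree
```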

inputs, target=targets, losses=losses, input_batch=inputs
# TODO: input_batch kwarg only needed for CP, but
# autoparallel doesn't accept kwargs in its forward
inputs, target=targets, losses=losses #, input_batch=inputs
Contributor:

Curious, why does CP need input_batch?

Contributor (Author):

I assumed you would know. Am I wrong?


pp_degree = job_config.parallelism.pipeline_parallel_degree
local_batch_size = job_config.training.local_batch_size
spmd_batch_size = local_batch_size
Contributor (Author):

Oops, this is a bug for the non-PP case: it should be local batch size * dp degree, and live in an 'else' branch.
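
The fix described above, as a hypothetical helper (the variable names follow the snippet; the branching condition is an assumption):

```python
# Hypothetical sketch of the fix: in the non-PP case the SPMD graph sees
# the full data-parallel batch, so scale the local batch size by dp_degree.
def compute_spmd_batch_size(
    local_batch_size: int, dp_degree: int, pp_degree: int
) -> int:
    if pp_degree > 1:
        return local_batch_size
    else:
        return local_batch_size * dp_degree
```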
