Pull requests: pytorch/torchft

Pull requests list

option 2 - call work.wait inside wrapped work (label: CLA Signed)
#248 opened Jul 26, 2025 by tushar00jain
return work from manager allreduce (label: CLA Signed)
#247 opened Jul 26, 2025 by tushar00jain
fix stream dependencies in callbacks (label: CLA Signed)
#246 opened Jul 26, 2025 by tushar00jain
deep copy state dict for checkpoint (label: CLA Signed)
#245 opened Jul 26, 2025 by tushar00jain
use http transport (label: CLA Signed)
#244 opened Jul 26, 2025 by tushar00jain
option 1 - use block_current to overlap compute/communication (label: CLA Signed)
#243 opened Jul 26, 2025 by tushar00jain
ProcessGroupNCCL: always eager init to avoid duplicate communicators for p2p ops (label: CLA Signed)
#242 opened Jul 25, 2025 by d4l3k
fix compute/communication overlap for gloo (label: CLA Signed)
#240 opened Jul 22, 2025 by tushar00jain
Fixing the issue with indentation on the landing page (label: CLA Signed)
#227 opened Jul 9, 2025 by svekars
[WIP] Streaming DiLoCo prototype (label: CLA Signed)
#203 opened May 28, 2025 by H-Huang (Draft)
Add config sharing from Lighthouse with UI support (#130) (label: CLA Signed)
#202 opened May 24, 2025 by WarrenZhu050413 (Draft)
ParallelProcessGroup: 200gbps with Gloo -- what if we just run like 20 of them in parallel??? (label: CLA Signed)
#199 opened May 21, 2025 by d4l3k
Added proactive heartbeat timeout failure propagation (#164) (#188) (label: CLA Signed)
#196 opened May 20, 2025 by WarrenZhu050413
Support multiple quorums on a single LighthouseServer using gRPC metadata-based room assignment (label: CLA Signed)
#189 opened May 5, 2025 by MattKotzbauer
wip hang (label: CLA Signed)
#148 opened Mar 25, 2025 by H-Huang (Draft)
fork ProcessGroupNCCL (label: CLA Signed)
#134 opened Mar 14, 2025 by d4l3k (Draft)
abort PG on error (label: CLA Signed)
#133 opened Mar 12, 2025 by d4l3k (Draft)
[WIP Fix pipe close warnings (label: CLA Signed)
#129 opened Mar 10, 2025 by H-Huang (Draft)
Add option to skip init sync (label: CLA Signed)
#127 opened Mar 10, 2025 by dl541 (Draft)
Disable async quorum for the first quorum sync (label: CLA Signed)
#112 opened Feb 19, 2025 by fegin (Draft)
make torchft work for llama3_8b 8x (label: CLA Signed)
#104 opened Feb 8, 2025 by d4l3k (Draft)
rust: add open telemetry tracing (label: CLA Signed)
#80 opened Jan 24, 2025 by d4l3k (Draft)
[WIP] FSDP example (label: CLA Signed)
#77 opened Jan 22, 2025 by mreso (Draft)
Test manager join (label: CLA Signed)
#62 opened Jan 8, 2025 by Jackmin801 (Draft)