This repository was archived by the owner on Aug 5, 2025. It is now read-only.

[BUG] cannot capture your model as a full graph #1132

@sunkun1997

Description

torch version: 2.5.0.dev20240616+cu121
Python version: 3.8

I ran the llama example with `torchrun --nproc-per-node 2 pippy_llama.py` and it failed with the error below.
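
For context, here is a condensed sketch of the relevant part of the example; the failing call is the `pipeline(...)` invocation that appears at line 36 of the traceback below. The model id, prompt, and tokenizer setup here are assumptions for illustration, not the example's exact code.

```python
# Condensed sketch of the failing setup; model id and tokenizer usage are
# assumptions, not copied from pippy_llama.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.distributed.pipelining import pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed; the example loads a 7B Llama
llama = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
mb_inputs = tokenizer(["How do you"], return_tensors="pt")

# pipeline() calls torch.export.export() internally; that is where the
# SpecViolationError shown below is raised.
pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
```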

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.26s/it]
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
layers_per_rank = 16
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1006, in _trace_with_export
[rank0]:     ep = torch.export.export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/__init__.py", line 174, in export
[rank0]:     return _export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 952, in wrapper
[rank0]:     raise e
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 935, in wrapper
[rank0]:     ep = fn(*args, **kwargs)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/exported_program.py", line 91, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 1547, in _export
[rank0]:     exported_program = ExportedProgram(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/exported_program.py", line 248, in __init__
[rank0]:     self.verifier().check(self)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 154, in check
[rank0]:     self._check_graph_module(ep.graph_module)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 220, in _check_graph_module
[rank0]:     _check_val(node)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 62, in _check_val
[rank0]:     raise SpecViolationError(f"Node.meta {node.name} is missing val field.")
[rank0]: torch._export.verifier.SpecViolationError: Node.meta _enter_autocast is missing val field.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "pippy_llama.py", line 36, in <module>
[rank0]:     pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1236, in pipeline
[rank0]:     return Pipe.from_tracing(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1044, in from_tracing
[rank0]:     exported_program = Pipe._trace_with_export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1012, in _trace_with_export
[rank0]:     raise RuntimeError(
[rank0]: RuntimeError: It seems that we cannot capture your model as a full graph. Typical reasons include graph breaks, data/shape-dependent control flow, or missing meta kernels for custom operators. You can use our manual pipeline interfaces, or try to fix the graph breaks, see https://pytorch.org/docs/stable/export.html
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
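
A note on a possible cause: the verifier error names an `_enter_autocast` node, which `torch.export` emits for `torch.autocast` context managers inside the traced forward. Recent transformers versions compute the rotary-embedding frequencies inside `torch.autocast(..., enabled=False)`, so that is a plausible source here. The sketch below is a hedged, untested workaround under that assumption, not a confirmed fix: it monkeypatches `LlamaRotaryEmbedding.forward` with an autocast-free equivalent before tracing.

```python
# Hedged workaround sketch (an assumption, not a confirmed fix): replace
# LlamaRotaryEmbedding.forward with an autocast-free equivalent so that
# torch.export does not encounter _enter_autocast/_exit_autocast nodes.
import torch
from transformers.models.llama import modeling_llama

def rotary_forward_no_autocast(self, x, position_ids):
    # Mirrors the upstream computation, but does the float32 math explicitly
    # instead of inside a `torch.autocast(..., enabled=False)` block.
    inv_freq = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    pos = position_ids[:, None, :].float()
    freqs = (inv_freq @ pos).transpose(1, 2)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos().to(dtype=x.dtype), emb.sin().to(dtype=x.dtype)

modeling_llama.LlamaRotaryEmbedding.forward = rotary_forward_no_autocast
```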
