This repository was archived by the owner on Aug 5, 2025. It is now read-only.

[BUG] cannot capture your model as a full graph #1132

@sunkun1997

Description

torch version: 2.5.0.dev20240616+cu121
Python version: 3.8

I ran the llama example with `torchrun --nproc-per-node 2 pippy_llama.py` and it failed with the error below.
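
For context, here is a condensed sketch of the relevant part of the example; the failing call is the `pipeline(...)` invocation that appears at line 36 of the traceback below. The model id, prompt, and tokenizer setup here are assumptions for illustration, not the example's exact code.

```python
# Condensed sketch of the failing setup; model id and tokenizer usage are
# assumptions, not copied from pippy_llama.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.distributed.pipelining import pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed; the example loads a 7B Llama
llama = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
mb_inputs = tokenizer(["How do you"], return_tensors="pt")

# pipeline() calls torch.export.export() internally; that is where the
# SpecViolationError shown below is raised.
pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
```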

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.26s/it]
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
layers_per_rank = 16
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1006, in _trace_with_export
[rank0]:     ep = torch.export.export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/__init__.py", line 174, in export
[rank0]:     return _export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 952, in wrapper
[rank0]:     raise e
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 935, in wrapper
[rank0]:     ep = fn(*args, **kwargs)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/exported_program.py", line 91, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/_trace.py", line 1547, in _export
[rank0]:     exported_program = ExportedProgram(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/export/exported_program.py", line 248, in __init__
[rank0]:     self.verifier().check(self)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 154, in check
[rank0]:     self._check_graph_module(ep.graph_module)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 220, in _check_graph_module
[rank0]:     _check_val(node)
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/_export/verifier.py", line 62, in _check_val
[rank0]:     raise SpecViolationError(f"Node.meta {node.name} is missing val field.")
[rank0]: torch._export.verifier.SpecViolationError: Node.meta _enter_autocast is missing val field.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "pippy_llama.py", line 36, in <module>
[rank0]:     pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1236, in pipeline
[rank0]:     return Pipe.from_tracing(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1044, in from_tracing
[rank0]:     exported_program = Pipe._trace_with_export(
[rank0]:   File "/home/ray/anaconda3/lib/python3.8/site-packages/torch/distributed/pipelining/_IR.py", line 1012, in _trace_with_export
[rank0]:     raise RuntimeError(
[rank0]: RuntimeError: It seems that we cannot capture your model as a full graph. Typical reasons include graph breaks, data/shape-dependent control flow, or missing meta kernels for custom operators. You can use our manual pipeline interfaces, or try to fix the graph breaks, see https://pytorch.org/docs/stable/export.html
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
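
A note on a possible cause: the verifier error names an `_enter_autocast` node, which `torch.export` emits for `torch.autocast` context managers inside the traced forward. Recent transformers versions compute the rotary-embedding frequencies inside `torch.autocast(..., enabled=False)`, so that is a plausible source here. The sketch below is a hedged, untested workaround under that assumption, not a confirmed fix: it monkeypatches `LlamaRotaryEmbedding.forward` with an autocast-free equivalent before tracing.

```python
# Hedged workaround sketch (an assumption, not a confirmed fix): replace
# LlamaRotaryEmbedding.forward with an autocast-free equivalent so that
# torch.export does not encounter _enter_autocast/_exit_autocast nodes.
import torch
from transformers.models.llama import modeling_llama

def rotary_forward_no_autocast(self, x, position_ids):
    # Mirrors the upstream computation, but does the float32 math explicitly
    # instead of inside a `torch.autocast(..., enabled=False)` block.
    inv_freq = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    pos = position_ids[:, None, :].float()
    freqs = (inv_freq @ pos).transpose(1, 2)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos().to(dtype=x.dtype), emb.sin().to(dtype=x.dtype)

modeling_llama.LlamaRotaryEmbedding.forward = rotary_forward_no_autocast
```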
