Make the eden model inherit from llama3.1 (#1316)

jstjohn · web-flow · commit 98dda8682941 · 2025-11-12T20:17:16.000Z
### Description

Since the eden config inherits from llama3 rather than llama3.1 the default nemo conversion classes do not save the `rope_scaling` settings:

```
(Pdb) config
LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "dtype": "bfloat16",
  "eos_token_id": 0,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 512
}
```

which should happen in NeMo with an isinstance match:

```
# For Llama 3.1 and Llama 3.2, rope_scaling is used and thus needed to parsed to the config
if isinstance(source, Llama31Config):
    rope_scaling = {
        'factor': source.scale_factor,
        'low_freq_factor': source.low_freq_factor,
        'high_freq_factor': source.high_freq_factor,
        'original_max_position_embeddings': source.old_context_len,
        'rope_type': 'llama3',
    }
```

This change modifies the inheritance structure so that this matches with the intended llama3.1 config that has the inverse frequency override.

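As a minimal sketch of why the base class matters here: the classes below are hypothetical stand-ins, not the real NeMo configs, and the RoPE field values are illustrative only. `export_rope_scaling` is an assumed helper name that mirrors the isinstance gate quoted above. It shows that a config carrying the RoPE fields still exports `rope_scaling = None` unless it is actually a subclass of the 3.1 config.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the NeMo config classes. Only the inheritance
# relationship and the field names from the quoted snippet are modeled.
@dataclass
class Llama3Config:
    rotary_base: int = 500_000


@dataclass
class Llama31Config(Llama3Config):
    # Illustrative values, not taken from this PR.
    scale_factor: float = 8.0
    low_freq_factor: float = 1.0
    high_freq_factor: float = 4.0
    old_context_len: int = 8192


@dataclass
class EdenConfigBefore(Llama3Config):
    """Has the RoPE fields, but is not a Llama31Config subclass."""

    scale_factor: float = 8.0
    low_freq_factor: float = 1.0
    high_freq_factor: float = 4.0
    old_context_len: int = 8192


@dataclass
class EdenConfigAfter(Llama31Config):
    """Inherits the RoPE fields and passes the isinstance check."""


def export_rope_scaling(source: Llama3Config) -> Optional[dict]:
    """Mirror of the isinstance gate: only 3.1-style configs export rope_scaling."""
    if isinstance(source, Llama31Config):
        return {
            "factor": source.scale_factor,
            "low_freq_factor": source.low_freq_factor,
            "high_freq_factor": source.high_freq_factor,
            "original_max_position_embeddings": source.old_context_len,
            "rope_type": "llama3",
        }
    return None


before = export_rope_scaling(EdenConfigBefore())
after = export_rope_scaling(EdenConfigAfter())
print("before:", before)
print("after:", after)
```

With the old base class the gate is skipped even though the fields are present, which matches the `"rope_scaling": null` seen in the exported config; with the new base class the dictionary is populated.
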
#### Usage

```bash
BIONEMO_DATA_SOURCE=pbss py.test \
  sub-packages/bionemo-evo2/tests/bionemo/evo2/models/test_llama.py \
  sub-packages/bionemo-evo2/tests/bionemo/evo2/utils/checkpoint/test_eden_llama_roundtrip.py
```

Returns:

```
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================== slowest durations ==============================
213.80s call sub-packages/bionemo-evo2/tests/bionemo/evo2/utils/checkpoint/test_eden_llama_roundtrip.py::test_eden_llama_roundtrip
74.26s call sub-packages/bionemo-evo2/tests/bionemo/evo2/models/test_llama.py::test_checkpoint_conversion
42.58s call sub-packages/bionemo-evo2/tests/bionemo/evo2/models/test_llama.py::test_golden_values_llama

(6 durations < 30s hidden. Use -vv to show these durations.)
================ 3 passed, 76 warnings in 343.93s (0:05:43) ================
Skipping execution of on_app_end because OneLogger is not enabled.
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
```

### Type of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

- [ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip) - Skip all CI tests for this PR
- [ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks) - Run Jupyter notebooks execution tests for bionemo2
- [ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow) - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
- [ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all) - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
- [ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes) - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see [CONTRIBUTING](CONTRIBUTING.md)

> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.

### Pre-submit Checklist

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

---------

Signed-off-by: John St John <jstjohn@nvidia.com>

diff --git a/sub-packages/bionemo-core/src/bionemo/core/data/resources/evo2_llama.yaml b/sub-packages/bionemo-core/src/bionemo/core/data/resources/evo2_llama.yaml
@@ -0,0 +1,17 @@
+- tag: eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814:1.0
+  ngc: null
+  ngc_registry: resource
+  pbss: "s3://bionemo-ci/test_data/evo2/eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814.tar.gz"
+  sha256: 7e13dde3ff1c2be113dcbd73de812b29b229cba700b7c981eb048e16dbb6b0cb  # pragma: allowlist secret
+  owner: John St John <jstjohn@nvidia.com>
+  description: >
+    Test data for Evo2 llama inference.
+
+- tag: 7B-8k-og2:1.0
+  ngc: null
+  ngc_registry: model
+  pbss: "s3://bionemo-ci/models/eden_llama_og2_step_182313.tar.gz"
+  sha256: 80a9dae5155f10c9c48e913be55900f51f231fab1252464938867c7511035010  # pragma: allowlist secret
+  owner: John St John <jstjohn@nvidia.com>
+  description: >
+    7b llama 3.1 checkpoint trained on the open genome 2 metagenome subset data for approximately 250 billion tokens.
diff --git a/sub-packages/bionemo-evo2/src/bionemo/evo2/models/llama.py b/sub-packages/bionemo-evo2/src/bionemo/evo2/models/llama.py
@@ -19,15 +19,15 @@
 
 import torch
 from nemo.collections import llm
-from nemo.collections.llm.gpt.model.llama import HFLlamaImporter, LlamaModel, apply_rope_scaling
+from nemo.collections.llm.gpt.model.llama import HFLlamaImporter, LlamaModel
 from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer
 from nemo.lightning import io
 from nemo.lightning.pytorch.utils import dtype_from_hf
 
 
 @dataclass
-class EdenConfig(llm.Llama3Config8B):
-    """Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors)."""
+class EdenConfig(llm.Llama31Config8B):
+    """Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors). Inherits from the llama 3.1 config for proper handling of RoPE when converting checkpoints."""
 
     rotary_base: int = 500_000
     seq_length: int = 8192
@@ -43,22 +43,6 @@ class EdenConfig(llm.Llama3Config8B):
     init_method_std: float = 0.02
     embedding_init_method_std: Optional[float] = None
 
-    def configure_model(self, *args, **kwargs):
-        """Configure and instantiate a Megatron Core Llama 3.1 model.
-
-        Extends the base configuration with Llama 3.1 specific RoPE scaling.
-        """
-        model = super(EdenConfig, self).configure_model(*args, **kwargs)
-        # Apply rope scaling for Llama3.1 model
-        model.rotary_pos_emb.inv_freq = apply_rope_scaling(
-            model.rotary_pos_emb.inv_freq,
-            factor=self.scale_factor,
-            low_freq_factor=self.low_freq_factor,
-            high_freq_factor=self.high_freq_factor,
-            old_context_len=self.old_context_len,
-        )
-        return model
-
 
 @dataclass
 class Eden11BConfig(EdenConfig):
diff --git a/sub-packages/bionemo-evo2/tests/bionemo/evo2/models/__init__.py b/sub-packages/bionemo-evo2/tests/bionemo/evo2/models/__init__.py
@@ -0,0 +1,14 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-Apache2
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/sub-packages/bionemo-evo2/tests/bionemo/evo2/models/test_llama.py b/sub-packages/bionemo-evo2/tests/bionemo/evo2/models/test_llama.py
@@ -0,0 +1,117 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-Apache2
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import os
+import subprocess
+
+import pytest
+import torch
+from transformers import AutoModelForCausalLM
+
+from bionemo.core.data.load import load
+
+
+@pytest.fixture(scope="module")
+def eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814():
+    """Test data for Evo2 llama inference.
+
+    Returns:
+        tree
+            .
+            ├── per_layer_activations
+            │   └── activations_rank000_dl00_batch000000.pt
+            ├── predictions__rank_0__dp_rank_0.pt
+            ├── ribosomal_rrna_highly_conserved_PMC4140814.fasta
+            └── seq_idx_map.json
+
+    1 directory, 4 files
+    """
+    return load("evo2_llama/eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814:1.0")
+
+
+@pytest.fixture(scope="module")
+def llama_7b_8k_og2():
+    return load("evo2_llama/7B-8k-og2:1.0")
+
+
+@pytest.mark.skipif(os.environ.get("BIONEMO_DATA_SOURCE") != "pbss", reason="Test data is not available on NGC")
+def test_golden_values_llama(
+    tmp_path, eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814, llama_7b_8k_og2
+):
+    fasta_path = (
+        eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814
+        / "ribosomal_rrna_highly_conserved_PMC4140814.fasta"
+    )
+    gold_values_path = (
+        eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814 / "predictions__rank_0__dp_rank_0.pt"
+    )
+    output_dir = tmp_path / "predictions_llama"
+    prediction_cmd = (
+        f"predict_evo2 --fasta {fasta_path} --ckpt-dir {llama_7b_8k_og2} --output-dir {output_dir} --model-size 7B"
+    )
+    subprocess.run(prediction_cmd, shell=True, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    predictions = torch.load(output_dir / "predictions__rank_0__dp_rank_0.pt", weights_only=True)
+    gold_values = torch.load(gold_values_path, weights_only=True)
+    assert predictions["token_logits"].shape == gold_values["token_logits"].shape
+    torch.testing.assert_close(predictions["token_logits"], gold_values["token_logits"], atol=0.5, rtol=0)
+
+
+@pytest.mark.skipif(os.environ.get("BIONEMO_DATA_SOURCE") != "pbss", reason="Test data is not available on NGC")
+def test_checkpoint_conversion(
+    tmp_path, eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814, llama_7b_8k_og2
+):
+    target_dir = tmp_path / "llama_7b_8k_og2"
+    convert_cmd = f"evo2_nemo2_to_hf --model-type llama  --model-path {llama_7b_8k_og2} --output-dir {target_dir}"
+    subprocess.run(convert_cmd, shell=True, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    assert target_dir.exists()
+    assert target_dir.is_dir()
+    hf_model = AutoModelForCausalLM.from_pretrained(
+        target_dir,
+        torch_dtype=torch.bfloat16,
+        local_files_only=True,  # Force loading from local path, not HF Hub
+        use_cache=False,  # Disable use_cache to get the correct forward pass outside of generate.
+    ).eval()
+    # # Add hooks to capture inputs/outputs for forward pass
+    # activations = {}
+    # def capture_hook(name):
+    #     def hook(module, input, output):
+    #         # if not isinstance(input, torch.Tensor):
+    #         #     input = None
+    #         # if not isinstance(output, torch.Tensor):
+    #         #     output = None
+    #         activations[name] = {
+    #             'input': input,
+    #             'output': output
+    #         }
+    #     return hook
+    # # Register hooks on key layers
+    # for name, module in hf_model.named_modules():
+    #     module.register_forward_hook(capture_hook(name))
+    fasta_path = (
+        eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814
+        / "ribosomal_rrna_highly_conserved_PMC4140814.fasta"
+    )
+    with open(fasta_path, "r") as f:
+        fasta_seq = f.readlines()[1].strip()
+    input_ids = torch.tensor([ord(c) for c in fasta_seq]).unsqueeze(0)  # add batch dimension
+    with torch.no_grad():
+        outputs = hf_model(input_ids)
+    gold_values_path = (
+        eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814 / "predictions__rank_0__dp_rank_0.pt"
+    )
+    gold_values = torch.load(gold_values_path, weights_only=True)
+    assert outputs.logits.shape == gold_values["token_logits"].shape
+    torch.testing.assert_close(outputs.logits, gold_values["token_logits"].to(dtype=torch.bfloat16), atol=0.5, rtol=0)
diff --git a/sub-packages/bionemo-evo2/tests/bionemo/evo2/utils/checkpoint/test_eden_llama_roundtrip.py b/sub-packages/bionemo-evo2/tests/bionemo/evo2/utils/checkpoint/test_eden_llama_roundtrip.py
@@ -14,13 +14,14 @@
 # limitations under the License.
 
 import json
+import os
 from pathlib import Path
 
 import pytest
 import torch
-from lightning.fabric.plugins.environments.lightning import find_free_network_port
 from nemo.collections.llm.gpt.model.llama import HFLlamaExporter
 
+from bionemo.core.data.load import load
 from bionemo.evo2.models.llama import HFEdenLlamaImporter
 from bionemo.llm.lightning import batch_collator
 from bionemo.testing.subprocess_utils import run_command_in_subprocess
@@ -30,62 +31,69 @@
 
 
 @pytest.fixture(scope="module")
-def checkpoint_eden_path() -> Path:
-    """
-    mkdir -p $REPO_PATH/tmp_checkpoints
-    scp -r jstjohn@computelab-sc-01:/home/jstjohn/scratch/checkpoints/eden_llama_og2_step_182313 $REPO_PATH/tmp_checkpoints/
+def eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814():
+    """Test data for Evo2 llama inference.
+
+    Returns:
+        tree
+            .
+            ├── per_layer_activations
+            │   └── activations_rank000_dl00_batch000000.pt
+            ├── predictions__rank_0__dp_rank_0.pt
+            ├── ribosomal_rrna_highly_conserved_PMC4140814.fasta
+            └── seq_idx_map.json
+
+    1 directory, 4 files
     """
-    return REPO_PATH / "tmp_checkpoints" / "eden_llama_og2_step_182313"
+    return load("evo2_llama/eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814:1.0")
 
 
 @pytest.fixture(scope="module")
-def metagenome_fasta_path() -> Path:
-    """
-    mkdir -p $REPO_PATH/tmp_data
-    scp -r jstjohn@computelab-sc-01:/home/jstjohn/scratch/experiments/evo2_activations/ckpt_lm_loss_evals/lm_loss_work/evo2_metagenomics_test_only_sl8192_sd42.fasta $REPO_PATH/tmp_data/
-    """
-    return REPO_PATH / "tmp_data" / "evo2_metagenomics_test_only_sl8192_sd42.fasta"
+def llama_7b_8k_og2():
+    return load("evo2_llama/7B-8k-og2:1.0")
 
 
 def predict_metagenome(
     model_checkpoint_path: Path, metagenome_fasta_path: Path, output_path: Path
 ) -> tuple[dict[str, torch.Tensor], dict[str, int]]:
-    port = find_free_network_port()
-    cmd = f"""NCCL_P2P_DISABLE=1 torchrun --nproc_per_node=2 --master-port={port} --no-python  \
-        predict_evo2 \
+    cmd = f"""predict_evo2 \
             --eden-tokenizer \
-            --devices=2 \
             --model-size 7B \
-            --tensor-parallel-size=2 \
             --fasta {metagenome_fasta_path} \
             --ckpt-dir {model_checkpoint_path} \
             --output-log-prob-seqs \
             --log-prob-collapse-option per_token \
             --output-dir {output_path}"""
-    run_command_in_subprocess(cmd, str(REPO_PATH))
+    run_command_in_subprocess(cmd, os.getcwd())
     with open(output_path / "seq_idx_map.json", "r") as jsonf:
         fasta_to_index = json.load(jsonf)
     preds_list = [torch.load(f) for f in output_path.glob("*.pt")]
     all_pt_data = batch_collator([item for item in preds_list if item is not None])
     return all_pt_data, fasta_to_index  # type: ignore
 
 
+@pytest.mark.skipif(os.environ.get("BIONEMO_DATA_SOURCE") != "pbss", reason="Test data is not available on NGC")
 @pytest.mark.slow
-def test_eden_llama_roundtrip(tmp_path, checkpoint_eden_path: Path, metagenome_fasta_path: Path):
+def test_eden_llama_roundtrip(
+    tmp_path, llama_7b_8k_og2: Path, eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814: Path
+):
     """Test that converting NeMo -> HF -> NeMo produces the same model."""
-    if not checkpoint_eden_path.exists() or not metagenome_fasta_path.exists():
-        pytest.skip("Skipping test, first download the checkpoint and the metagenome fasta.")
+    fasta_path = (
+        eden_llama_og2_step_182313_on_evo2_rrna_highly_conserved_PMC4140814
+        / "ribosomal_rrna_highly_conserved_PMC4140814.fasta"
+    )
+    assert llama_7b_8k_og2.exists() and fasta_path.exists()
 
-    exporter = HFLlamaExporter(checkpoint_eden_path)
+    exporter = HFLlamaExporter(llama_7b_8k_og2)
     hf_path = tmp_path / "hf_checkpoint"
     exporter.apply(hf_path)
     importer = HFEdenLlamaImporter(hf_path)
     importer.apply(tmp_path / "nemo_checkpoint")
     original_predictions, original_fasta_to_index = predict_metagenome(
-        checkpoint_eden_path, metagenome_fasta_path, tmp_path / "original_predictions"
+        llama_7b_8k_og2, fasta_path, tmp_path / "original_predictions"
     )
     new_predictions, new_fasta_to_index = predict_metagenome(
-        tmp_path / "nemo_checkpoint", metagenome_fasta_path, tmp_path / "new_predictions"
+        tmp_path / "nemo_checkpoint", fasta_path, tmp_path / "new_predictions"
     )
     assert original_fasta_to_index == new_fasta_to_index, "Fasta to index mapping is not the same, need better logic."
     for key in ["seq_idx", "log_probs_seqs", "loss_mask"]: