Fix: paligemma type error #627
Conversation
@jp1924 If the language model does not pass the type check, there may be patches that were not reverted.
@eljandoubi Thank you for your interest! However, this error occurs even when running only the PaliGemma test cases, so it is not caused by another multimodal test leaving behind a patch that was never reverted. Below is the model at the point the error occurs, and there is no sign of any Liger patch applied to it.
Looking at how other multimodal tests handle this, checking model.config.text_config.model_type seemed more reliable than isinstance, which is why I made this modification.
@jp1924 The PaliGemma tests succeed on my local machine and in the repository's CI/CD pipeline. Additionally, I've used it to train the model on real-world data using the Hugging Face integration on the cloud. Can you provide more details about your environment?
@jp1924 Can you provide the code you ran to reproduce the issue with only the paligemma1/2 test cases?
@eljandoubi @Tcc0403 Here’s the environment used for debugging:
The test run was limited to the mini_paligemma and mini_paligemma2 cases.
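The exact command is not shown in the thread; a minimal invocation that limits the run to the two PaliGemma cases might look like the sketch below (the test file path is taken from the log further down; the "-k paligemma" filter is my assumption).

# Hypothetical reproduction snippet (the exact command used is not shown in the thread).
# The test file path comes from the log below; the "-k paligemma" filter is an assumption.
import pytest

pytest.main(
    [
        "test/convergence/bf16/test_mini_models_multimodal.py",
        "-k",
        "paligemma",  # selects only mini_paligemma and mini_paligemma2
    ]
)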
Scenario 1
The test was run without passing the model instance to the patch function.
Scenario 2
An error occurs, as mentioned in this comment. After resolving that issue and running it again, the following error appears:
This error can be observed in the detailed log below.
detail log
==================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.5, pluggy-1.5.0
rootdir: /root/workspace/or-liger
configfile: pyproject.toml
collecting ...
-------------------------------------------------------------------------------------------------------------------- live log collection ---------------------------------------------------------------------------------------------------------------------
INFO datasets:config.py:54 PyTorch version 2.5.1+cu121 available.
collected 5 items / 3 deselected / 2 selected
test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma-32-0.0001-dtype3-0.001-0.01-0.1-0.01-0.01-0.01] FAILED [ 50%]
test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma2-32-0.0001-dtype4-0.001-0.01-0.1-0.01-0.01-0.01] FAILED [100%]
========================================================================================================================== FAILURES ==========================================================================================================================
_________________________________________________________________________________ test_mini_model_multimodal[mini_paligemma-32-0.0001-dtype3-0.001-0.01-0.1-0.01-0.01-0.01] __________________________________________________________________________________
model_name = 'mini_paligemma', num_steps = 32, lr = 0.0001, dtype = torch.bfloat16, loss_atol = 0.001, loss_rtol = 0.01, logits_atol = 0.1, logits_rtol = 0.01, param_atol = 0.01, param_rtol = 0.01
@pytest.mark.parametrize(
"model_name, num_steps, lr, dtype, loss_atol, loss_rtol, logits_atol, logits_rtol, param_atol, param_rtol",
[
pytest.param(
"mini_qwen2_vl",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not QWEN2_VL_AVAILABLE,
reason="Qwen2-VL not available in this version of transformers",
),
pytest.mark.skipif(device == "xpu", reason="skip for XPU"),
],
),
pytest.param(
"mini_qwen2_5_vl",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not QWEN2_5_VL_AVAILABLE,
reason="Qwen2.5-VL not available in this version of transformers",
),
pytest.mark.skipif(device == "xpu", reason="skip for XPU"),
],
),
pytest.param(
"mini_mllama",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not MLLAMA_AVAILABLE,
reason="Mllama not available in this version of transformers",
),
],
),
pytest.param(
"mini_paligemma",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not PALIGEMMA_AVAILABLE,
reason="Paligemma not available in this version of transformers",
),
],
),
pytest.param(
"mini_paligemma2",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not PALIGEMMA_AVAILABLE,
reason="Paligemma2 not available in this version of transformers",
),
],
),
],
)
def test_mini_model_multimodal(
model_name,
num_steps,
lr,
dtype,
loss_atol,
loss_rtol,
logits_atol,
logits_rtol,
param_atol,
param_rtol,
):
# Non-liger models should be initialized and tested first to avoid the module being overridden
expected_output = run_mini_model_multimodal(model_name=model_name, num_steps=num_steps, dtype=dtype, lr=lr)
> actual_output = run_mini_model_multimodal(
model_name=model_name, num_steps=num_steps, dtype=dtype, lr=lr, with_liger=True
)
test/convergence/bf16/test_mini_models_multimodal.py:676:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/convergence/bf16/test_mini_models_multimodal.py:532: in run_mini_model_multimodal
MINI_MODEL_SETUPS[model_name].liger_kernel_patch_func(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
rope = True, cross_entropy = False, fused_linear_cross_entropy = False, layer_norm = True, rms_norm = True, geglu = True
model = PaliGemmaForConditionalGeneration(
(vision_tower): SiglipVisionModel(
(vision_model): SiglipVisionTransformer(
...rotary_emb): GemmaRotaryEmbedding()
)
(lm_head): Linear(in_features=1024, out_features=32000, bias=False)
)
)
def apply_liger_kernel_to_paligemma(
rope: bool = True,
cross_entropy: bool = False,
fused_linear_cross_entropy: bool = True,
layer_norm: bool = True,
rms_norm: bool = True,
geglu: bool = True,
model: PreTrainedModel = None,
) -> None:
"""
Apply Liger kernels to replace original implementation in HuggingFace PaliGemma
Args:
rope (bool): Whether to apply Liger's rotary position embedding. Default is True.
cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
fused_linear_cross_entropy (bool):
Whether to apply Liger's fused linear cross entropy loss. Default is True.
`cross_entropy` and `fused_linear_cross_entropy` cannot both be True.
If `fused_linear_cross_entropy` is True, the logits will not be materialized but more memory efficient.
layer_norm (bool): Whether to apply Liger's LayerNorm. Default is True.
rms_norm (bool): Whether to apply Liger's RMSNorm. Default is True.
geglu (bool): Whether to apply Liger's GeGLU MLP. Default is True.
model (PreTrainedModel): The model instance to apply Liger kernels to, if the model has already been
loaded. Default is None.
"""
assert not (cross_entropy and fused_linear_cross_entropy), (
"cross_entropy and fused_linear_cross_entropy cannot both be True."
)
# PaliGemma submodules are ['vision_tower', 'multi_modal_projector', 'language_model']
from transformers.models.gemma.modeling_gemma import GemmaForCausalLM
from transformers.models.gemma2.modeling_gemma2 import Gemma2ForCausalLM
from transformers.models.paligemma import modeling_paligemma
from transformers.models.paligemma.modeling_paligemma import PaliGemmaForConditionalGeneration
from transformers.models.siglip import modeling_siglip
from transformers.models.siglip.modeling_siglip import SiglipEncoderLayer
from transformers.models.siglip.modeling_siglip import SiglipVisionModel
from liger_kernel.transformers.model.paligemma import lce_forward
from liger_kernel.transformers.model.paligemma import lce_forward_deprecated
# The vision_tower is a SiglipVisionModel
if layer_norm:
modeling_siglip.nn.LayerNorm = LigerLayerNorm
# SiglipMLP is standard FFN so LigerGEGLUMLP is not compatible
# The multi_modal_projector is Linear, nothing to do
# The language_model is GemmaForCausalLM or Gemma2ForCausalLM
apply_liger_kernel_to_gemma(
rope=rope, cross_entropy=False, fused_linear_cross_entropy=False, rms_norm=rms_norm, geglu=geglu
)
apply_liger_kernel_to_gemma2(
rope=rope, cross_entropy=False, fused_linear_cross_entropy=False, rms_norm=rms_norm, geglu=geglu
)
# Handle loss function
if cross_entropy:
modeling_paligemma.nn.CrossEntropyLoss = LigerCrossEntropyLoss
if fused_linear_cross_entropy:
if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
modeling_paligemma.PaliGemmaForConditionalGeneration.forward = lce_forward
else: # if version < 4.46.1
logger.warning(TRANSFORMER_DEPRECATION_WARNING)
modeling_paligemma.PaliGemmaForConditionalGeneration.forward = lce_forward_deprecated
if model is not None:
# The model instance already exists, so we need to additionally patch the
# instance variables that reference already-instantiated modules
if not isinstance(model, PaliGemmaForConditionalGeneration):
raise TypeError("model have to be of type PaliGemmaForConditionalGeneration")
vision_tower: SiglipVisionModel = model.vision_tower
_patch_layer_norm_module(vision_tower.vision_model.post_layernorm)
for layer in vision_tower.vision_model.encoder.layers:
layer: SiglipEncoderLayer
if layer_norm:
_patch_layer_norm_module(layer.layer_norm1)
_patch_layer_norm_module(layer.layer_norm2)
language_model = model.language_model
if isinstance(language_model, GemmaForCausalLM):
apply_liger_kernel_to_gemma(
rope=rope,
cross_entropy=False,
fused_linear_cross_entropy=False,
rms_norm=rms_norm,
geglu=geglu,
model=language_model,
)
elif isinstance(language_model, Gemma2ForCausalLM):
apply_liger_kernel_to_gemma2(
rope=rope,
cross_entropy=False,
fused_linear_cross_entropy=False,
rms_norm=rms_norm,
geglu=geglu,
model=language_model,
)
else:
> raise TypeError(
"The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM."
)
E TypeError: The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM.
src/liger_kernel/transformers/monkey_patch.py:720: TypeError
-------------------------------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------------------------------
Liger kernel patches have been reverted.
Step 0, Loss: 11.872488975524902
Step 1, Loss: 2.908055067062378
Step 2, Loss: 1.80923330783844
Step 3, Loss: 1.7920551300048828
Step 4, Loss: 1.4004158973693848
Step 5, Loss: 1.259793996810913
Step 6, Loss: 1.3108876943588257
Step 7, Loss: 1.0080386400222778
Step 8, Loss: 1.0473616123199463
Step 9, Loss: 1.1045541763305664
Step 10, Loss: 0.862948477268219
Step 11, Loss: 0.9946214556694031
Step 12, Loss: 0.7224061489105225
Step 13, Loss: 0.7274509072303772
Step 14, Loss: 0.79590904712677
Step 15, Loss: 0.8812856078147888
Step 16, Loss: 0.8721840977668762
Step 17, Loss: 0.8142553567886353
Step 18, Loss: 0.8549021482467651
Step 19, Loss: 0.65662682056427
Step 20, Loss: 0.6154780983924866
Step 21, Loss: 0.45201078057289124
Step 22, Loss: 0.6145424246788025
Step 23, Loss: 0.6072823405265808
Step 24, Loss: 0.41036954522132874
Step 25, Loss: 0.7026124000549316
Step 26, Loss: 0.3765292763710022
Step 27, Loss: 0.4183444380760193
Step 28, Loss: 0.5464693903923035
Step 29, Loss: 0.6562455296516418
Step 30, Loss: 0.5901074409484863
Step 31, Loss: 0.29771801829338074
Liger kernel patches have been reverted.
-------------------------------------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------------------------------------
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
_________________________________________________________________________________ test_mini_model_multimodal[mini_paligemma2-32-0.0001-dtype4-0.001-0.01-0.1-0.01-0.01-0.01] _________________________________________________________________________________
model_name = 'mini_paligemma2', num_steps = 32, lr = 0.0001, dtype = torch.bfloat16, loss_atol = 0.001, loss_rtol = 0.01, logits_atol = 0.1, logits_rtol = 0.01, param_atol = 0.01, param_rtol = 0.01
@pytest.mark.parametrize(
"model_name, num_steps, lr, dtype, loss_atol, loss_rtol, logits_atol, logits_rtol, param_atol, param_rtol",
[
pytest.param(
"mini_qwen2_vl",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not QWEN2_VL_AVAILABLE,
reason="Qwen2-VL not available in this version of transformers",
),
pytest.mark.skipif(device == "xpu", reason="skip for XPU"),
],
),
pytest.param(
"mini_qwen2_5_vl",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not QWEN2_5_VL_AVAILABLE,
reason="Qwen2.5-VL not available in this version of transformers",
),
pytest.mark.skipif(device == "xpu", reason="skip for XPU"),
],
),
pytest.param(
"mini_mllama",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not MLLAMA_AVAILABLE,
reason="Mllama not available in this version of transformers",
),
],
),
pytest.param(
"mini_paligemma",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not PALIGEMMA_AVAILABLE,
reason="Paligemma not available in this version of transformers",
),
],
),
pytest.param(
"mini_paligemma2",
32,
1e-4,
torch.bfloat16,
1e-3,
1e-2,
1e-1,
1e-2,
1e-2,
1e-2,
marks=[
pytest.mark.skipif(not supports_bfloat16(), reason="bfloat16 not supported on this GPU"),
pytest.mark.skipif(
not PALIGEMMA_AVAILABLE,
reason="Paligemma2 not available in this version of transformers",
),
],
),
],
)
def test_mini_model_multimodal(
model_name,
num_steps,
lr,
dtype,
loss_atol,
loss_rtol,
logits_atol,
logits_rtol,
param_atol,
param_rtol,
):
# Non-liger models should be initialized and tested first to avoid the module being overridden
expected_output = run_mini_model_multimodal(model_name=model_name, num_steps=num_steps, dtype=dtype, lr=lr)
> actual_output = run_mini_model_multimodal(
model_name=model_name, num_steps=num_steps, dtype=dtype, lr=lr, with_liger=True
)
test/convergence/bf16/test_mini_models_multimodal.py:676:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/convergence/bf16/test_mini_models_multimodal.py:532: in run_mini_model_multimodal
MINI_MODEL_SETUPS[model_name].liger_kernel_patch_func(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
rope = True, cross_entropy = False, fused_linear_cross_entropy = False, layer_norm = True, rms_norm = True, geglu = True
model = PaliGemmaForConditionalGeneration(
(vision_tower): SiglipVisionModel(
(vision_model): SiglipVisionTransformer(
...otary_emb): Gemma2RotaryEmbedding()
)
(lm_head): Linear(in_features=1024, out_features=32000, bias=False)
)
)
def apply_liger_kernel_to_paligemma(
rope: bool = True,
cross_entropy: bool = False,
fused_linear_cross_entropy: bool = True,
layer_norm: bool = True,
rms_norm: bool = True,
geglu: bool = True,
model: PreTrainedModel = None,
) -> None:
"""
Apply Liger kernels to replace original implementation in HuggingFace PaliGemma
Args:
rope (bool): Whether to apply Liger's rotary position embedding. Default is True.
cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
fused_linear_cross_entropy (bool):
Whether to apply Liger's fused linear cross entropy loss. Default is True.
`cross_entropy` and `fused_linear_cross_entropy` cannot both be True.
If `fused_linear_cross_entropy` is True, the logits will not be materialized but more memory efficient.
layer_norm (bool): Whether to apply Liger's LayerNorm. Default is True.
rms_norm (bool): Whether to apply Liger's RMSNorm. Default is True.
geglu (bool): Whether to apply Liger's GeGLU MLP. Default is True.
model (PreTrainedModel): The model instance to apply Liger kernels to, if the model has already been
loaded. Default is None.
"""
assert not (cross_entropy and fused_linear_cross_entropy), (
"cross_entropy and fused_linear_cross_entropy cannot both be True."
)
# PaliGemma submodules are ['vision_tower', 'multi_modal_projector', 'language_model']
from transformers.models.gemma.modeling_gemma import GemmaForCausalLM
from transformers.models.gemma2.modeling_gemma2 import Gemma2ForCausalLM
from transformers.models.paligemma import modeling_paligemma
from transformers.models.paligemma.modeling_paligemma import PaliGemmaForConditionalGeneration
from transformers.models.siglip import modeling_siglip
from transformers.models.siglip.modeling_siglip import SiglipEncoderLayer
from transformers.models.siglip.modeling_siglip import SiglipVisionModel
from liger_kernel.transformers.model.paligemma import lce_forward
from liger_kernel.transformers.model.paligemma import lce_forward_deprecated
# The vision_tower is a SiglipVisionModel
if layer_norm:
modeling_siglip.nn.LayerNorm = LigerLayerNorm
# SiglipMLP is standard FFN so LigerGEGLUMLP is not compatible
# The multi_modal_projector is Linear, nothing to do
# The language_model is GemmaForCausalLM or Gemma2ForCausalLM
apply_liger_kernel_to_gemma(
rope=rope, cross_entropy=False, fused_linear_cross_entropy=False, rms_norm=rms_norm, geglu=geglu
)
apply_liger_kernel_to_gemma2(
rope=rope, cross_entropy=False, fused_linear_cross_entropy=False, rms_norm=rms_norm, geglu=geglu
)
# Handle loss function
if cross_entropy:
modeling_paligemma.nn.CrossEntropyLoss = LigerCrossEntropyLoss
if fused_linear_cross_entropy:
if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
modeling_paligemma.PaliGemmaForConditionalGeneration.forward = lce_forward
else: # if version < 4.46.1
logger.warning(TRANSFORMER_DEPRECATION_WARNING)
modeling_paligemma.PaliGemmaForConditionalGeneration.forward = lce_forward_deprecated
if model is not None:
# The model instance already exists, so we need to additionally patch the
# instance variables that reference already-instantiated modules
if not isinstance(model, PaliGemmaForConditionalGeneration):
raise TypeError("model have to be of type PaliGemmaForConditionalGeneration")
vision_tower: SiglipVisionModel = model.vision_tower
_patch_layer_norm_module(vision_tower.vision_model.post_layernorm)
for layer in vision_tower.vision_model.encoder.layers:
layer: SiglipEncoderLayer
if layer_norm:
_patch_layer_norm_module(layer.layer_norm1)
_patch_layer_norm_module(layer.layer_norm2)
language_model = model.language_model
if isinstance(language_model, GemmaForCausalLM):
apply_liger_kernel_to_gemma(
rope=rope,
cross_entropy=False,
fused_linear_cross_entropy=False,
rms_norm=rms_norm,
geglu=geglu,
model=language_model,
)
elif isinstance(language_model, Gemma2ForCausalLM):
apply_liger_kernel_to_gemma2(
rope=rope,
cross_entropy=False,
fused_linear_cross_entropy=False,
rms_norm=rms_norm,
geglu=geglu,
model=language_model,
)
else:
> raise TypeError(
"The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM."
)
E TypeError: The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM.
src/liger_kernel/transformers/monkey_patch.py:720: TypeError
-------------------------------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------------------------------
Liger kernel patches have been reverted.
Step 0, Loss: 11.491681098937988
Step 1, Loss: 5.270604133605957
Step 2, Loss: 2.1230340003967285
Step 3, Loss: 1.9017783403396606
Step 4, Loss: 1.393043875694275
Step 5, Loss: 1.2913684844970703
Step 6, Loss: 1.3962070941925049
Step 7, Loss: 1.1137727499008179
Step 8, Loss: 1.162048101425171
Step 9, Loss: 1.226529836654663
Step 10, Loss: 0.9842991232872009
Step 11, Loss: 1.1158515214920044
Step 12, Loss: 0.8610438704490662
Step 13, Loss: 0.8677318692207336
Step 14, Loss: 0.9316790699958801
Step 15, Loss: 0.9952855110168457
Step 16, Loss: 0.9811475872993469
Step 17, Loss: 0.9185448288917542
Step 18, Loss: 0.9610309600830078
Step 19, Loss: 0.7695668935775757
Step 20, Loss: 0.7231904864311218
Step 21, Loss: 0.553207516670227
Step 22, Loss: 0.7114338278770447
Step 23, Loss: 0.7015824913978577
Step 24, Loss: 0.5099937915802002
Step 25, Loss: 0.7909064292907715
Step 26, Loss: 0.468069851398468
Step 27, Loss: 0.5070297718048096
Step 28, Loss: 0.626304566860199
Step 29, Loss: 0.7336246967315674
Step 30, Loss: 0.6568887829780579
Step 31, Loss: 0.3821035921573639
Liger kernel patches have been reverted.
-------------------------------------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------------------------------------
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
====================================================================================================================== warnings summary ======================================================================================================================
../../../usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py:1441
/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py:1441: PytestConfigWarning: Unknown config option: asyncio_mode
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma-32-0.0001-dtype3-0.001-0.01-0.1-0.01-0.01-0.01]
test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma2-32-0.0001-dtype4-0.001-0.01-0.1-0.01-0.01-0.01]
/root/workspace/or-liger/test/utils.py:159: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
input_ids = torch.cat([torch.tensor(item["input_ids"]) for item in data])
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================== short test summary info ===================================================================================================================
FAILED test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma-32-0.0001-dtype3-0.001-0.01-0.1-0.01-0.01-0.01] - TypeError: The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM.
FAILED test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_paligemma2-32-0.0001-dtype4-0.001-0.01-0.1-0.01-0.01-0.01] - TypeError: The language_model of a PaliGemma model must be either GemmaForCausalLM or Gemma2ForCausalLM.
======================================================================================================== 2 failed, 3 deselected, 3 warnings in 30.95s ========================================================================================================
Context: we try to instantiate the model first and then apply the monkey patch to the instance. The change looks like this:
def run_mini_model_multimodal(
model_name="mini_qwen2_vl",
num_steps=100,
dtype=torch.bfloat16,
lr=1e-5,
with_liger=False,
):
set_seed(42)
model = create_model(model_name).to(dtype).to(device) # instantiate first
revert_kwargs = {"model_config": MINI_MODEL_SETUPS[model_name]}
if "mllama" in model_name:
revert_kwargs["model_type"] = "conditional_generation"
if with_liger is True:
kwargs = {
"rope": True,
"rms_norm": True,
"cross_entropy": False,
}
if "qwen2_5_vl" not in model_name:
kwargs["layer_norm"] = True
if "gemma" in model_name:
kwargs["geglu"] = True
else:
kwargs["swiglu"] = True
kwargs["model"] = model # add arg model
MINI_MODEL_SETUPS[model_name].liger_kernel_patch_func(**kwargs) # patch later
else:
MINI_MODEL_SETUPS[model_name].liger_kernel_patch_revert_func(**revert_kwargs)
...
Here's the reproduce code: gist
❯ python3 revert_then_patch.py
True
0x55d0cd1d08a0, id of model.language_model.__class__ # object 0
0x55d0cd1d08a0, id of modeling_gemma.GemmaForCausalLM # object 0
=========================== importlib.reload(modeling_gemma) ====================================
False
0x55d0cd1d08a0, id of model.language_model.__class__ # object 0
0x55d0cec372d0, id of modeling_gemma.GemmaForCausalLM # object 1
The root cause is that the revert functions reload modules, so the already-instantiated model's class is no longer the same object as the freshly reloaded GemmaForCausalLM, and the isinstance check fails. This is not a common use case, but this fix can cover the edge case without rewriting the current revert functions.
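The gist itself is not reproduced here; as a rough, self-contained sketch of what revert_then_patch.py presumably demonstrates (a module reload rebinds the class name to a new class object, so isinstance fails for an existing instance), one could write something like the following. The tiny GemmaConfig values are arbitrary placeholders.

import importlib

from transformers import GemmaConfig
from transformers.models.gemma import modeling_gemma

# Build a tiny Gemma model just to have a live instance of GemmaForCausalLM.
# The config values are placeholders chosen only to keep the model small.
config = GemmaConfig(
    vocab_size=32,
    hidden_size=8,
    intermediate_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    num_key_value_heads=1,
    head_dim=4,
)
model = modeling_gemma.GemmaForCausalLM(config)

print(isinstance(model, modeling_gemma.GemmaForCausalLM))  # True
print(hex(id(model.__class__)))
print(hex(id(modeling_gemma.GemmaForCausalLM)))            # same object as above

# Reloading the module (roughly what the revert helpers end up doing) rebinds
# modeling_gemma.GemmaForCausalLM to a brand-new class object.
importlib.reload(modeling_gemma)

print(isinstance(model, modeling_gemma.GemmaForCausalLM))  # False
print(hex(id(model.__class__)))                            # old class object
print(hex(id(modeling_gemma.GemmaForCausalLM)))            # new class object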
@jp1924 @Tcc0403 If we are unable to revert the patch, testing PaliGemma-2 after PaliGemma-1 will create the Siglip model for PaliGemma-2 with Liger kernels already applied, for both the expected (non-Liger) run and the target (Liger) run.
@eljandoubi I agree. We really need a more robust way to revert these monkey patches in our tests.
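One possible direction (a sketch, not an existing helper in the repo): compare fully qualified class names instead of class identity, so an isinstance-style check survives a module reload. The helper name below is hypothetical.

# Hypothetical helper (not part of liger_kernel today): an identity-insensitive
# type check that keeps working after transformers modules have been reloaded,
# because the instance's __module__ and __qualname__ do not change on reload.
def _is_instance_by_name(obj, *qualified_names: str) -> bool:
    name = f"{type(obj).__module__}.{type(obj).__qualname__}"
    return name in qualified_names

# Sketch of how the PaliGemma patch could use it:
# if _is_instance_by_name(
#     language_model, "transformers.models.gemma.modeling_gemma.GemmaForCausalLM"
# ):
#     apply_liger_kernel_to_gemma(..., model=language_model)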
Summary
@Tcc0403
Background
#524 (comment)
While I was working on PR #524, the following error occurred in the PaliGemma section, and I investigated the cause.
As you can see in the picture, there is a strange situation where the isinstance check returns False even though language_model is assigned a GemmaForCausalLM.
So I suggest reading the model type directly from model.config.text_config.model_type, instead of relying on isinstance, and then patching the Liger kernel based on that.
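A rough sketch of that idea follows (illustrative only, not necessarily the exact diff in this PR): dispatch on the config's model_type string so a reloaded class object cannot break the check.

# Sketch of the suggested dispatch inside apply_liger_kernel_to_paligemma
# (illustrative; the PR's actual diff may differ in details).
text_model_type = model.config.text_config.model_type
if text_model_type == "gemma":
    apply_liger_kernel_to_gemma(
        rope=rope,
        cross_entropy=False,
        fused_linear_cross_entropy=False,
        rms_norm=rms_norm,
        geglu=geglu,
        model=model.language_model,
    )
elif text_model_type == "gemma2":
    apply_liger_kernel_to_gemma2(
        rope=rope,
        cross_entropy=False,
        fused_linear_cross_entropy=False,
        rms_norm=rms_norm,
        geglu=geglu,
        model=model.language_model,
    )
else:
    raise TypeError(
        f"Unsupported language_model type for PaliGemma: {text_model_type}"
    )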
Testing Done
- make test to ensure correctness
- make checkstyle to ensure code style
- make test-convergence to ensure convergence