
fix(geneformer): resolve protobuf conflict with nvidia-resiliency-ext>=0.6.0 #1598

Open
svc-bionemo wants to merge 1 commit into NVIDIA-BioNeMo:main from svc-bionemo:svc-bionemo/fix-nightly-20260603-7bb88738


@svc-bionemo (Collaborator) commented Jun 3, 2026

Problem

The geneformer nightly CI fails because megatron-core==0.17.1 asserts at import time that nvidia-resiliency-ext>=0.6.0 is installed, but the CI container does not have that version.

Simply adding nvidia-resiliency-ext>=0.6.0 to pyproject.toml creates an unresolvable pip conflict:

  • nvidia-resiliency-ext 0.6.0 → grpcio-tools>=1.76.0 → protobuf>=6.30.0
  • nemo-toolkit==2.4.0 → protobuf~=5.29.5
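The two requirement chains above can be checked mechanically to confirm that no protobuf release satisfies both. The following is a minimal, stdlib-only sketch that assumes purely numeric versions and handles only the >= and ~= operators; in practice the packaging library's SpecifierSet covers the full PEP 440 grammar. The helper names (to_interval, ranges_overlap, and so on) are hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]
# Half-open interval [lower, upper); upper=None means "no upper bound".
Interval = Tuple[Version, Optional[Version]]


def parse_release(text: str) -> Version:
    """Parse a purely numeric release such as '5.29.5' into (5, 29, 5)."""
    return tuple(int(part) for part in text.split("."))


def normalize(version: Version) -> Version:
    """Drop trailing zeros so that 6.30 and 6.30.0 compare as equal."""
    parts = list(version)
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def to_interval(spec: str) -> Interval:
    """Translate a single '>=' or '~=' specifier into an interval."""
    if spec.startswith(">="):
        return normalize(parse_release(spec[2:])), None
    if spec.startswith("~="):
        release = parse_release(spec[2:])
        if len(release) < 2:
            raise ValueError("'~=' needs at least two release components")
        # PEP 440: ~=5.29.5 means >=5.29.5 and ==5.29.*, i.e. an upper bound of 5.30.
        upper = release[:-2] + (release[-2] + 1,)
        return normalize(release), normalize(upper)
    raise ValueError(f"unsupported specifier: {spec!r}")


def ranges_overlap(a: Interval, b: Interval) -> bool:
    """Return True if at least one version satisfies both intervals."""
    lower = max(a[0], b[0])
    uppers = [u for u in (a[1], b[1]) if u is not None]
    return not uppers or lower < min(uppers)


if __name__ == "__main__":
    # Reached via nvidia-resiliency-ext 0.6.0 -> grpcio-tools>=1.76.0.
    from_grpcio_tools = to_interval(">=6.30.0")
    # Pin declared by nemo-toolkit==2.4.0.
    from_nemo_toolkit = to_interval("~=5.29.5")
    print(ranges_overlap(from_grpcio_tools, from_nemo_toolkit))  # False
```

The intersection is empty because the lower bound from grpcio-tools (6.30) already lies above nemo-toolkit's exclusive upper bound (5.30). That is why pip cannot resolve the pair, no matter which order the requirements are listed in.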

Fix

Add a .ci_build.sh script that first installs nvidia-resiliency-ext>=0.6.0 with --no-deps, which skips grpcio-tools (not needed to run the tests), and then installs the package normally.
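The script's contents are not shown on this page. As a hedged illustration only, the two install steps described above are modeled in Python below; the real .ci_build.sh is a shell script, and INSTALL_STEPS and run_install are hypothetical names. The dry_run flag lets the sequence be inspected without touching the environment.

```python
import subprocess
import sys
from typing import List

# Assumed to mirror the two steps in the PR description; not the actual script.
INSTALL_STEPS: List[List[str]] = [
    # Step 1: install nvidia-resiliency-ext on its own with --no-deps, so pip
    # never tries to resolve grpcio-tools (and therefore protobuf>=6.30.0).
    [sys.executable, "-m", "pip", "install", "--no-deps", "nvidia-resiliency-ext>=0.6.0"],
    # Step 2: install the recipe itself with normal dependency resolution.
    [sys.executable, "-m", "pip", "install", "-e", "."],
]


def run_install(dry_run: bool = True) -> List[str]:
    """Return each step as a readable 'pip ...' string; execute only if dry_run is False."""
    rendered: List[str] = []
    for cmd in INSTALL_STEPS:
        # Skip the interpreter path and '-m pip' prefix for readability.
        rendered.append("pip " + " ".join(cmd[3:]))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing step
    return rendered


if __name__ == "__main__":
    for line in run_install(dry_run=True):
        print(line)
```

Passing each command as an argument list means no shell is involved, so the >= in the requirement is not misread as an output redirection; in a shell script the same requirement would need to be quoted.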

The CI workflow already supports .ci_build.sh as a hook: if the file exists, the workflow runs it instead of the default pip install -e . command.
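The hook behavior amounts to a simple existence check. The sketch below assumes the workflow step runs from the recipe directory and invokes the hook with bash; select_build_command is a hypothetical name and does not reproduce the workflow's actual code.

```python
import tempfile
from pathlib import Path
from typing import List


def select_build_command(recipe_dir: Path) -> List[str]:
    """Return the install command a CI step would run for recipe_dir."""
    hook = recipe_dir / ".ci_build.sh"
    if hook.is_file():
        # A recipe-provided hook replaces the default install entirely.
        return ["bash", str(hook)]
    # No hook present: fall back to the default editable install.
    return ["pip", "install", "-e", "."]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        recipe = Path(tmp)
        print(select_build_command(recipe))  # ['pip', 'install', '-e', '.']
        (recipe / ".ci_build.sh").write_text("#!/usr/bin/env bash\nset -euo pipefail\n")
        print(select_build_command(recipe)[0])  # bash
```

Because the hook replaces the default rather than running in addition to it, the script itself has to perform the editable install as its final step.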

Root Cause

nvidia-resiliency-ext 0.6.0 added grpcio and grpcio-tools as hard dependencies (for its new gRPC-based fault-tolerance features). Its grpcio-tools>=1.76.0 requirement pulls in protobuf>=6.30.0, which is incompatible with nemo-toolkit 2.4.0's protobuf~=5.29.5 pin. The conflict should resolve once nemo-toolkit relaxes its protobuf constraint.

@copy-pr-bot (bot) commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@coderabbitai (bot, Contributor) commented Jun 3, 2026

Important: Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@pstjohn (Collaborator) commented Jun 3, 2026

/ok to test 3e37d22

pstjohn enabled auto-merge June 3, 2026 13:49
…>=0.6.0

megatron-core==0.17.1 requires nvidia-resiliency-ext>=0.6.0 at runtime,
but nvidia-resiliency-ext 0.6.0 pulls in grpcio-tools>=1.76.0 which
requires protobuf>=6.30.0 — conflicting with nemo-toolkit==2.4.0 pinning
protobuf~=5.29.5.

Fix by using a .ci_build.sh script that installs nvidia-resiliency-ext
with --no-deps (skipping grpcio-tools) and then installs the package
normally. grpcio-tools is not needed for geneformer test execution.

Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
auto-merge was automatically disabled June 3, 2026 15:25

Head branch was pushed to by a user without write access

svc-bionemo force-pushed the svc-bionemo/fix-nightly-20260603-7bb88738 branch from 3e37d22 to 9fe6df5 June 3, 2026 15:25
svc-bionemo changed the title from "fix(geneformer): add nvidia-resiliency-ext>=0.6.0 dependency" to "fix(geneformer): resolve protobuf conflict with nvidia-resiliency-ext>=0.6.0" Jun 3, 2026