Conversation

@praateekmahajan
Contributor

Description

Speed up the Embedding Generator by using half a GPU

  1. Assuming the loaded embedding model uses only a fraction of a GPU's resources, we can schedule more than one actor on the same GPU to get a speedup.
  2. Please note this is dependent on the model used and the GPU SKU.
  3. The tutorial shows how to modify the stage and also measures the time taken.
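The scheduling math behind point 1 can be sketched in a few lines. This is a back-of-the-envelope model of fractional-GPU packing, not the framework's actual scheduler:

```python
import math

def actors_per_gpu(gpu_fraction: float) -> int:
    """How many actors a scheduler can pack onto one GPU when each
    actor reserves only a fraction of the device (e.g. gpus=0.5)."""
    return math.floor(1 / gpu_fraction)

def total_actors(num_gpus: int, gpu_fraction: float) -> int:
    """Total concurrent actors across the whole node."""
    return num_gpus * actors_per_gpu(gpu_fraction)

print(actors_per_gpu(0.5))   # 2 actors share each GPU
print(total_actors(8, 0.5))  # 16 actors on an 8-GPU node
```

Whether doubling the actor count actually improves throughput depends on the model's memory footprint and compute utilization, which is why point 2 hedges on the model and GPU SKU.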

Sentence Transformers

  1. Until Support Sentence Transformer Models in Embedding Generation and possibly other places #1265 is resolved, this tutorial shows how to use SentenceTransformer inside our existing framework.

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Praateek <[email protected]>
@greptile-apps
Contributor

greptile-apps bot commented Nov 21, 2025

Greptile Overview

Greptile Summary

This PR adds three educational tutorials demonstrating advanced embedding generation techniques: GPU resource optimization through fractional GPU allocation (achieving 28% speedup), workaround implementation for SentenceTransformer models, and a comprehensive step-by-step semantic deduplication workflow.

Key Changes:

  • fast_embedding_generation.ipynb: Shows how to use Resources(gpus=0.5) to schedule multiple actors per GPU when models have low GPU utilization
  • implement_sentence_transformer.ipynb: Provides a temporary workaround for issue Support Sentence Transformer Models in Embedding Generation and possibly other places #1265 by extending EmbeddingModelStage to work with SentenceTransformer library
  • semantic_step_by_step.ipynb: Breaks down the semantic deduplication process into discrete stages (ID generation, embedding creation, K-means clustering, duplicate identification, and removal) for better understanding and control
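The duplicate-identification idea in the third tutorial can be illustrated with a toy NumPy sketch. The real pipeline first clusters with K-means and only compares pairs within clusters; the greedy helper below (`identify_duplicates` is an illustrative name, not a framework API) shows only the eps-on-cosine-distance criterion:

```python
import numpy as np

def identify_duplicates(embeddings: np.ndarray, eps: float = 0.1) -> list[int]:
    """Greedy duplicate detection: a row is a duplicate of an earlier kept
    row when their cosine distance (1 - cosine similarity) is <= eps."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    duplicates: list[int] = []
    for i, vec in enumerate(normed):
        if kept and np.max(normed[kept] @ vec) >= 1 - eps:
            duplicates.append(i)
        else:
            kept.append(i)
    return duplicates

emb = np.array([
    [1.0, 0.0],     # kept
    [0.999, 0.01],  # near-duplicate of row 0
    [0.0, 1.0],     # kept (orthogonal to row 0)
])
print(identify_duplicates(emb, eps=0.1))  # [1]
```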

Issues Found:

  • Previous review comments note potential API compatibility concerns in the SentenceTransformer implementation, though the notebook shows it working correctly with test data

Confidence Score: 4/5

Important Files Changed

File Analysis

File | Score | Overview
tutorials/text/embedding-generation/fast_embedding_generation.ipynb | 4/5 | Tutorial demonstrating GPU resource optimization by using 0.5 GPU per actor to speed up embedding generation, achieving a 28% speedup (213s to 153s)
tutorials/text/embedding-generation/implement_sentence_transformer.ipynb | 3/5 | Tutorial showing SentenceTransformer integration by extending EmbeddingModelStage; existing review comments note potential API compatibility issues with direct model calling
tutorials/text/embedding-generation/semantic_step_by_step.ipynb | 4/5 | Comprehensive semantic deduplication tutorial breaking the workflow into discrete steps (ID generation, embeddings, K-means, duplicate identification, removal), achieving a 27.54% data reduction
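As a quick consistency check on the figures reported above (28% speedup, 27.54% reduction), using only numbers from the notebooks:

```python
# Speedup: baseline 213s vs fractional-GPU run at 153s.
baseline_s, fractional_s = 213, 153
speedup_pct = (baseline_s - fractional_s) / baseline_s * 100
print(round(speedup_pct, 1))  # ~28.2, reported as "28% speedup"

# Reduction: 583,721 duplicates removed, 1,535,998 rows kept.
duplicates, kept = 583_721, 1_535_998
reduction_pct = duplicates / (duplicates + kept) * 100
print(round(reduction_pct, 2))  # 27.54
```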

Sequence Diagram

sequenceDiagram
    participant User
    participant Pipeline
    participant EmbeddingCreatorStage
    participant TokenizerStage
    participant ModelStage
    participant GPU

    Note over User,GPU: Tutorial 1: Fast Embedding Generation
    User->>Pipeline: Configure with Resources(gpus=0.5)
    User->>Pipeline: Run with RayDataExecutor
    Pipeline->>EmbeddingCreatorStage: Process documents
    EmbeddingCreatorStage->>GPU: Schedule 2 actors per GPU
    GPU-->>User: Complete in 153s vs 213s (28% speedup)

    Note over User,GPU: Tutorial 2: SentenceTransformer Integration
    User->>EmbeddingCreatorStage: Create custom stage
    EmbeddingCreatorStage->>TokenizerStage: Tokenize text
    TokenizerStage-->>EmbeddingCreatorStage: Return input_ids, attention_mask
    EmbeddingCreatorStage->>ModelStage: SentenceTransformerEmbeddingModelStage
    ModelStage->>ModelStage: setup() loads SentenceTransformer
    ModelStage->>GPU: Run inference with unpack_inference_batch=False
    GPU-->>ModelStage: Return outputs["sentence_embedding"]
    ModelStage-->>User: Generate embeddings

    Note over User,GPU: Tutorial 3: Semantic Deduplication Workflow
    User->>Pipeline: Step 1: Create ID Generator
    User->>Pipeline: Step 2: Generate Embeddings
    Pipeline->>EmbeddingCreatorStage: Process with SentenceTransformer
    EmbeddingCreatorStage-->>User: Write embeddings to cache
    User->>Pipeline: Step 3: K-Means + Pairwise Similarity
    Pipeline-->>User: Cluster and compute similarities
    User->>Pipeline: Step 4: Identify Duplicates (eps=0.1)
    Pipeline-->>User: Find 583,721 duplicates (27.54%)
    User->>Pipeline: Step 5: Remove Duplicates
    Pipeline-->>User: Output 1,535,998 deduplicated rows


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


    model_inference_batch_size: int = 1024,
    has_seq_order: bool = True,
    padding_side: Literal["left", "right"] = "right",
    autocast: bool = True,


syntax: typo in markdown cell: "composte" should be "composite"

Signed-off-by: Praateek <[email protected]>
@praateekmahajan
Copy link
Contributor Author

@sarahyurick / @ayushdg I'm okay with merging this or not; I needed this tutorial to show the customer how to do it if they have to.


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments


Comment on lines +48 to +52

    hf_token=hf_token,
    model_inference_batch_size=model_inference_batch_size,
    has_seq_order=has_seq_order,
    padding_side=padding_side,
    autocast=autocast,


logic: SentenceTransformer doesn't return a dict with sentence_embedding key when called directly. The parent ModelStage._model_forward passes the batch as self.model(model_input_batch) (without unpacking), but SentenceTransformer doesn't accept input_ids/attention_mask directly. Need to override _model_forward to use the .encode() method instead.

Suggested change

Replace:

    hf_token=hf_token,
    model_inference_batch_size=model_inference_batch_size,
    has_seq_order=has_seq_order,
    padding_side=padding_side,
    autocast=autocast,

With:

    def _model_forward(self, model_input_batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        """Forward pass through SentenceTransformer model."""
        # SentenceTransformer.encode expects input_ids and attention_mask as dict
        embeddings = self.model.encode(
            model_input_batch,
            convert_to_tensor=True,
            show_progress_bar=False,
        )
        return {"sentence_embedding": embeddings}

    def process_model_output(
        self,
        outputs: dict[str, torch.Tensor],
        model_input_batch: dict[str, torch.Tensor] | None = None,  # noqa: ARG002
    ) -> torch.Tensor:
        return outputs["sentence_embedding"].cpu()
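The failure mode this review describes can be illustrated with stub classes. Everything below is a stand-in sketch of the pattern (not the real framework or SentenceTransformer API): the parent stage hands the packed batch dict straight to the model, so a subclass must reroute inference through `.encode()` and repack the output under the expected key:

```python
from typing import Any

class ModelStage:
    """Stand-in for the framework's parent stage: its forward pass
    passes the packed batch dict straight to the model."""

    def __init__(self, model: Any) -> None:
        self.model = model

    def _model_forward(self, model_input_batch: dict) -> dict:
        return self.model(model_input_batch)

class StubSentenceTransformer:
    """Stub exposing only an .encode() surface, as the review assumes."""

    def encode(self, batch: dict, convert_to_tensor: bool = True,
               show_progress_bar: bool = False) -> list[list[float]]:
        # Pretend every input row embeds to the same 2-dim vector.
        return [[0.0, 1.0] for _ in batch["input_ids"]]

class SentenceTransformerStage(ModelStage):
    """Override routes inference through .encode() and repacks the
    result under the key downstream stages expect."""

    def _model_forward(self, model_input_batch: dict) -> dict:
        embeddings = self.model.encode(model_input_batch)
        return {"sentence_embedding": embeddings}

stage = SentenceTransformerStage(StubSentenceTransformer())
batch = {"input_ids": [[1, 2], [3, 4]], "attention_mask": [[1, 1], [1, 1]]}
out = stage._model_forward(batch)
print(out["sentence_embedding"])  # [[0.0, 1.0], [0.0, 1.0]]
```

Calling the base class's `_model_forward` here would invoke the stub directly and raise a TypeError, which mirrors the bug being reported.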

Comment on lines +34 to +36

    class SentenceTransformerEmbeddingModelStage(EmbeddingModelStage):
        def __init__(  # noqa: PLR0913


style: setting unpack_inference_batch = False causes the parent to call self.model(model_input_batch) where model_input_batch is a dict. This relies on SentenceTransformer.__call__ accepting a dict, which is non-standard. The standard API is .encode() or .forward() with unpacked kwargs. Consider documenting this behavior or using the standard API.



Love how easy this is. IMO we should add it to the codebase (and eventually our documentation) instead of a tutorial, since it is so straightforward.


@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


@sarahyurick
Contributor

Closing in favor of #1346, tysm!

@sarahyurick sarahyurick closed this Jan 7, 2026
