⚡️ Speed up method RFDetrForObjectDetectionTorch.post_process by 17% in PR #1250 (feature/inference-v1-models) #1276
+39 −19
⚡️ This pull request contains optimizations for PR #1250

If you approve this dependent PR, these changes will be merged into the original PR branch feature/inference-v1-models.

📄 17% (0.17x) speedup for RFDetrForObjectDetectionTorch.post_process in inference/v1/models/rfdetr/rfdetr_object_detection_pytorch.py

⏱️ Runtime: 25.0 milliseconds → 21.3 milliseconds (best of 11 runs)

📝 Explanation and details
Here is a faster version, focused on replacing Python loops with vectorized tensor operations, batching work, minimizing temporary allocations, and avoiding repeated computation. Function signatures and return values are unchanged.

Key speedups:

- `torch.tensor(orig_sizes, ...)` is created directly on the target device, with an explicit dtype, for a small gain.
- Filtering uses `torch.where` and `index_select`, which are generally faster and more robust than boolean indexing when batches are empty or on CUDA.
- Results are batched so there is one `.append` per `Detections`, and empty tensors are created in a way that does not allocate new memory when there is nothing to keep.

This preserves the original function signatures and inputs/outputs, and will run faster especially on GPU (but also on CPU).
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, run `git checkout codeflash/optimize-pr1250-2025-05-14T12.42.25` and push.