
⚡️ Speed up method RFDetrForObjectDetectionTorch.post_process by 17% in PR #1250 (feature/inference-v1-models) #1276


Open · codeflash-ai[bot] wants to merge 1 commit into feature/inference-v1-models

Conversation

@codeflash-ai[bot] commented on May 14, 2025

⚡️ This pull request contains optimizations for PR #1250

If you approve this dependent PR, these changes will be merged into the original PR branch feature/inference-v1-models.

This PR will be automatically closed if the original PR is merged.


📄 17% (0.17x) speedup for RFDetrForObjectDetectionTorch.post_process in inference/v1/models/rfdetr/rfdetr_object_detection_pytorch.py

⏱️ Runtime: 25.0 milliseconds → 21.3 milliseconds (best of 11 runs)

📝 Explanation and details

Here’s a faster version. It focuses on eliminating Python loops in favor of vectorized tensor operations, batching work, minimizing temporary allocations, and avoiding repeated computation. Function signatures and return values are unchanged.

Key speedups:

  • Build the target-size tensor with `torch.tensor(orig_sizes, ...)` directly on the device and with an explicit dtype, for a small gain.
  • Use `torch.where` and `index_select`, which are generally faster and more robust than boolean indexing when a batch is empty or when running on CUDA.
  • Perform only one `.append` per `Detections`, and create empty tensors in a way that does not allocate new memory when there is nothing to keep.

This preserves the original function signatures and inputs/outputs, and it will run faster, especially on GPU but also on CPU. A minimal sketch of the filtering pattern is shown below.
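The sketch below is illustrative only; it is not the actual `post_process` method from `rfdetr_object_detection_pytorch.py`. The `Detections` fields and the `scores`/`labels`/`boxes` dict keys mirror the test stubs further down, and `filter_results` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

import torch


@dataclass
class Detections:
    # Field names follow the test stubs in this PR; illustrative only.
    xyxy: torch.Tensor
    confidence: torch.Tensor
    class_ids: torch.Tensor


def filter_results(
    results: List[Dict[str, torch.Tensor]],
    orig_sizes: List[Tuple[int, int]],
    threshold: float,
    device: torch.device,
) -> List[Detections]:
    # Build the target-size tensor once, directly on the target device and
    # with an explicit dtype. (The real code would pass this to the DETR-style
    # post processor; it is unused in this sketch.)
    target_sizes = torch.tensor(orig_sizes, dtype=torch.float32, device=device)

    detections: List[Detections] = []
    for result in results:
        scores = result["scores"]
        # torch.where + index_select instead of boolean masking: behaves
        # consistently on CPU and CUDA, including when no score passes the
        # threshold (index_select with an empty index tensor simply returns
        # an empty tensor of the right shape).
        keep = torch.where(scores > threshold)[0]
        detections.append(
            Detections(
                xyxy=result["boxes"].index_select(0, keep),
                confidence=scores.index_select(0, keep),
                class_ids=result["labels"].index_select(0, keep),
            )
        )
    return detections
```

The generated regression tests below exercise exactly this shape of behavior: per-image dicts of `scores`/`labels`/`boxes` in, one `Detections` per image out.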

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 54 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | |
🌀 Generated Regression Tests Details
from typing import List

# imports
import pytest  # used for our unit tests
import torch
from inference.v1.models.rfdetr.rfdetr_object_detection_pytorch import \
    RFDetrForObjectDetectionTorch

# --- Minimal stubs for dependencies (for testability) ---

class Detections:
    def __init__(self, xyxy, confidence, class_ids):
        self.xyxy = xyxy
        self.confidence = confidence
        self.class_ids = class_ids

    def __eq__(self, other):
        # Compare tensors for equality (for test assertions)
        if not isinstance(other, Detections):
            return False
        return (
            torch.equal(self.xyxy, other.xyxy)
            and torch.equal(self.confidence, other.confidence)
            and torch.equal(self.class_ids, other.class_ids)
        )

class ObjectDetectionModel:
    pass

class PreProcessingMetadata:
    class Size:
        def __init__(self, height, width):
            self.height = height
            self.width = width

    def __init__(self, original_size):
        self.original_size = original_size

class PreProcessingConfig:
    pass

class LWDETR:
    pass

# --- PostProcess stub ---

class PostProcess:
    """
    Simulates a post-processing callable that takes model results and target sizes,
    and returns a list of dicts with 'scores', 'labels', 'boxes'.
    """
    def __call__(self, model_results, target_sizes):
        # For simplicity, assume model_results is a batch of size N, each with K objects
        # model_results: [N, K, 6] (x1, y1, x2, y2, score, label)
        # Return a list of dicts for each batch element
        results = []
        for i in range(model_results.shape[0]):
            batch = model_results[i]
            # Remove zero-padded rows (if any)
            keep = batch[:, 4] >= 0  # scores >= 0
            batch = batch[keep]
            if batch.shape[0] == 0:
                results.append({
                    "scores": torch.tensor([], dtype=torch.float32),
                    "labels": torch.tensor([], dtype=torch.int64),
                    "boxes": torch.empty((0, 4), dtype=torch.float32),
                })
                continue
            scores = batch[:, 4]
            labels = batch[:, 5].to(torch.int64)
            boxes = batch[:, :4]
            results.append({
                "scores": scores,
                "labels": labels,
                "boxes": boxes,
            })
        return results
from inference.v1.models.rfdetr.rfdetr_object_detection_pytorch import \
    RFDetrForObjectDetectionTorch

# --- Unit Tests ---

@pytest.fixture
def model():
    # Dummy model
    return LWDETR()

@pytest.fixture
def pre_processing_config():
    return PreProcessingConfig()

@pytest.fixture
def class_names():
    return ["cat", "dog", "person"]

@pytest.fixture
def device():
    return torch.device("cpu")

@pytest.fixture
def post_processor():
    return PostProcess()

@pytest.fixture
def model_instance(model, pre_processing_config, class_names, device, post_processor):
    return RFDetrForObjectDetectionTorch(
        model=model,
        pre_processing_config=pre_processing_config,
        class_names=class_names,
        device=device,
        post_processor=post_processor,
    )

# --- Basic Test Cases ---

def test_single_image_single_detection(model_instance):
    # One image, one detection above threshold
    model_results = torch.tensor([[[10.0, 20.0, 30.0, 40.0, 0.9, 1]]])  # [1,1,6]
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_single_image_multiple_detections(model_instance):
    # One image, multiple detections (some above, some below threshold)
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.6, 0],
        [15.0, 25.0, 35.0, 45.0, 0.4, 2],
        [50.0, 60.0, 70.0, 80.0, 0.8, 1],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_batch_multiple_images(model_instance):
    # Two images, different number of detections
    model_results = torch.tensor([
        [
            [10.0, 20.0, 30.0, 40.0, 0.7, 1],
            [0.0, 0.0, 0.0, 0.0, -1.0, 0],  # padded row, should be ignored
        ],
        [
            [50.0, 60.0, 70.0, 80.0, 0.95, 2],
            [12.0, 22.0, 32.0, 42.0, 0.3, 0],  # below threshold
        ]
    ])
    pre_processing_meta = [
        PreProcessingMetadata(PreProcessingMetadata.Size(100, 200)),
        PreProcessingMetadata(PreProcessingMetadata.Size(150, 250)),
    ]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

# --- Edge Test Cases ---

def test_no_detections(model_instance):
    # One image, all detections below threshold
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.1, 0],
        [15.0, 25.0, 35.0, 45.0, 0.2, 1],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_empty_batch(model_instance):
    # No images in batch
    model_results = torch.empty((0, 1, 6))
    pre_processing_meta = []
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_all_zero_padded(model_instance):
    # All rows are padded (scores < 0)
    model_results = torch.tensor([[
        [0.0, 0.0, 0.0, 0.0, -1.0, 0],
        [0.0, 0.0, 0.0, 0.0, -2.0, 1],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_threshold_edge(model_instance):
    # Detections exactly at threshold should be excluded (strict >)
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.5, 1],
        [50.0, 60.0, 70.0, 80.0, 0.5001, 2],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_large_class_id(model_instance):
    # Test with a class id outside the usual range
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.9, 999],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_negative_class_id(model_instance):
    # Test with a negative class id
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.9, -5],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_non_integer_class_id(model_instance):
    # Test with a non-integer class id (should be cast to int64)
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, 0.9, 1.7],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output

def test_nan_and_inf_scores(model_instance):
    # Test with NaN and Inf scores
    model_results = torch.tensor([[
        [10.0, 20.0, 30.0, 40.0, float('nan'), 1],
        [50.0, 60.0, 70.0, 80.0, float('inf'), 2],
        [90.0, 100.0, 110.0, 120.0, 0.9, 3],
    ]])
    pre_processing_meta = [PreProcessingMetadata(PreProcessingMetadata.Size(100, 200))]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output


def test_large_batch_many_detections(model_instance):
    # Test with a large batch and many detections per image
    batch_size = 50
    num_detections = 20
    # All scores above threshold
    model_results = torch.cat([
        torch.cat([
            torch.tensor([[
                float(i), float(j), float(i+10), float(j+10), 0.8, (i+j)%3
            ]]) for j in range(num_detections)
        ], dim=0).unsqueeze(0)
        for i in range(batch_size)
    ], dim=0)
    pre_processing_meta = [
        PreProcessingMetadata(PreProcessingMetadata.Size(100+i, 200+i))
        for i in range(batch_size)
    ]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        pass

def test_large_batch_some_below_threshold(model_instance):
    # Large batch, some detections below threshold
    batch_size = 30
    num_detections = 30
    model_results = torch.zeros((batch_size, num_detections, 6))
    for i in range(batch_size):
        for j in range(num_detections):
            model_results[i, j, :4] = torch.tensor([float(i), float(j), float(i+5), float(j+5)])
            model_results[i, j, 4] = 0.4 if j % 2 == 0 else 0.9  # alternate below/above threshold
            model_results[i, j, 5] = (i+j) % 3
    pre_processing_meta = [
        PreProcessingMetadata(PreProcessingMetadata.Size(100+i, 200+i))
        for i in range(batch_size)
    ]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        pass

def test_max_tensor_size_constraint(model_instance):
    # Ensure tensor stays under 100MB (float32: 4 bytes)
    # 100MB / 4 = 25,000,000 elements
    # For [batch, num_det, 6], pick batch=100, num_det=400 => 100*400*6 = 240,000 elements ≈ 0.96MB
    batch_size = 100
    num_detections = 400
    model_results = torch.zeros((batch_size, num_detections, 6))
    for i in range(batch_size):
        for j in range(num_detections):
            model_results[i, j, :4] = torch.tensor([float(i), float(j), float(i+1), float(j+1)])
            model_results[i, j, 4] = 0.6  # all above threshold
            model_results[i, j, 5] = (i+j) % 3
    pre_processing_meta = [
        PreProcessingMetadata(PreProcessingMetadata.Size(100+i, 200+i))
        for i in range(batch_size)
    ]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        pass

def test_large_batch_empty_detections(model_instance):
    # Large batch, all detections below threshold
    batch_size = 100
    num_detections = 10
    model_results = torch.zeros((batch_size, num_detections, 6))
    for i in range(batch_size):
        for j in range(num_detections):
            model_results[i, j, :4] = torch.tensor([float(i), float(j), float(i+1), float(j+1)])
            model_results[i, j, 4] = 0.1  # all below threshold
            model_results[i, j, 5] = (i+j) % 3
    pre_processing_meta = [
        PreProcessingMetadata(PreProcessingMetadata.Size(100+i, 200+i))
        for i in range(batch_size)
    ]
    codeflash_output = model_instance.post_process(model_results, pre_processing_meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from typing import Any, List

# imports
import pytest  # used for our unit tests
import torch
from inference.v1.models.rfdetr.rfdetr_object_detection_pytorch import \
    RFDetrForObjectDetectionTorch

# --- Minimal stub implementations for dependencies ---

class Detections:
    """
    Simple stub for Detections object.
    """
    def __init__(self, xyxy, confidence, class_ids):
        self.xyxy = xyxy
        self.confidence = confidence
        self.class_ids = class_ids

    def __eq__(self, other):
        # For test comparison, compare tensors elementwise
        if not isinstance(other, Detections):
            return False
        return (
            torch.equal(self.xyxy, other.xyxy) and
            torch.equal(self.confidence, other.confidence) and
            torch.equal(self.class_ids, other.class_ids)
        )

class OriginalSize:
    def __init__(self, height, width):
        self.height = height
        self.width = width

class PreProcessingMetadata:
    """
    Simple stub for PreProcessingMetadata.
    """
    def __init__(self, original_size):
        self.original_size = original_size

class PreProcessingConfig:
    pass

class LWDETR:
    pass

class PostProcess:
    """
    A dummy post processor that mimics returning a list of dicts with keys 'scores', 'labels', 'boxes'.
    """
    def __init__(self, return_fn=None):
        self.return_fn = return_fn

    def __call__(self, model_results, target_sizes):
        if self.return_fn:
            return self.return_fn(model_results, target_sizes)
        # Default: return empty detection for each input
        batch_size = model_results.shape[0]
        return [
            {
                "scores": torch.empty(0),
                "labels": torch.empty(0, dtype=torch.long),
                "boxes": torch.empty(0, 4)
            }
            for _ in range(batch_size)
        ]

class ObjectDetectionModel:
    pass
from inference.v1.models.rfdetr.rfdetr_object_detection_pytorch import \
    RFDetrForObjectDetectionTorch

# ----------- UNIT TESTS -----------

# --- BASIC TEST CASES ---

def test_post_process_single_detection_above_threshold():
    # Test: Single detection above threshold should be returned
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.tensor([0.9]),
            "labels": torch.tensor([1]),
            "boxes": torch.tensor([[10.0, 20.0, 30.0, 40.0]])
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(100, 200))]
    model_results = torch.zeros(1, 1)  # shape doesn't matter for dummy
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

def test_post_process_single_detection_below_threshold():
    # Test: Single detection below threshold should be filtered out
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.tensor([0.3]),
            "labels": torch.tensor([2]),
            "boxes": torch.tensor([[1.0, 2.0, 3.0, 4.0]])
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b", "c"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(50, 50))]
    model_results = torch.zeros(1, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

def test_post_process_multiple_detections_mixed_threshold():
    # Test: Multiple detections, only those above threshold are returned
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.tensor([0.2, 0.6, 0.8]),
            "labels": torch.tensor([0, 1, 2]),
            "boxes": torch.tensor([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=torch.float)
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b", "c"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(10, 10))]
    model_results = torch.zeros(1, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

def test_post_process_batch_multiple_images():
    # Test: Batch of two images, each with different number of detections
    def post_fn(model_results, target_sizes):
        return [
            {
                "scores": torch.tensor([0.7]),
                "labels": torch.tensor([0]),
                "boxes": torch.tensor([[1,2,3,4]], dtype=torch.float)
            },
            {
                "scores": torch.tensor([0.9, 0.2]),
                "labels": torch.tensor([2, 1]),
                "boxes": torch.tensor([[5,6,7,8],[9,10,11,12]], dtype=torch.float)
            }
        ]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b", "c"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [
        PreProcessingMetadata(OriginalSize(10, 10)),
        PreProcessingMetadata(OriginalSize(20, 20))
    ]
    model_results = torch.zeros(2, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

# --- EDGE TEST CASES ---

def test_post_process_empty_batch():
    # Test: Empty batch should return empty list
    def post_fn(model_results, target_sizes):
        return []
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=[],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = []
    model_results = torch.zeros(0, 1)
    codeflash_output = model.post_process(model_results, meta); detections = codeflash_output

def test_post_process_empty_detections_in_result():
    # Test: Post process returns empty detections for all images
    def post_fn(model_results, target_sizes):
        return [
            {
                "scores": torch.empty(0),
                "labels": torch.empty(0, dtype=torch.long),
                "boxes": torch.empty(0, 4)
            }
            for _ in range(model_results.shape[0])
        ]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(10, 10)) for _ in range(3)]
    model_results = torch.zeros(3, 1)
    codeflash_output = model.post_process(model_results, meta); detections = codeflash_output
    for det in detections:
        pass

def test_post_process_threshold_edge_cases():
    # Test: Detections exactly at threshold should NOT be kept (since >, not >=)
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.tensor([0.5, 0.5000001, 0.4999999]),
            "labels": torch.tensor([0,1,2]),
            "boxes": torch.tensor([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=torch.float)
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b", "c"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(10, 10))]
    model_results = torch.zeros(1, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

def test_post_process_non_float_scores():
    # Test: If scores are integer tensor, should still work (shouldn't crash)
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.tensor([1, 0, 1], dtype=torch.int),
            "labels": torch.tensor([1, 2, 3]),
            "boxes": torch.tensor([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=torch.float)
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=["a", "b", "c", "d"],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(10, 10))]
    model_results = torch.zeros(1, 1)
    # threshold=0.5, so only scores > 0.5 (i.e., 1) are kept
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output


def test_post_process_large_batch_many_detections():
    # Test: Large batch size with many detections per image
    batch_size = 50
    num_detections = 20
    def post_fn(model_results, target_sizes):
        # Each image has num_detections, all with increasing scores from 0 to 1
        return [
            {
                "scores": torch.linspace(0, 1, num_detections),
                "labels": torch.arange(num_detections),
                "boxes": torch.arange(num_detections*4, dtype=torch.float).reshape(num_detections, 4)
            }
            for _ in range(batch_size)
        ]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=[str(i) for i in range(num_detections)],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(100, 100)) for _ in range(batch_size)]
    model_results = torch.zeros(batch_size, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        # Only scores > 0.5 are kept
        expected_scores = torch.linspace(0, 1, num_detections)
        keep = expected_scores > 0.5

def test_post_process_large_number_of_classes():
    # Test: Detections with large number of classes
    num_classes = 500
    num_detections = 10
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.linspace(0, 1, num_detections),
            "labels": torch.randint(0, num_classes, (num_detections,)),
            "boxes": torch.arange(num_detections*4, dtype=torch.float).reshape(num_detections, 4)
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=[str(i) for i in range(num_classes)],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(100, 100))]
    model_results = torch.zeros(1, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output
    # Only scores > 0.5
    expected_scores = torch.linspace(0, 1, num_detections)
    keep = expected_scores > 0.5

def test_post_process_large_boxes_tensor():
    # Test: Large number of detections, ensure memory is under 100MB
    num_detections = 1000  # Each detection: 4 floats, 4*4=16 bytes, 16*1000=16KB
    def post_fn(model_results, target_sizes):
        return [{
            "scores": torch.ones(num_detections),
            "labels": torch.arange(num_detections),
            "boxes": torch.arange(num_detections*4, dtype=torch.float).reshape(num_detections, 4)
        }]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=[str(i) for i in range(num_detections)],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(100, 100))]
    model_results = torch.zeros(1, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output

def test_post_process_performance_within_reasonable_time():
    # Test: Large batch and detections, should run in reasonable time and not OOM
    batch_size = 10
    num_detections = 100
    def post_fn(model_results, target_sizes):
        return [
            {
                "scores": torch.ones(num_detections),
                "labels": torch.arange(num_detections),
                "boxes": torch.arange(num_detections*4, dtype=torch.float).reshape(num_detections, 4)
            }
            for _ in range(batch_size)
        ]
    model = RFDetrForObjectDetectionTorch(
        model=LWDETR(),
        pre_processing_config=PreProcessingConfig(),
        class_names=[str(i) for i in range(num_detections)],
        device=torch.device("cpu"),
        post_processor=PostProcess(return_fn=post_fn)
    )
    meta = [PreProcessingMetadata(OriginalSize(100, 100)) for _ in range(batch_size)]
    model_results = torch.zeros(batch_size, 1)
    codeflash_output = model.post_process(model_results, meta, threshold=0.5); detections = codeflash_output
    for det in detections:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr1250-2025-05-14T12.42.25` and push.

Codeflash

@codeflash-ai[bot] added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on May 14, 2025