
Add Instance Segmentation Transforms#836

Open
yutong-xiang-97 wants to merge 1 commit into yutong-trn-2271-refactor-ltdetr-object-detection-transforms from yutong-trn-2276-add-instance-segmentation-transforms

Conversation

@yutong-xiang-97 yutong-xiang-97 commented Jul 3, 2026

Contributor

What has changed and why?

This PR stacks on top of #835.

It adds LT-DETR instance segmentation transform support on top of the object detection transform refactor:

  • add instance segmentation dataset mask handling
  • add LT-DETR instance segmentation transform wiring
  • add a reusable ltdetr_transforms instance segmentation transform module
  • add focused transform coverage for the new instance segmentation path

How has it been tested?

Unit tests.

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Did you update the documentation?

  • Yes
  • Not needed (internal change without effects for user)

@yutong-xiang-97 yutong-xiang-97 changed the title from "Add instance segmentation transforms" to "Add Instance Segmentation Transforms" Jul 3, 2026
@yutong-xiang-97 yutong-xiang-97 marked this pull request as ready for review July 3, 2026 16:18
@yutong-xiang-97

Contributor Author

/review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7caf1ca53

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +111 to +112
if isinstance(bboxes_out, list):
    bboxes_out = np.array(bboxes_out)


P2: Preserve empty bbox tensors as (0, 4)

When an image has no instances (which the dataset allows by default) or an augmentation drops every box, albumentations can return bboxes as an empty list; np.array([]) produces shape (0,), so the dataset returns a box tensor with no second dimension. The downstream matchers/criteria expect each target's boxes to have shape [num_target_boxes, 4], so these samples can fail once an empty target is batched; reshape empty outputs to (0, 4) as the EoMT path does.

Useful? React with 👍 / 👎.
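A minimal sketch of the suggested fix, assuming the augmentation output is either a list of 4-tuples or an array. The helper name `to_bbox_array` is hypothetical and not part of the PR; it only illustrates keeping the second dimension when no boxes survive.

```python
import numpy as np


def to_bbox_array(bboxes_out):
    """Return boxes as a float32 array that always has shape (N, 4).

    Hypothetical helper for illustration; the name is not from the PR.
    """
    arr = np.asarray(bboxes_out, dtype=np.float32)
    if arr.size == 0:
        # np.array([]) has shape (0,), so build the empty case explicitly.
        return np.zeros((0, 4), dtype=np.float32)
    return arr.reshape(-1, 4)


empty_boxes = to_bbox_array([])
one_box = to_bbox_array([(0.1, 0.2, 0.3, 0.4)])
print(empty_boxes.shape, one_box.shape)
```

With this shape guarantee, an image with no instances still produces a box array whose second dimension is 4, which is what the comment says the downstream matchers expect.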

}


class LTDETRInstanceSegmentationCollateFunction(_LTDETRCollateFunction):


P2: Wire the LTDETR collate into dataset selection

This collate is never selected in the standard training flow: train_task.py instantiates collates only via train_dataset.batch_collate_fn_cls, while InstanceSegmentationDataset still points that class variable at EoMTInstanceSegmentationCollateFunction even when an LTDETR transform is installed. As a result, LTDETR instance-segmentation training through the normal dataset path silently skips the default mixup and this class's step-aware reinitialization logic.

Useful? React with 👍 / 👎.
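One way the selection could be made transform-aware is sketched below. Every class and function here is a hypothetical stand-in written for illustration; none of them are the repository's real classes, and the actual wiring in train_task.py may need a different hook.

```python
class EoMTCollateStub:
    """Stand-in for the collate the dataset currently points at."""


class LTDETRCollateStub:
    """Stand-in for the LTDETR instance segmentation collate."""


class EoMTTransformStub:
    """Stand-in for an EoMT transform."""


class LTDETRTransformStub:
    """Stand-in for an LTDETR instance segmentation transform."""


class DatasetStub:
    # Class-level default, mirroring the batch_collate_fn_cls pattern
    # described in the review comment.
    batch_collate_fn_cls = EoMTCollateStub

    def __init__(self, transform):
        self.transform = transform


def select_collate_cls(dataset):
    """Pick the collate class from the installed transform (assumed rule)."""
    if isinstance(dataset.transform, LTDETRTransformStub):
        return LTDETRCollateStub
    return type(dataset).batch_collate_fn_cls


ltdetr_cls = select_collate_cls(DatasetStub(LTDETRTransformStub()))
eomt_cls = select_collate_cls(DatasetStub(EoMTTransformStub()))
print(ltdetr_cls.__name__, eomt_cls.__name__)
```

The sketch keeps the class-level default for existing callers and overrides it only when an LTDETR transform is installed, so the EoMT path is unchanged.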

Comment on lines +113 to +114
elif normalize == "none":
self.normalize = NormalizeArgs()


P2: Match default normalization stats to channel_drop

When users enable channel_drop with num_channels_keep other than 3 and leave normalize at the default auto, this branch still creates a 3-channel ImageNet NormalizeArgs() before num_channels is resolved from channel_drop. The sample pipeline then applies 3 mean/std values to a C!=3 image, which can fail or normalize the extra/dropped channels incorrectly; repeat/trim the stats or reject this combination as the EoMT transform compatibility pass does.

Useful? React with 👍 / 👎.
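A rough sketch of the "repeat/trim the stats" option, assuming cyclic repetition is acceptable for extra channels. The function name `match_stats_to_channels` is hypothetical, and the PR may prefer to reject the combination instead; the mean/std values are the standard ImageNet statistics.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def match_stats_to_channels(mean, std, num_channels):
    """Cyclically repeat or trim mean/std so both have num_channels entries.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if len(mean) != len(std) or len(mean) == 0:
        raise ValueError("mean and std must be non-empty and equally long")
    if num_channels < 1:
        raise ValueError("num_channels must be positive")

    def fit(values):
        # Ceiling division gives enough repetitions to cover num_channels.
        reps = -(-num_channels // len(values))
        return tuple((list(values) * reps)[:num_channels])

    return fit(mean), fit(std)


mean4, std4 = match_stats_to_channels(IMAGENET_MEAN, IMAGENET_STD, 4)
mean2, std2 = match_stats_to_channels(IMAGENET_MEAN, IMAGENET_STD, 2)
print(mean4, std4)
print(mean2, std2)
```

Repeating ImageNet statistics onto non-RGB channels is only a placeholder choice; rejecting the configuration with a clear error, as the comment also suggests, avoids silently normalizing channels with unrelated statistics.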
