Add configurable batch_size to CNN to fix GPU OOM on large directories#241

Open
umerkhan95 wants to merge 1 commit into idealo:dev from umerkhan95:fix/cnn-gpu-oom

Conversation

@umerkhan95

Fixes #232

The batch size was hardcoded at 64 with no way to lower it, which caused OOM on GPUs with limited memory when processing large image directories.

Changes:

  • CNN() now accepts a batch_size parameter (default 64, fully backwards compatible)
  • Features are moved to CPU after each batch instead of accumulating on the GPU
  • _collate_fn handles the case where every image in a batch fails to load
  • bad_im_count now counts individual bad images, not whole batches
  • parallelise uses its pool as a context manager so workers are cleaned up reliably
  • The single-image path uses an explicit shape[0] == 1 check instead of squeeze(), which would also drop any other size-1 dimension

Usage:

```python
from imagededup.methods import CNN

cnn = CNN(batch_size=8)  # lower batch size for limited GPU memory
duplicates = cnn.find_duplicates(image_dir='path/to/images/')
```
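
The context-managed pool in parallelise comes down to this pattern (the signature here is a simplified sketch, not the function's actual one):

```python
from multiprocessing import Pool

def parallelise(func, inputs, num_workers=4):
    # Entering the pool as a context manager guarantees terminate/join
    # on exit, so worker processes are not leaked if map() raises.
    with Pool(num_workers) as pool:
        return pool.map(func, inputs)
```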

Tests:

All 251 tests pass (existing plus new). Added tests for custom batch sizes, batch-size edge cases (1, and larger than the dataset), and a directory in which every image is unreadable.
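
The batch-size edge cases reduce to one invariant: results must not depend on how the directory is split into batches. A library-free sketch of that property (helper names here are illustrative, not imagededup's API):

```python
def batched(items, batch_size):
    # Yield successive slices; the final batch may be shorter.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def encode(images, batch_size):
    # Stand-in for CNN feature extraction: process per batch, then flatten.
    features = []
    for batch in batched(images, batch_size):
        features.extend(x * 2 for x in batch)  # dummy "model"
    return features

images = list(range(10))
baseline = encode(images, 64)          # batch size larger than the dataset
assert encode(images, 1) == baseline   # smallest possible batch
assert encode(images, 8) == baseline   # batch that does not divide evenly
```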

