Yes, adding a small amount of inner dilation (i.e., letting the mask slightly overlap the actual object boundary) is almost always beneficial when training inpainting models, especially for video/object-aware inpainting, and your intuition is spot on. Here is why it helps, plus some practical considerations:

**Why a small inner overlap is good**
**How much inner dilation is ideal?**
From both academic papers (e.g., Stable Diffusion inpainting training, FlowCam, E2FGVI, etc.) and practical experience in the community:
Since your masks are already parametric (you control divergence/convergence + dilation), you can easily create a mixed strategy:

**Recommended training strategy**
**Kubric-specific tip**
Kubric gives you perfect instance segmentation, so you can very precisely control this overlap. I would actually generate multiple mask variants per video (a code sketch follows the summary below):
**Summary**
Yes: definitely include a small amount of inner dilation (≈4 pixels is perfect at 512×512). It is one of the highest-impact tricks for getting clean, robust inpainting boundaries, especially in video, where mask jitter is inevitable. Your setup with Kubric + parametric edge masks + real-world data is excellent. With this small inner-overlap trick (plus the randomization ideas above), you should see noticeably sharper and more stable results than most published video inpainting models. Good luck; this sounds like a very promising training run!
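To make the trick concrete, here is a minimal sketch of the inner-overlap idea with boolean NumPy masks. The function name, parameter names, and defaults are my own illustration for this thread, not code from iw3:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def add_inner_overlap(hole, object_mask, inner_px=4, outer_px=2):
    # hole:        HxW bool, disoccluded pixels that must be inpainted
    # object_mask: HxW bool, the occluding object's instance mask
    #              (e.g. Kubric's perfect segmentation)
    out = hole.copy()
    if inner_px > 0:  # grow the hole a few pixels *into* the object
        out |= binary_dilation(hole, iterations=inner_px) & object_mask
    if outer_px > 0:  # and optionally outward into the background
        out |= binary_dilation(hole, iterations=outer_px) & ~object_mask
    return out

# Toy example: a 64x64 object with a thin disocclusion strip at its right edge.
obj = np.zeros((64, 64), dtype=bool); obj[16:48, 16:32] = True
hole = np.zeros_like(obj); hole[16:48, 32:36] = True
# Per-video mask variants (0/2/4/8 px of inner overlap); roughly, 4 px at
# 512x512 scales as round(4 * width / 512) at other resolutions.
variants = {k: add_inner_overlap(hole, obj, inner_px=k) for k in (0, 2, 4, 8)}
```

The `if inner_px > 0` guards matter: scipy's `binary_dilation` treats `iterations < 1` as "repeat until the result stops changing", which would flood the whole object.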
---
I found this interesting:

Random jitter augmentation for masks means randomly perturbing the training masks; the goal is to make the inpainting model robust to imperfect masks at inference time (which is basically always the case in the real world).

**What kinds of jitter people usually apply**
At every training iteration (or per-batch), randomly apply one or more of the following to the mask:
**Very common and effective recipe** (used in almost every strong video inpainting paper):

```python
import random
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion, shift, gaussian_filter

def jitter_mask(mask):
    # mask: 2D boolean array (True = region to inpaint)
    # 1. Random dilation/erosion (erode up to 8 px, dilate up to 12 px)
    r = random.randint(-8, 12)
    if r > 0:
        mask = binary_dilation(mask, iterations=r)
    elif r < 0:
        mask = binary_erosion(mask, iterations=-r)
    # 2. Random shift (±6 pixels); order=0 keeps the mask binary
    dx = random.randint(-6, 6)
    dy = random.randint(-6, 6)
    mask = shift(mask.astype(np.uint8), (dy, dx), order=0) > 0  # or np.roll
    # 3. A tiny bit of Gaussian blur on the mask edge (optional but nice)
    if random.random() < 0.5:
        mask = gaussian_filter(mask.astype(float), sigma=random.uniform(0.5, 2.0)) > 0.5
    return mask
```

**Why this helps so much**
**Practical numbers that work very well**
**Result you'll see**

So yes: add random jitter augmentation. It's one of the highest-return tricks in the entire training recipe.
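A usage sketch (my own illustration, building on `jitter_mask` above): jitter each frame's mask independently, so the model also sees temporally inconsistent masks, which is what sloppy real-world mattes look like:

```python
import numpy as np

# Toy stand-in for one clip: 8 frames of a 64x64 mask with a square hole.
clip_masks = [np.zeros((64, 64), dtype=bool) for _ in range(8)]
for m in clip_masks:
    m[20:44, 20:44] = True

# Independent per-frame jitter simulates mask jitter across frames.
jittered = [jitter_mask(m) for m in clip_masks]
```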
---
We already use random inner and outer dilation during training. See nunif/iw3/training/inpaint/dataset_video.py, lines 21 to 42 and lines 184 to 187 (at a73146d). There is no option for it, but if needed, options to disable it or control its strength can be added to trainer.py.
---
Here are some features I am testing, with brief explanations:
---
Dear Experts,
---
I posted the same issue a while ago. I tried a lot with different depth-map models, resolutions, etc., all with the same results (bigger or smaller depending on the depth map). Edge dilation makes it bigger. It always happens in the first lines under/above black bars, but also on a larger "black bar" within a scene.




Uh oh!
There was an error while loading. Please reload this page.
---
First of all, thank you for making iw3! My question is about inpaint model training. I created a custom script for generating masks so I could use DepthCrafter maps. The script allows me to adjust the divergence, convergence, and both inner and outer dilation for the created masks. Do you think it would benefit the model training to have a small amount of inner dilation that overlaps the object's outer edge?
I created thousands of short 512×512 videos using the Kubric MOVi-E dataset generator. Hopefully using both synthetic and real-world video samples will improve the model!
Here is a visualization of the mask slightly overlapping the objects.
