Does each mini-batch include both multimodal understanding and image generation data? In this code, will the LLM loss turn NaN due to the absence of multimodal understanding data in a mini-batch?
What should be done when a mini-batch contains data from only a single modality?
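When all the data on a single GPU belongs to one modality (for example, only image generation data), the LLM loss becomes NaN. How should this case be handled?
Will such imbalanced mini-batch data affect the model's performance?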
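
To make the suspected failure mode concrete, here is a minimal, framework-agnostic sketch. It assumes (this is not confirmed from the repository's code) that the LLM loss is a mean over understanding tokens only, so a generation-only mini-batch averages over zero tokens and yields 0/0. The helper names `masked_mean_naive` and `masked_mean_safe` are hypothetical and exist only for illustration. In PyTorch, the analogous situation is a mean-reduced cross-entropy where every label equals `ignore_index`.

```python
import math


def masked_mean_naive(token_losses, mask):
    """Average the losses of tokens where mask == 1 (hypothetical helper).

    Returns NaN when no token is selected, mimicking the 0/0 that a
    tensor framework produces for a mean over an empty selection.
    """
    total = sum(loss * m for loss, m in zip(token_losses, mask))
    count = sum(mask)
    return total / count if count > 0 else math.nan


def masked_mean_safe(token_losses, mask):
    """Same average, but with the denominator clamped to at least 1."""
    total = sum(loss * m for loss, m in zip(token_losses, mask))
    count = sum(mask)
    return total / max(count, 1)


# Per-token losses for one mini-batch; the mask marks understanding tokens.
token_losses = [2.0, 1.0, 3.0, 2.0]
mixed_mask = [1, 0, 1, 0]            # both modalities present
generation_only_mask = [0, 0, 0, 0]  # no understanding tokens on this GPU

print(masked_mean_naive(token_losses, mixed_mask))
print(masked_mean_naive(token_losses, generation_only_mask))
print(masked_mean_safe(token_losses, generation_only_mask))
```

If this assumption matches the actual code, clamping the denominator makes the LLM loss contribute zero on generation-only batches instead of NaN. Whether such batches hurt final quality is a separate question that this sketch does not answer.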