Hi, thank you for your inspiring work!
While trying to reproduce the results on the MemGallery benchmark, I noticed that the provided adapter appears to use only textual information rather than the original images:
https://github.com/aiming-lab/SimpleMem/blob/main/OmniSimpleMem/benchmarks/memgallery/adapter.py#L144-L173
In particular, the `_build_image_catalog` function seems to construct the image catalog from image captions rather than from the image content itself:
https://github.com/aiming-lab/SimpleMem/blob/main/OmniSimpleMem/benchmarks/memgallery/adapter.py#L202-L212
Could you please clarify whether the MemGallery results reported in the paper were obtained using a text-only evaluation protocol based on image captions?
If not, could you point me to the code or configuration used for evaluating the model with the original image modality?
Thank you!