Description
Hi, thanks for this awesome work!
I wonder why you chose another pre-trained ViT instead of using the SEED tokenizer from SEED-LLaMA. Is it because SEED, like SEED-text, doesn't align semantically with the original image well enough?
I am referring to this sentence: "As shown in Fig. 2, compared with SEED [15], our visual de-tokenizer can decode images that are more semantically aligned with the original images by taking the ViT features as inputs."
If I am fortunate enough to receive a response, it will greatly help my own work based on SEED. Thank you very much for your answer~