Description
Hi, thanks for this awesome work!
I wonder why you chose another pre-trained ViT instead of using the SEED tokenizer from SEED-LLaMA. Is it because SEED, like SEED-text, doesn't align semantically with the original image well enough?
I am referring to this sentence: "As shown in Fig. 2, compared with SEED [15], our visual de-tokenizer can decode images that are more semantically aligned with the original images by taking the ViT features as inputs."
If I am fortunate enough to receive a response, it will greatly help my own work based on SEED. Thank you very much for your answer~