
Are the cross-modal feature and the cross-modal representation vector the same? #21

@DanyangCheng

In your paper you write: "we concatenate the visual and textual representations to form the cross-modal features $$r\in \mathbb{R} ^{1\times D}$$", but the formula below reads: " $$o_u=Concate(o_u^{i(f)},o_u^t)$$". Are they the same vector? Also, in this formula: $$PM(k,i)=\frac{1}{N_{k,i}^s}\sum_{j=0}^N r_j^{k,i}$$ what does $$N_{k,i}^s$$ mean? I couldn't find these details in the source code.
My understanding is that you first extract the visual and textual representations and concatenate them to form the cross-modal feature $$r_u=Concat(o_u^{i(f)},o^t_u)$$, then group these features by sample label into $$N_l$$ sets { $$R_k;0 \le k < N_l$$ }, then apply K-Means to each $$R_k$$ to split it into $$N^p$$ clusters. Finally, you average the vectors within each cluster to obtain the prototype vector $$PM(k,i)$$. Is this understanding correct?
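To make the question concrete, here is a minimal sketch of the pipeline as I understand it, in plain Python with a hand-written Lloyd's K-Means (standing in for whatever clustering the repository actually uses). It assumes that $$N_{k,i}^s$$ is the number of samples assigned to cluster $$i$$ of label $$k$$, so $$PM(k,i)$$ is simply that cluster's mean. This reading, the helper names (`concat`, `kmeans`, `build_prototypes`), the random seed and the iteration count are all hypothetical and are not taken from the paper or its code.

```python
import random
from collections import defaultdict


def concat(visual, textual):
    """Cross-modal feature r_u = Concat(o_u^{i(f)}, o_u^t): one 1 x (D_v + D_t) vector."""
    return list(visual) + list(textual)


def mean(vectors):
    """Element-wise average (1 / N) * sum_j r_j over a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's K-Means; returns the points grouped into n_clusters clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)  # initialise from distinct samples
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster became empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return clusters


def build_prototypes(visual_feats, text_feats, labels, n_prototypes):
    """Hypothetical helper: PM[(k, i)] = mean of cluster i within label k.

    Assumes N^s_{k,i} is the size of that cluster (the point being asked about).
    """
    groups = defaultdict(list)  # R_k: cross-modal features of the samples with label k
    for v, t, y in zip(visual_feats, text_feats, labels):
        groups[y].append(concat(v, t))
    prototypes = {}
    for k, r_k in groups.items():
        clusters = kmeans(r_k, min(n_prototypes, len(r_k)))  # N^p clusters per label
        non_empty = [c for c in clusters if c]
        for i, cluster in enumerate(non_empty):
            prototypes[(k, i)] = mean(cluster)  # (1 / N^s_{k,i}) * sum_j r_j^{k,i}
    return prototypes


if __name__ == "__main__":
    # Toy 1-D visual and 1-D textual features for two labels.
    visual = [[0.0], [0.2], [5.0], [5.2], [9.0], [9.4]]
    text = [[1.0], [1.0], [1.0], [1.0], [0.0], [0.0]]
    labels = [0, 0, 0, 0, 1, 1]
    for key, proto in sorted(build_prototypes(visual, text, labels, n_prototypes=2).items()):
        print(key, proto)
```

If this reading is right, each label contributes up to $$N^p$$ prototypes, and each prototype lives in the same concatenated space as $$r$$ (and $$o_u$$, if the two are indeed the same vector).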
