WIP, but it works with negative sampling; inference is very slow (see below): https://github.com/dice-group/dice-embeddings/tree/BET
Instead of learning one embedding per entity, BET encodes the raw bytes of the entity and relation names with a ByteEncoder, a Transformer whose vocabulary has one token per possible byte value plus a padding token (256 + 1). (I rounded this up to 260 because that might be faster.)
These byte-level embeddings replace the lookup tables in the KGE model.
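A minimal sketch of that idea (my own illustration, not the code in the branch; `ByteEncoder`, `VOCAB_SIZE`, `PAD_ID` and the hyperparameters are assumptions): encode an entity or relation name from its raw UTF-8 bytes into one fixed-size vector.

```python
# Illustrative sketch, not the actual BET code.
import torch
import torch.nn as nn

VOCAB_SIZE = 260   # 256 byte values + padding, rounded up
PAD_ID = 256       # padding token id (assumption)

class ByteEncoder(nn.Module):
    def __init__(self, dim: int = 128, max_len: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.byte_emb = nn.Embedding(VOCAB_SIZE, dim, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.max_len = max_len

    def tokenize(self, names):
        """Turn a list of strings into padded byte-id tensors."""
        ids = [list(n.encode("utf-8"))[: self.max_len] for n in names]
        ids = [i + [PAD_ID] * (self.max_len - len(i)) for i in ids]
        return torch.tensor(ids)

    def forward(self, byte_ids):  # (batch, max_len)
        pos = torch.arange(byte_ids.size(1), device=byte_ids.device)
        x = self.byte_emb(byte_ids) + self.pos_emb(pos)
        x = self.encoder(x, src_key_padding_mask=(byte_ids == PAD_ID))
        # Mean-pool over non-padding positions -> one fixed-size embedding per name.
        mask = (byte_ids != PAD_ID).unsqueeze(-1).float()
        return (x * mask).sum(1) / mask.sum(1).clamp(min=1.0)
```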
On top of that sits a 3-token Transformer, just like in the CoKE model:
[ Head_emb , Relation_emb , MASK ]
The output at the MASK position is scored via a dot product with the tail embedding, which is also produced by the ByteEncoder (see the sketch below).
This gives us a fixed entity and relation embedding size, since the encoder generates the embeddings instead of a lookup table.
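Roughly, the scoring step could look like this (again an illustration building on the `ByteEncoder` sketch above, not the code in the branch; `TripleScorer` is a made-up name):

```python
# CoKE-style 3-token scoring: [head, relation, MASK] -> dot product with tail.
class TripleScorer(nn.Module):
    def __init__(self, dim: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.byte_encoder = ByteEncoder(dim=dim)
        self.mask_emb = nn.Parameter(torch.randn(dim))   # learnable shared MASK token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, head_ids, rel_ids, tail_ids):
        h = self.byte_encoder(head_ids)            # (batch, dim)
        r = self.byte_encoder(rel_ids)             # (batch, dim)
        t = self.byte_encoder(tail_ids)            # (batch, dim)
        mask = self.mask_emb.expand_as(h)
        seq = torch.stack([h, r, mask], dim=1)     # (batch, 3, dim)
        out = self.seq_encoder(seq)[:, 2]          # hidden state at the MASK position
        return (out * t).sum(-1)                   # dot product = triple score
```

For negative sampling this is all that is needed: each triple in the batch only encodes its own head, relation, and tail names.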
Current limitation:
For K-vs-All scoring we would have to run the encoder over every entity to get all tail embeddings, which I think is O(|E| * seq_len²) because of the self-attention over each byte sequence.
Solution for inference:
Once the model is trained, we could precompute the tail embeddings once and reuse them for every K-vs-All forward pass, as sketched below.
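A rough sketch of that caching, building on the hypothetical `TripleScorer` above (helper names are made up for illustration):

```python
# Sketch of the proposed inference fix: encode all entities once, cache the
# matrix, and score every tail with a single matmul per query batch.
@torch.no_grad()
def precompute_tail_embeddings(scorer: TripleScorer, entity_names, batch_size: int = 1024):
    embs = []
    for i in range(0, len(entity_names), batch_size):
        ids = scorer.byte_encoder.tokenize(entity_names[i:i + batch_size])
        embs.append(scorer.byte_encoder(ids))
    return torch.cat(embs, dim=0)                  # (|E|, dim), reused for every query

@torch.no_grad()
def k_vs_all(scorer: TripleScorer, head_ids, rel_ids, tail_matrix):
    h = scorer.byte_encoder(head_ids)
    r = scorer.byte_encoder(rel_ids)
    mask = scorer.mask_emb.expand_as(h)
    out = scorer.seq_encoder(torch.stack([h, r, mask], dim=1))[:, 2]
    return out @ tail_matrix.T                     # (batch, |E|) scores
```

With the cached (|E|, dim) matrix, each K-vs-All query only encodes the head and relation names, so the O(|E| * seq_len²) encoder cost is paid once instead of on every forward pass.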