
Main idea of BET (Byte Entity Transformer) #350

@LckyLke

Description


WIP: training with negative sampling works, but inference is currently very slow (see below): https://github.com/dice-group/dice-embeddings/tree/BET

Instead of learning one embedding per entity, BET encodes the raw bytes of the entity and relation names with a ByteEncoder, a small Transformer whose vocabulary has one token per possible byte value plus a padding token (256 + 1 = 257; I rounded this up to 260 on the hunch that a rounder vocab size might be faster).
These byte-level embeddings replace the lookup tables in the KGE model.
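A minimal sketch of what such a ByteEncoder could look like (module names, dimensions, and the mean pooling over non-padding positions are my assumptions here, not necessarily what the BET branch implements):

```python
import torch
import torch.nn as nn

PAD_ID = 256          # 256 byte values + 1 padding token
VOCAB_SIZE = 260      # rounded up from 257, as described above


class ByteEncoder(nn.Module):
    """Encodes the UTF-8 bytes of an entity/relation name into a single vector."""

    def __init__(self, emb_dim: int = 128, num_layers: int = 2,
                 num_heads: int = 4, max_len: int = 64):
        super().__init__()
        self.byte_emb = nn.Embedding(VOCAB_SIZE, emb_dim, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, emb_dim)
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.max_len = max_len

    def tokenize(self, names: list[str]) -> torch.Tensor:
        """Turn names into a padded (batch, max_len) tensor of byte ids."""
        ids = torch.full((len(names), self.max_len), PAD_ID, dtype=torch.long)
        for i, name in enumerate(names):
            b = list(name.encode("utf-8"))[: self.max_len]
            ids[i, : len(b)] = torch.tensor(b)
        return ids

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (batch, max_len) -> one embedding per name: (batch, emb_dim)
        pos = torch.arange(byte_ids.size(1), device=byte_ids.device)
        x = self.byte_emb(byte_ids) + self.pos_emb(pos)
        pad_mask = byte_ids.eq(PAD_ID)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        keep = (~pad_mask).unsqueeze(-1).float()
        return (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
```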

On top of that sits a 3-token transformer, just like in the CoKE model:

[ Head_emb , Relation_emb , MASK ]

The output at the MASK position is then scored via a dot product against the tail embedding, which is also produced by the ByteEncoder.

This gives us a fixed entity and relation embedding size, since the encoder generates the embeddings instead of a lookup table.
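A rough sketch of that scoring step, reusing the ByteEncoder from above (the class and parameter names are illustrative, not the actual BET code):

```python
class BETScorer(nn.Module):
    """3-token transformer over [head_emb, rel_emb, MASK], scored against a tail embedding."""

    def __init__(self, encoder: ByteEncoder, emb_dim: int = 128,
                 num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        self.encoder = encoder
        self.mask_emb = nn.Parameter(torch.randn(emb_dim))  # learned MASK token
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=num_heads,
                                           batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, head_ids, rel_ids, tail_ids):
        h = self.encoder(head_ids)                   # (batch, emb_dim)
        r = self.encoder(rel_ids)                    # (batch, emb_dim)
        t = self.encoder(tail_ids)                   # (batch, emb_dim)
        mask = self.mask_emb.expand(h.size(0), -1)   # (batch, emb_dim)
        seq = torch.stack([h, r, mask], dim=1)       # (batch, 3, emb_dim)
        out = self.seq_encoder(seq)[:, 2]            # contextualised MASK position
        return (out * t).sum(dim=-1)                 # dot-product score per triple
```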

Current Limitation:

If we want K-vs-all scoring, we have to compute all tail embeddings with the encoder on every forward pass. I think this is O(|E| * seq_len²)?

Solution for inference:
Once the model is trained, we can precompute the tail embeddings with the ByteEncoder once and reuse them for every K-vs-all forward pass (see the sketch below).
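A small sketch of that precompute-and-reuse idea, again with hypothetical helper names on top of the classes above:

```python
@torch.no_grad()
def precompute_entity_embeddings(encoder: ByteEncoder, entity_names: list[str],
                                 batch_size: int = 1024) -> torch.Tensor:
    """Run the ByteEncoder once over all entities; reuse the result for every K-vs-all pass."""
    encoder.eval()
    chunks = []
    for i in range(0, len(entity_names), batch_size):
        ids = encoder.tokenize(entity_names[i : i + batch_size])
        chunks.append(encoder(ids))
    return torch.cat(chunks, dim=0)                  # (|E|, emb_dim)


@torch.no_grad()
def k_vs_all_scores(scorer: BETScorer, head_ids, rel_ids,
                    all_tail_embs: torch.Tensor) -> torch.Tensor:
    # Only the head/relation side goes through the encoder at query time.
    h = scorer.encoder(head_ids)
    r = scorer.encoder(rel_ids)
    mask = scorer.mask_emb.expand(h.size(0), -1)
    out = scorer.seq_encoder(torch.stack([h, r, mask], dim=1))[:, 2]
    return out @ all_tail_embs.T                     # (batch, |E|) scores
```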
