WIP, but it works with negative sampling; inference is very slow (see below): https://github.com/dice-group/dice-embeddings/tree/BET
Instead of learning one embedding per entity, BET encodes the raw bytes of the entity and relation names with a ByteEncoder, a Transformer whose vocabulary has one token per possible byte value plus a padding token (256 + 1). (I rounded this up to 260 because that might be faster.)
These byte-level embeddings replace the lookup tables in the KGE model.
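A minimal sketch of that idea (my own illustration, not the code in the branch; `ByteEncoder`, `VOCAB_SIZE`, `PAD_ID` and the hyperparameters are assumptions): encode an entity or relation name from its raw UTF-8 bytes into one fixed-size vector.

```python
# Illustrative sketch, not the actual BET code.
import torch
import torch.nn as nn

VOCAB_SIZE = 260   # 256 byte values + padding, rounded up
PAD_ID = 256       # padding token id (assumption)

class ByteEncoder(nn.Module):
    def __init__(self, dim: int = 128, max_len: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.byte_emb = nn.Embedding(VOCAB_SIZE, dim, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.max_len = max_len

    def tokenize(self, names):
        """Turn a list of strings into padded byte-id tensors."""
        ids = [list(n.encode("utf-8"))[: self.max_len] for n in names]
        ids = [i + [PAD_ID] * (self.max_len - len(i)) for i in ids]
        return torch.tensor(ids)

    def forward(self, byte_ids):  # (batch, max_len)
        pos = torch.arange(byte_ids.size(1), device=byte_ids.device)
        x = self.byte_emb(byte_ids) + self.pos_emb(pos)
        x = self.encoder(x, src_key_padding_mask=(byte_ids == PAD_ID))
        # Mean-pool over non-padding positions -> one fixed-size embedding per name.
        mask = (byte_ids != PAD_ID).unsqueeze(-1).float()
        return (x * mask).sum(1) / mask.sum(1).clamp(min=1.0)
```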
On top of that sits a 3-token Transformer, just like in the CoKE model:
[ Head_emb , Relation_emb , MASK ]
The output at the MASK position is scored via a dot product with the tail embedding, which is also produced by the ByteEncoder (see the sketch below).
This gives us a fixed entity and relation embedding size, since the encoder generates the embeddings instead of a lookup table.
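Roughly, the scoring step could look like this (again an illustration building on the `ByteEncoder` sketch above, not the code in the branch; `TripleScorer` is a made-up name):

```python
# CoKE-style 3-token scoring: [head, relation, MASK] -> dot product with tail.
class TripleScorer(nn.Module):
    def __init__(self, dim: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.byte_encoder = ByteEncoder(dim=dim)
        self.mask_emb = nn.Parameter(torch.randn(dim))   # learnable shared MASK token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, head_ids, rel_ids, tail_ids):
        h = self.byte_encoder(head_ids)            # (batch, dim)
        r = self.byte_encoder(rel_ids)             # (batch, dim)
        t = self.byte_encoder(tail_ids)            # (batch, dim)
        mask = self.mask_emb.expand_as(h)
        seq = torch.stack([h, r, mask], dim=1)     # (batch, 3, dim)
        out = self.seq_encoder(seq)[:, 2]          # hidden state at the MASK position
        return (out * t).sum(-1)                   # dot product = triple score
```

For negative sampling this is all that is needed: each triple in the batch only encodes its own head, relation, and tail names.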
Current limitation:
For K-vs-All scoring we would have to run the encoder over every entity to get all tail embeddings, which I think is O(|E| * seq_len²) because of the self-attention over each byte sequence.
Solution for inference:
Once the model is trained, we could precompute the tail embeddings once and reuse them for every K-vs-All forward pass, as sketched below.
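A rough sketch of that caching, building on the hypothetical `TripleScorer` above (helper names are made up for illustration):

```python
# Sketch of the proposed inference fix: encode all entities once, cache the
# matrix, and score every tail with a single matmul per query batch.
@torch.no_grad()
def precompute_tail_embeddings(scorer: TripleScorer, entity_names, batch_size: int = 1024):
    embs = []
    for i in range(0, len(entity_names), batch_size):
        ids = scorer.byte_encoder.tokenize(entity_names[i:i + batch_size])
        embs.append(scorer.byte_encoder(ids))
    return torch.cat(embs, dim=0)                  # (|E|, dim), reused for every query

@torch.no_grad()
def k_vs_all(scorer: TripleScorer, head_ids, rel_ids, tail_matrix):
    h = scorer.byte_encoder(head_ids)
    r = scorer.byte_encoder(rel_ids)
    mask = scorer.mask_emb.expand_as(h)
    out = scorer.seq_encoder(torch.stack([h, r, mask], dim=1))[:, 2]
    return out @ tail_matrix.T                     # (batch, |E|) scores
```

With the cached (|E|, dim) matrix, each K-vs-All query only encodes the head and relation names, so the O(|E| * seq_len²) encoder cost is paid once instead of on every forward pass.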