@TimotheeMickus
Collaborator

General PR for the HF/mammoth integration project. I will also put some docs and pointers here.

@TimotheeMickus
Collaborator Author

Utilities for loading an existing checkpoint from mammoth:
https://github.com/Helsinki-NLP/mammoth/blob/main/onmt/model_builder.py#L179C9

All CLI and config file options are listed in https://github.com/Helsinki-NLP/mammoth/blob/main/onmt/opts.py
(in particular, see train_opts for the structural options, and build_bilingual_model).

encoder_layers=config_dict["enc_layers"][0],
encoder_ffn_dim=config_dict["transformer_ff"],
encoder_attention_heads=config_dict["heads"],
decoder_layers=config_dict["dec_layers"][0],
Collaborator Author

This should be a sum over the list, not just its first element (`[0]`).
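
A minimal sketch of what the suggested change could look like, assuming `enc_layers` and `dec_layers` are lists of per-stack layer counts (that reading is an assumption, as are the example values and the helper name `hf_layer_kwargs`, none of which come from the PR):

```python
# Hypothetical config values, for illustration only; not taken from a real checkpoint.
config_dict = {
    "enc_layers": [2, 4],
    "dec_layers": [3, 3],
    "transformer_ff": 2048,
    "heads": 8,
}


def hf_layer_kwargs(config_dict):
    """Hypothetical helper: collapse per-stack layer counts into single totals."""
    return {
        # Sum over every stack instead of taking only the first entry.
        "encoder_layers": sum(config_dict["enc_layers"]),
        "encoder_ffn_dim": config_dict["transformer_ff"],
        "encoder_attention_heads": config_dict["heads"],
        "decoder_layers": sum(config_dict["dec_layers"]),
    }


kwargs = hf_layer_kwargs(config_dict)
print(kwargs["encoder_layers"], kwargs["decoder_layers"])  # 6 6
```

With `[0]`, the same config would report 2 encoder layers and 3 decoder layers, dropping every stack after the first.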

@TimotheeMickus
Collaborator Author

indices for special tokens are here:
https://github.com/Helsinki-NLP/mammoth/blob/main/onmt/inputters/vocab.py#L10

@tkhnv
Collaborator

tkhnv commented Aug 29, 2023

Might this work for the SPM initialization? https://github.com/google/sentencepiece/blob/master/python/add_new_vocab.ipynb

@amikael amikael mentioned this pull request Jul 14, 2025

4 participants