You do not need to reuse the train/val split that was used to train the individual experts. In some corner cases, reusing it is actually not recommended: if an expert fits its training data very well but performs poorly on validation, the MoE may learn to put a larger weight on that particular expert. When the models consistently show similar skill on training and validation data, the choice of split does not matter.
So, feel free to use your own mix of training and validation data, as long as the two sets are independent and the training data does not influence the validation data in any way. I suggest checking each individual model's performance on your train/val split; the results should be consistent. For example, if expert 1…
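
One way to run the consistency check suggested above is to compare each expert's error on your training split with its error on your validation split. The snippet below is only a minimal sketch: the function names, the use of RMSE as the metric, the `max_ratio=1.5` threshold, and the synthetic predictions are all assumptions for illustration, not part of any library's API.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-square error between predictions and targets."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def train_val_gap(preds_train, y_train, preds_val, y_val, max_ratio=1.5):
    """Compare each expert's train and validation RMSE.

    preds_train / preds_val: dicts mapping expert name -> predictions.
    max_ratio is an arbitrary, hypothetical threshold: an expert is
    flagged as inconsistent when val_rmse / train_rmse exceeds it.
    """
    report = {}
    for name in preds_train:
        tr = rmse(preds_train[name], y_train)
        va = rmse(preds_val[name], y_val)
        ratio = va / max(tr, 1e-12)  # guard against division by zero
        report[name] = {
            "train_rmse": tr,
            "val_rmse": va,
            "ratio": ratio,
            "consistent": bool(ratio <= max_ratio),
        }
    return report


# Synthetic stand-in data (hypothetical, for illustration only).
y_train = np.linspace(0.0, 1.0, 50)
y_val = np.linspace(0.0, 1.0, 20)

preds_train = {
    "expert_a": y_train + 0.10,  # similar error on both splits
    "expert_b": y_train + 0.01,  # very good on train ...
}
preds_val = {
    "expert_a": y_val + 0.10,
    "expert_b": y_val + 0.20,    # ... but much worse on validation
}

report = train_val_gap(preds_train, y_train, preds_val, y_val)
for name, stats in report.items():
    print(name, stats)
```

An expert flagged as inconsistent here is the kind the MoE could over-weight if it is fitted on data that expert has already seen, so it is worth a closer look before training the gating on that split.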

Answer selected by ram-cherukuri
Category
Q&A
2 participants