In Tequila, are the models trained from scratch or by continued training?

May I ask whether the results reported in the Tequila paper were obtained by training from scratch on 10B tokens, or by continuing training for 10B tokens starting from the pre-trained Llama3-1B and Llama3-3B parameters?