88 changes: 88 additions & 0 deletions examples/qwen3/conf/train/8b.yaml
@@ -0,0 +1,88 @@
system:
  no_shared_fs: ${experiment.runner.no_shared_fs}
  num_workers: 8
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
  context_parallel_size: 1
  disable_bias_linear: true
  reset_position_ids: True
  reset_attention_mask: True
Comment on lines +8 to +9 from Contributor (severity: high)

The boolean values are specified as True with an uppercase 'T'. While many YAML parsers accept this, the YAML specification and best practices favor lowercase true and false. For consistency with other boolean values in this file (e.g., disable_bias_linear: true) and to ensure compatibility across different environments, these should be changed to lowercase.

Suggested change:

  reset_position_ids: true
  reset_attention_mask: true
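
A quick way to see the difference in practice (a minimal sketch using PyYAML, which implements YAML 1.1 and accepts both spellings; stricter YAML 1.2 JSON-schema parsers recognize only the lowercase forms):

```python
import yaml  # PyYAML

# PyYAML parses both spellings to a Python bool, so this config happens to
# load correctly here, but the lowercase form is the portable choice.
print(yaml.safe_load("reset_position_ids: True"))  # {'reset_position_ids': True}
print(yaml.safe_load("reset_position_ids: true"))  # {'reset_position_ids': True}
```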

  qk_layernorm: true
  sequence_parallel: false
  use_distributed_optimizer: true
  overlap_grad_reduce: true
  overlap_param_gather: true
  finetune: false
  precision:
    bf16: true
    attention_softmax_in_fp32: true
    accumulate_allreduce_grads_in_fp32: true
  logging:
    log_interval: 1
    tensorboard_log_interval: 1
    wandb_project: ${experiment.exp_name}
    wandb_exp_name: ${experiment.exp_name}
    log_timers_to_tensorboard: true
    log_validation_ppl_to_tensorboard: true
    log_throughput: true
    log_params_norm: true
    log_num_zeros_in_grad: true
    log_memory_to_tensorboard: true
  checkpoint:
    save_interval: ${experiment.save_steps}
    load: ${experiment.load}
    ckpt_format: ${experiment.ckpt_format}

model:
  transformer_impl: transformer_engine
  num_layers: 36
  hidden_size: 4096
  ffn_hidden_size: 12288
  kv_channels: 128
  num_attention_heads: 32
  num_query_groups: 8 # num_key_value_heads
  seq_length: 1024
  max_position_embeddings: 40960
  norm_epsilon: 1e-6
  use_rotary_position_embeddings: true
  rotary_base: 1000000
  swiglu: true
  normalization: RMSNorm
  init_method_std: 0.01
  attention_dropout: 0.0
  hidden_dropout: 0.0
  clip_grad: 1.0
  position_embedding_type: rope
  untie_embeddings_and_output_weights: false
  no_position_embedding: true
  no_rope_fusion: true
  use_cpu_initialization: true

  # training
  seed: ${experiment.seed}
  # finetune: false
  micro_batch_size: 1
  global_batch_size: 4096
  eval_iters: 0
  train_iters: 50

  optimizer:
    weight_decay: 0.1
    adam_beta1: 0.9
    adam_beta2: 0.95
    lr_scheduler:
      lr: 1e-5
      min_lr: 1e-6
      lr_warmup_samples: 2048000
      lr_decay_style: cosine


data:
  data_path: /path/to/dataset
Comment from Contributor (severity: critical)

The data_path is set to the placeholder value /path/to/dataset. This will cause the training script to fail because it cannot locate the dataset. This path must be updated to the actual location of the training data before running the script.
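
As a pre-flight check before launching a run, something like the following sketch can verify the path (the prefix below is hypothetical, and it assumes the Megatron-style <prefix>.bin / <prefix>.idx dataset layout that FlagScale inherits):

```python
from pathlib import Path

# Hypothetical dataset prefix; data_path points at the prefix of the
# preprocessed .bin/.idx pair, not at a directory.
prefix = Path("/data/qwen3/corpus_text_document")

for suffix in (".bin", ".idx"):
    f = prefix.with_suffix(suffix)
    if not f.is_file():
        raise FileNotFoundError(f"dataset file missing: {f}")
print(f"dataset prefix looks valid: {prefix}")
```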

  split: 1
  no_mmap_bin_files: true
  tokenizer:
    tokenizer_type: HuggingFaceTokenizer
    tokenizer_path: examples/aquila/tokenizer_hf
Comment from Contributor (severity: critical)

The tokenizer_path is set to examples/aquila/tokenizer_hf, which is intended for an Aquila model and uses a GPT2Tokenizer. Using a tokenizer that does not match the qwen3-8b model will result in incorrect tokenization, leading to failed training or a poorly performing model. This path must be updated to point to the correct tokenizer for qwen3-8b.
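
To sanity-check a replacement tokenizer against this config, a short sketch (the Hub ID Qwen/Qwen3-8B is an assumption; a local path works the same way):

```python
from transformers import AutoTokenizer

# Load the Qwen3 tokenizer (Hub ID assumed; substitute a local path if needed).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# The config's vocab_size (151936) is padded to a multiple of 64
# (make_vocab_size_divisible_by), so expect the raw tokenizer vocabulary
# to be at most that size rather than exactly equal to it.
assert len(tok) <= 151936, f"tokenizer vocab {len(tok)} exceeds configured vocab_size"
print(type(tok).__name__, len(tok))
```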

    vocab_size: 151936
    make_vocab_size_divisible_by: 64
12 changes: 5 additions & 7 deletions hardware/Kunlunxin_R310p/FlagScale/diff.yaml
@@ -1,12 +1,10 @@
-backends_commit:
-  vllm: 6d8d0a24c02bfd84d46b3016b865a44f048ae84b
+backends_commit: {}
 backends_version:
-  FlagScale: v0.8.0
-  vllm: 0.8.5
-commit: 32704acc698b2565cd371365cf4c43549a9fb652
+  FlagScale: 0.9.0
+commit: 3c5dbc409d6e222009bc07b772a141d6f5a81bca
 contact: ''
 device_type: Kunlunxin_R310p
 models:
-- qwen3 8b
+- qwen3-8b-flagscale
 task:
-- inference
+- train
5 binary files changed (not shown).