
@csukuangfj (Collaborator) commented May 30, 2025

See also #1942

This PR shows how to use #1942 with the LibriSpeech dataset.

It uses on-the-fly feature computation. Because augmentation is applied at the sample level (to the audio itself), features must be computed after augmentation and cannot be precomputed.
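A minimal sketch of that ordering constraint in plain Python. `add_noise` and the toy log-energy extractor `log_energy_frames` are hypothetical stand-ins, not the augmentation from #1942 or the fbank features the recipe actually uses; the only point is that features are derived from the already-augmented waveform inside the data pipeline.

```python
import math
import random
from typing import Callable, List

Waveform = List[float]
Features = List[List[float]]


def add_noise(wave: Waveform, snr_db: float, rng: random.Random) -> Waveform:
    """Hypothetical sample-level augmentation: add white Gaussian noise at snr_db."""
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def log_energy_frames(wave: Waveform, frame_len: int = 400, hop: int = 160) -> Features:
    """Toy feature extractor (stand-in for fbank): one log-energy value per frame.

    At 16 kHz, 400 samples is a 25 ms window and 160 samples is a 10 ms hop.
    """
    feats = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        energy = sum(x * x for x in wave[start:start + frame_len])
        feats.append([math.log(energy + 1e-10)])
    return feats


def compute_batch_features(
    waves: List[Waveform],
    augment: Callable[[Waveform], Waveform],
    enable_augmentation: bool,
) -> List[Features]:
    """On-the-fly pipeline: optionally augment each raw waveform, then extract features."""
    batch = []
    for wave in waves:
        if enable_augmentation:
            wave = augment(wave)  # operates on audio samples, so it must run first
        batch.append(log_energy_frames(wave))  # features see the augmented audio
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz as a stand-in utterance.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    aug = compute_batch_features([tone], lambda w: add_noise(w, 10.0, rng), True)
    raw = compute_batch_features([tone], lambda w: add_noise(w, 10.0, rng), False)
    print(len(aug[0]), len(raw[0]))  # same frame count, different feature values
```

Precomputed features would have been extracted from the clean audio, which is why the augmented run has to extract them per batch.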


Usage

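The first command below trains with augmentation and the second without it; the two runs differ only in --enable-augmentation, --exp-dir, and --max-duration.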
./zipformer/train_with_aug.py \
  --num-workers 10 \
  --world-size 1 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --enable-augmentation 1 \
  --exp-dir zipformer/exp-with-aug \
  --use-ctc 0 \
  --use-transducer 1 \
  --use-attention-decoder 0 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --ctc-loss-scale 0.1 \
  --full-libri 1 \
  --max-duration 500

./zipformer/train_with_aug.py \
  --num-workers 10 \
  --world-size 1 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --enable-augmentation 0 \
  --exp-dir zipformer/exp-no-aug \
  --use-ctc 0 \
  --use-transducer 1 \
  --use-attention-decoder 0 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --ctc-loss-scale 0.1 \
  --full-libri 1 \
  --max-duration 2800
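--max-duration caps the total seconds of audio in one batch, so with --world-size 1 it bounds how much audio each training iteration sees. The sketch below is a simplified greedy packer written only to illustrate that cap; it is not lhotse's sampler (which also buckets and shuffles), and the 12-second utterance length is a made-up example.

```python
from typing import List


def pack_by_duration(durations: List[float], max_duration: float) -> List[List[int]]:
    """Greedily group utterance indices so each batch's total duration stays
    within max_duration (an utterance longer than the cap gets its own batch)."""
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for idx, dur in enumerate(durations):
        if current and total + dur > max_duration:
            batches.append(current)  # adding this utterance would exceed the cap
            current, total = [], 0.0
        current.append(idx)
        total += dur
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    utterances = [12.0] * 1000  # hypothetical: 1000 utterances of 12 s each
    for cap in (500, 2800):
        batches = pack_by_duration(utterances, cap)
        print(cap, len(batches[0]), len(batches))
```

With these made-up numbers, a 500 s cap gives 41 utterances per batch (25 batches in total), while a 2800 s cap gives 233 per batch (5 batches), so one pass over the same data takes about five times as many iterations at the smaller cap.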
[Screenshot 2025-05-30 at 16:37:45]
(py38) kuangfangjun:greedy_search$ pwd
/star-fj/fangjun/open-source/icefall-aug/egs/librispeech/ASR/zipformer/exp-with-aug/greedy_search
(py38) kuangfangjun:greedy_search$ grep 'best for test' log-* | sort -k2 -n | head -n10
log-decode-iter-52000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-48-35:greedy_search  3.87    best for test-clean
log-decode-iter-52000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-45-59:greedy_search  3.94    best for test-clean
log-decode-iter-52000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-50-37:greedy_search  3.94    best for test-clean
log-decode-iter-48000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-57-56:greedy_search  4.04    best for test-clean
log-decode-iter-48000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-56-05:greedy_search  4.05    best for test-clean
log-decode-iter-52000_avg-4_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-52-26:greedy_search  4.1     best for test-clean
log-decode-iter-48000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-59-43:greedy_search  4.2     best for test-clean
log-decode-iter-44000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-15-05-13:greedy_search  4.24    best for test-clean
log-decode-iter-44000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-15-06-59:greedy_search  4.24    best for test-clean
log-decode-iter-48000_avg-4_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-15-01-30:greedy_search  4.24    best for test-clean
(py38) kuangfangjun:greedy_search$ pwd
/star-fj/fangjun/open-source/icefall-aug/egs/librispeech/ASR/zipformer/exp-no-aug/greedy_search
(py38) kuangfangjun:greedy_search$ grep 'best for test' log-* | sort -k2 -n | head -n20
log-decode-iter-40000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-44-01:greedy_search  2.78    best for test-clean
log-decode-iter-40000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-42-12:greedy_search  2.8     best for test-clean
log-decode-iter-40000_avg-4_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-45-51:greedy_search  2.84    best for test-clean
log-decode-iter-36000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-34-35:greedy_search  2.86    best for test-clean
log-decode-iter-40000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-40-11:greedy_search  2.86    best for test-clean
log-decode-iter-36000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-32-34:greedy_search  2.9     best for test-clean
log-decode-iter-36000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-36-24:greedy_search  2.9     best for test-clean
log-decode-iter-32000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-39-13:greedy_search  2.91    best for test-clean
log-decode-iter-32000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-41-33:greedy_search  2.93    best for test-clean
log-decode-iter-36000_avg-4_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-38-18:greedy_search  2.97    best for test-clean
log-decode-iter-32000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-44-06:greedy_search  3.03    best for test-clean
log-decode-iter-32000_avg-4_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-46-38:greedy_search  3.14    best for test-clean
log-decode-iter-32000_avg-5_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-49-09:greedy_search  3.39    best for test-clean
log-decode-iter-16000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-11-03-15:greedy_search  3.84    best for test-clean
log-decode-iter-16000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-11-05-04:greedy_search  4.25    best for test-clean
log-decode-iter-12000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-59-02:greedy_search  4.46    best for test-clean
log-decode-iter-8000_avg-1_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-10-55-24:greedy_search   5.33    best for test-clean
log-decode-iter-12000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-11-00-51:greedy_search  5.37    best for test-clean
log-decode-iter-40000_avg-2_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-42-12:greedy_search  7.29    best for test-other
log-decode-iter-40000_avg-3_context-2_max-sym-per-frame-1_use-averaged-model-2025-05-30-14-44-01:greedy_search  7.37    best for test-other
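To pull the best WER per test set, together with the iteration and --avg that produced it, out of grep output like the above, a small parser along these lines could be used. It relies only on the log-file naming pattern visible in these lines; the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches lines such as:
# log-decode-iter-40000_avg-3_...:greedy_search  2.78    best for test-clean
LINE_RE = re.compile(
    r"log-decode-iter-(?P<iter>\d+)_avg-(?P<avg>\d+)_\S*:greedy_search\s+"
    r"(?P<wer>\d+(?:\.\d+)?)\s+best for (?P<test_set>\S+)"
)


def best_per_test_set(lines: Iterable[str]) -> Dict[str, Tuple[float, int, int]]:
    """Return {test_set: (wer, iteration, avg)}, keeping the lowest WER per set."""
    best: Dict[str, Tuple[float, int, int]] = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip shell prompts and unrelated output
        wer = float(m["wer"])
        test_set = m["test_set"]
        if test_set not in best or wer < best[test_set][0]:
            best[test_set] = (wer, int(m["iter"]), int(m["avg"]))
    return best
```

Unlike `sort -k2 -n | head`, this keeps test-clean and test-other separate, so the best test-other result is reported even when test-clean lines fill the top of the sorted list.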

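Note: since the two runs use different --max-duration values (500 vs. 2800), their WERs cannot be compared directly at the same iteration. Each iteration processes at most 500 seconds of audio in the augmented run but up to 2800 seconds in the other one.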
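One rough way to line the runs up is by the amount of audio seen rather than by iteration number. Since --max-duration is only a cap, the figures below are upper bounds and the real ratio depends on how full the batches are; the helper names are hypothetical.

```python
def max_audio_hours(iteration: int, max_duration: float, world_size: int = 1) -> float:
    """Upper bound on hours of audio processed after `iteration` steps."""
    return iteration * max_duration * world_size / 3600.0


def equivalent_iteration(iteration: int, from_max_duration: float, to_max_duration: float) -> float:
    """Iteration in the other run that has seen the same (capped) amount of audio."""
    return iteration * from_max_duration / to_max_duration


if __name__ == "__main__":
    # Best with-aug checkpoint above: iter 52000 at --max-duration 500.
    print(round(max_audio_hours(52000, 500), 1))          # 7222.2 hours at most
    print(round(equivalent_iteration(52000, 500, 2800)))  # 9286 in the no-aug run
    # Best no-aug checkpoint above: iter 40000 at --max-duration 2800.
    print(round(max_audio_hours(40000, 2800), 1))         # 31111.1 hours at most
```

By this capped measure, iteration 52000 of the augmented run has seen at most as much audio as iteration 9286 of the unaugmented run.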


(py38) kuangfangjun:exp-no-aug$ pwd
/star-fj/fangjun/open-source/icefall-aug/egs/librispeech/ASR/zipformer/exp-no-aug
(py38) kuangfangjun:exp-no-aug$ ls -lhrt checkpoint-*
-rw-r--r-- 1 kuangfangjun root 2.3G May 29 21:00 checkpoint-4000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 29 22:49 checkpoint-8000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 00:38 checkpoint-12000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 02:26 checkpoint-16000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 04:14 checkpoint-20000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 06:02 checkpoint-24000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 07:50 checkpoint-28000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 09:37 checkpoint-32000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 11:24 checkpoint-36000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 13:13 checkpoint-40000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 15:02 checkpoint-44000.pt
(py38) kuangfangjun:exp-no-aug$ cd ../exp-with-aug/
(py38) kuangfangjun:exp-with-aug$ ls -lhrt checkpoint-*
-rw-r--r-- 1 kuangfangjun root 2.3G May 29 19:58 checkpoint-4000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 29 21:29 checkpoint-8000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 29 22:57 checkpoint-12000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 00:25 checkpoint-16000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 01:51 checkpoint-20000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 03:17 checkpoint-24000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 04:44 checkpoint-28000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 06:10 checkpoint-32000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 07:37 checkpoint-36000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 09:04 checkpoint-40000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 10:30 checkpoint-44000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 11:56 checkpoint-48000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 13:23 checkpoint-52000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 14:50 checkpoint-56000.pt
-rw-r--r-- 1 kuangfangjun root 2.3G May 30 16:17 checkpoint-60000.pt
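The checkpoint modification times give a rough wall-clock speed for each run. The sketch below turns the first and last `ls` timestamps into minutes per 4000 iterations, plus an upper bound on audio consumed per hour based on the --max-duration cap. The year (2025) is assumed because `ls -l` omits it for recent files, the timestamps include everything the training loop did between saves, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def parse_ls_time(stamp: str, year: int = 2025) -> datetime:
    """Parse an `ls -l` time such as "May 29 21:00"; the year is an assumption."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")


def minutes_per_interval(checkpoints: List[Tuple[int, str]], interval: int = 4000) -> float:
    """Average wall-clock minutes per `interval` iterations, computed from the
    first and last (iteration, timestamp) pairs."""
    (first_iter, first_stamp), (last_iter, last_stamp) = checkpoints[0], checkpoints[-1]
    elapsed = parse_ls_time(last_stamp) - parse_ls_time(first_stamp)
    return elapsed.total_seconds() / 60.0 * interval / (last_iter - first_iter)


def max_audio_hours_per_hour(minutes: float, interval: int, max_duration: float) -> float:
    """Upper bound on hours of audio consumed per wall-clock hour
    (batches are capped at max_duration, not necessarily full)."""
    audio_hours = interval * max_duration / 3600.0
    return audio_hours / (minutes / 60.0)


if __name__ == "__main__":
    no_aug = [(4000, "May 29 21:00"), (44000, "May 30 15:02")]
    with_aug = [(4000, "May 29 19:58"), (60000, "May 30 16:17")]
    for name, ckpts, cap in (("no-aug", no_aug, 2800), ("with-aug", with_aug, 500)):
        minutes = minutes_per_interval(ckpts)
        print(name, round(minutes, 1), round(max_audio_hours_per_hour(minutes, 4000, cap)))
```

On the timestamps above this gives about 108.2 minutes per 4000 iterations without augmentation and about 87.1 with it, i.e. at most roughly 1725 vs. 383 hours of audio per wall-clock hour.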
