PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR" [accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLPRO), 2025].
- CleanMel model checkpoints are now available on Hugging Face, so inference can be run with a one-line command.
- All models are available in the `pretrained/enhancement/` folder.
- The enhanced results of the 4 `offline_CleanMel_S/L_mask/map` models for the CHiME example `noisy_CHIME-real_F05_442C020S_STR_REAL` are provided in the `src/inference_example/pretrained_example_output` folder.
CleanMel enhances log-Mel spectrograms to improve both speech quality and ASR performance. Its outputs are compatible with:
- 🎙️ Vocoders for enhanced waveforms
- 🤖 ASR systems for transcription
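For readers unfamiliar with the input representation, here is a minimal NumPy sketch of a log-Mel spectrogram. The sample rate (16 kHz), FFT size (512), hop length (128), 80 Mel bands, HTK-style Mel scale, magnitude (not power) spectrum, and log floor are all illustrative assumptions, not CleanMel's actual front-end settings; check the repository configs for those.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2+1 STFT bins to n_mels Mel bands."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2, n_bins)
    # n_mels + 2 equally spaced Mel points give the left/center/right edges of each triangle
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(wave, sr=16000, n_fft=512, hop=128, n_mels=80, eps=1e-5):
    """Return an (n_mels, n_frames) log-Mel spectrogram of a 1-D waveform (assumed parameters)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop  # frames fully inside the signal, no padding
    frames = np.stack([wave[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))         # (n_frames, n_fft//2+1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ magnitude.T   # (n_mels, n_frames)
    return np.log(np.maximum(mel, eps))                     # floor avoids log(0)
```

With these settings, one second of 16 kHz audio yields a spectrogram of shape (80, 122). An enhancement model operating in this domain takes the noisy log-Mel as input and outputs an enhanced log-Mel that a vocoder or ASR front-end can consume.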
The demo page of CleanMel is published on Hugging Face Spaces.
If you have downloaded the pretrained models (following the instructions below), you can also launch the demo locally:
```shell
python app.py
```
Then open http://localhost:7860 in your browser to access the demo page.
```shell
conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt
```
Pretrained models can be downloaded manually here, or automatically with the `huggingface-hub` package.
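A minimal sketch of the automatic download, assuming each checkpoint file sits at the root of a Hugging Face model repository. `REPO_ID` is a hypothetical placeholder (the real repository name is not given here) and `fetch_checkpoint` is an illustrative helper, not part of the CleanMel code; only `huggingface_hub.hf_hub_download` is a real library function.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the actual Hugging Face repo id of the CleanMel checkpoints.
REPO_ID = "<hf-user>/<cleanmel-repo>"


def fetch_checkpoint(filename, repo_id=REPO_ID, local_dir="pretrained/enhancement", downloader=None):
    """Download `filename` from a Hugging Face repo into `local_dir` and return its local path.

    `downloader` defaults to huggingface_hub.hf_hub_download; it is injectable so the
    helper can be exercised offline. Assumes the file is at the repo root (unverified).
    """
    if downloader is None:
        from huggingface_hub import hf_hub_download  # pip install huggingface-hub
        downloader = hf_hub_download
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return Path(downloader(repo_id=repo_id, filename=filename, local_dir=local_dir))
```

For example, `fetch_checkpoint("offline_CleanMel_S_mask.ckpt")` would place that file under `pretrained/enhancement/`, matching the folder layout the inference scripts expect for local models.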
```shell
# Inference with pretrained models from Hugging Face
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask huggingface
## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map huggingface
```
```shell
# Inference with local pretrained models
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask
## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map
```
- Custom input: modify `speech_folder` in `inference.sh`.
- Output: results are saved to `output_folder` (defaults to `./my_output`).
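The positional arguments of `inference.sh` appear to select a checkpoint by mode (`offline`/`online`), size (`S`/`L`), and target (`mask`/`map`). The helper below is a hypothetical sketch, not part of `inference.sh`, assuming the file naming `{mode}_CleanMel_{size}_{target}.ckpt` seen in `pretrained/enhancement/`; not every combination is necessarily released.

```python
# Hypothetical helper, NOT part of the repository. Assumes checkpoint names follow
# the pattern {mode}_CleanMel_{size}_{target}.ckpt used in pretrained/enhancement/.
VALID = {"mode": {"offline", "online"}, "size": {"S", "L"}, "target": {"mask", "map"}}


def checkpoint_name(mode, size, target):
    """Return the checkpoint file name implied by `inference.sh <gpus> <mode> <size> <target>`."""
    for key, value in (("mode", mode), ("size", size), ("target", target)):
        if value not in VALID[key]:
            raise ValueError(f"unknown {key}: {value!r}, expected one of {sorted(VALID[key])}")
    return f"{mode}_CleanMel_{size}_{target}.ckpt"
```

For instance, `checkpoint_name("online", "S", "map")` gives `online_CleanMel_S_map.ckpt`, the model used in the online example above.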
```shell
# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask
```
Configure the training datasets in `./config/dataset/train.yaml`.
By default, training uses 4 GPUs with a batch size of 32.
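The first argument of `train.sh` appears to be a comma-separated GPU list (`0,1,2,3` for the default 4 GPUs). The README does not state whether 32 is the total or the per-GPU batch size; assuming it is the total, each GPU processes 32 / 4 = 8 samples per step. A small sketch of that arithmetic, where `per_device_batch` is a hypothetical helper:

```python
def per_device_batch(global_batch, gpu_spec):
    """Split a global batch size evenly over the GPUs in a device string such as '0,1,2,3'.

    Assumes `global_batch` is the total across all GPUs (not confirmed by the README).
    """
    gpus = [int(tok) for tok in gpu_spec.split(",") if tok.strip()]  # '0,' -> [0]
    if not gpus:
        raise ValueError(f"no GPU ids found in {gpu_spec!r}")
    if global_batch % len(gpus):
        raise ValueError(f"batch size {global_batch} is not divisible by {len(gpus)} GPUs")
    return global_batch // len(gpus)
```

If 32 is instead the per-GPU size, the effective batch across 4 GPUs would be 128.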
```
pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt
```
Enhancement: the `offline_CleanMel_S/L_mask/map.ckpt` checkpoints are available.
Vocos: vocos_offline.pt and vocos_online.pt are here.
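Before running inference with local models, it can help to confirm that the downloaded files match the layout in the tree above. A minimal sketch; `missing_pretrained_files` is a hypothetical helper, and the expected names cover only the files listed explicitly in the tree (the `...` entries are not checked).

```python
from pathlib import Path

# Files listed explicitly in the pretrained/ tree; the "..." entries are not included.
EXPECTED = {
    "enhancement": [
        "offline_CleanMel_S_map.ckpt",
        "offline_CleanMel_S_mask.ckpt",
        "online_CleanMel_S_map.ckpt",
    ],
    "vocos": ["vocos_offline.pt", "vocos_online.pt"],
}


def missing_pretrained_files(root="pretrained"):
    """Return the expected checkpoint paths under `root` that are not present as files."""
    root = Path(root)
    return [
        root / sub / name
        for sub, names in EXPECTED.items()
        for name in names
        if not (root / sub / name).is_file()
    ]
```

An empty list means every listed enhancement and Vocos checkpoint is in place.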
💡 ASR implementation details are in the `asr_infer` branch.
```bibtex
@ARTICLE{11097896,
  author={Shao, Nian and Zhou, Rui and Wang, Pengyu and Li, Xian and Fang, Ying and Yang, Yujie and Li, Xiaofei},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR},
  year={2025},
  volume={33},
  number={},
  pages={3202-3214},
  doi={10.1109/TASLPRO.2025.3592333}
}
```


