CleanMel

Paper · Demos · Try CleanMel · GitHub Issues · Contact

PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR" [accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP), 2025].

Notice 📢

  • The CleanMel model checkpoints are now available on Hugging Face; inference can be run with a one-line command.
  • All models are available in the pretrained/enhancement/ folder.
  • The enhanced results from the four offline_CleanMel_S/L_mask/map models for the CHiME example noisy_CHIME-real_F05_442C020S_STR_REAL are provided in the src/inference_example/pretrained_example_output folder.

Overview 🚀

(Figure: CleanMel system overview)

CleanMel enhances logMel spectrograms to improve both speech quality and ASR performance. Its outputs are compatible with:

  • 🎙️ Vocoders for enhanced waveforms
  • 🤖 ASR systems for transcription
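As background for what the model operates on, here is a minimal NumPy-only sketch of how a logMel spectrogram is computed from a waveform. The exact STFT and mel parameters used by CleanMel are not stated in this README, so the values below (16 kHz, 512-point FFT, 80 mel bands) are illustrative assumptions, not the project's configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=512, n_mels=80):
    # Triangular filters evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def logmel(wave, sr=16000, n_fft=512, hop=256, n_mels=80, eps=1e-5):
    # Magnitude STFT via framed FFT (Hann window), then mel projection and log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.maximum(mel, eps))

x = np.random.randn(16000)  # 1 s of noise at 16 kHz
M = logmel(x)
print(M.shape)  # (61, 80): (frames, mel bands)
```

CleanMel predicts an enhanced version of such a logMel representation, which can then be vocoded back to a waveform or fed directly to an ASR front end.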

Demo Page 🎤

The demo page of CleanMel is published on Hugging Face Spaces.

If you have downloaded the pretrained models (following the instructions below), you can also launch the demo page locally with:

python app.py

Then, open your browser and visit http://localhost:7860 to access the demo page.

Quick Start ⚡

Environment Setup

conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt

Inference

Pretrained models can be downloaded manually here, or automatically via the huggingface-hub package.
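A hedged sketch of the automatic route, assuming the huggingface-hub package is installed (`pip install huggingface-hub`). The Hugging Face repo id is not given in this README, so `fetch()` takes it as an argument rather than hard-coding one; the target directory follows the pretrained/enhancement/ layout mentioned above:

```python
from pathlib import Path

def local_target(root: str = "pretrained") -> Path:
    # inference.sh looks for checkpoints under pretrained/enhancement/.
    return Path(root) / "enhancement"

def fetch(repo_id: str, root: str = "pretrained") -> Path:
    # Deferred import so the sketch loads even without huggingface-hub.
    from huggingface_hub import snapshot_download
    dest = local_target(root)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(dest))
    return dest

print(local_target())  # pretrained/enhancement (on POSIX paths)
```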

# Inference with pretrained models from huggingface
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask huggingface

## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map huggingface

# Inference with local pretrained models
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask

## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map

Custom Input: Modify speech_folder in inference.sh

Output: Results are saved to output_folder (defaults to ./my_output)
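For reference, the inference.sh arguments appear to select a checkpoint by name. The mapping below is an assumption inferred from the file names under pretrained/enhancement/, not code taken from the repository:

```python
def checkpoint_name(mode: str, size: str, target: str) -> str:
    """Map inference.sh arguments to a checkpoint file name.

    mode:   "offline" or "online"
    size:   "S" (small) or "L" (large)
    target: "mask" or "map" (masking vs. mapping output)
    """
    assert mode in {"offline", "online"}
    assert size in {"S", "L"}
    assert target in {"mask", "map"}
    return f"{mode}_CleanMel_{size}_{target}.ckpt"

print(checkpoint_name("offline", "S", "mask"))  # offline_CleanMel_S_mask.ckpt
print(checkpoint_name("online", "S", "map"))    # online_CleanMel_S_map.ckpt
```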

Training

# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask

Configure datasets in ./config/dataset/train.yaml

By default, training uses 4 GPUs with a batch size of 32.

Pretrained Models 🧠

pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt

Enhancement: offline_CleanMel_S/L_mask/map.ckpt are available.

Vocos: vocos_offline.pt and vocos_online.pt are here.

Performance 📊

Speech Enhancement

(Figures: speech enhancement results)

ASR Accuracy

(Figure: ASR accuracy results)

💡 ASR implementation details in asr_infer branch

Citation 📝

@ARTICLE{11097896,
  author={Shao, Nian and Zhou, Rui and Wang, Pengyu and Li, Xian and Fang, Ying and Yang, Yujie and Li, Xiaofei},
  journal={IEEE Transactions on Audio, Speech and Language Processing}, 
  title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR}, 
  year={2025},
  volume={33},
  number={},
  pages={3202-3214},
  doi={10.1109/TASLPRO.2025.3592333}
}

Acknowledgement 🙏

  • Built using NBSS template
  • Vocoder implementation from Vocos
