CleanMel

Paper · Demos · Try CleanMel · GitHub Issues · Contact

PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR" [accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP), 2025].

Notice 📢

  • The CleanMel model checkpoints are now available on Hugging Face; inference can be run with a one-line command.
  • All models are available in the pretrained/enhancement/ folder.
  • The enhanced results from the four offline_CleanMel_S/L_mask/map models for the CHiME example noisy_CHIME-real_F05_442C020S_STR_REAL are provided in the src/inference_example/pretrained_example_output folder.

Overview 🚀

(Figure: CleanMel system overview)

CleanMel enhances logMel spectrograms to improve both speech quality and ASR performance. Its outputs are compatible with:

  • 🎙️ Vocoders for enhanced waveforms
  • 🤖 ASR systems for transcription
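As background for what the model operates on, here is a minimal NumPy-only sketch of how a logMel spectrogram is computed from a waveform. The exact STFT and mel parameters used by CleanMel are not stated in this README, so the values below (16 kHz, 512-point FFT, 80 mel bands) are illustrative assumptions, not the project's configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=512, n_mels=80):
    # Triangular filters evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def logmel(wave, sr=16000, n_fft=512, hop=256, n_mels=80, eps=1e-5):
    # Magnitude STFT via framed FFT (Hann window), then mel projection and log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.maximum(mel, eps))

x = np.random.randn(16000)  # 1 s of noise at 16 kHz
M = logmel(x)
print(M.shape)  # (61, 80): (frames, mel bands)
```

CleanMel predicts an enhanced version of such a logMel representation, which can then be vocoded back to a waveform or fed directly to an ASR front end.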

Demo Page 🎤

The demo page of CleanMel is published on Hugging Face Spaces.

If you have downloaded the pretrained models (following the instructions below), you can also launch the demo page locally with:

python app.py

Then, open your browser and visit http://localhost:7860 to access the demo page.

Quick Start ⚡

Environment Setup

conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt

Inference

Pretrained models can be downloaded manually here, or automatically via the huggingface-hub package.
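A hedged sketch of the automatic route, assuming the huggingface-hub package is installed (`pip install huggingface-hub`). The Hugging Face repo id is not given in this README, so `fetch()` takes it as an argument rather than hard-coding one; the target directory follows the pretrained/enhancement/ layout mentioned above:

```python
from pathlib import Path

def local_target(root: str = "pretrained") -> Path:
    # inference.sh looks for checkpoints under pretrained/enhancement/.
    return Path(root) / "enhancement"

def fetch(repo_id: str, root: str = "pretrained") -> Path:
    # Deferred import so the sketch loads even without huggingface-hub.
    from huggingface_hub import snapshot_download
    dest = local_target(root)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(dest))
    return dest

print(local_target())  # pretrained/enhancement (on POSIX paths)
```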

# Inference with pretrained models from huggingface
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask huggingface

## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map huggingface

# Inference with local pretrained models
## Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask

## Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map

Custom Input: Modify speech_folder in inference.sh

Output: Results are saved to output_folder (defaults to ./my_output)
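For reference, the inference.sh arguments appear to select a checkpoint by name. The mapping below is an assumption inferred from the file names under pretrained/enhancement/, not code taken from the repository:

```python
def checkpoint_name(mode: str, size: str, target: str) -> str:
    """Map inference.sh arguments to a checkpoint file name.

    mode:   "offline" or "online"
    size:   "S" (small) or "L" (large)
    target: "mask" or "map" (masking vs. mapping output)
    """
    assert mode in {"offline", "online"}
    assert size in {"S", "L"}
    assert target in {"mask", "map"}
    return f"{mode}_CleanMel_{size}_{target}.ckpt"

print(checkpoint_name("offline", "S", "mask"))  # offline_CleanMel_S_mask.ckpt
print(checkpoint_name("online", "S", "map"))    # online_CleanMel_S_map.ckpt
```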

Training

# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask

Configure datasets in ./config/dataset/train.yaml

By default, training uses 4 GPUs with a batch size of 32.

Pretrained Models 🧠

pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt

Enhancement: offline_CleanMel_S/L_mask/map.ckpt are available.

Vocos: vocos_offline.pt and vocos_online.pt are here.

Performance 📊

Speech Enhancement

(Figures: speech enhancement results)

ASR Accuracy

(Figure: ASR accuracy results)

💡 ASR implementation details in asr_infer branch

Citation 📝

@ARTICLE{11097896,
  author={Shao, Nian and Zhou, Rui and Wang, Pengyu and Li, Xian and Fang, Ying and Yang, Yujie and Li, Xiaofei},
  journal={IEEE Transactions on Audio, Speech and Language Processing}, 
  title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR}, 
  year={2025},
  volume={33},
  number={},
  pages={3202-3214},
  doi={10.1109/TASLPRO.2025.3592333}
}

Acknowledgement 🙏

  • Built using NBSS template
  • Vocoder implementation from Vocos
