The demucs ML model separates vocals, drums, bass and other tracks from music.
The source repo is released under the permissive MIT licence.
Tested on MacBook Pro 2020 (Intel x86) with macOS 15.5.
I used pyenv to install a suitable Python version (>= 3.8) on macOS. The model uses PyTorch, which does not support Python 3.13 as of June 14, 2025.
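The version window described above can be checked from Python before creating the environment. This is a minimal sketch: the bounds are copied from this post (3.8 as the minimum, 3.13 as the first unsupported release on the date given), and the `is_suitable` helper is a hypothetical name, not part of demucs or PyTorch.

```python
import sys

# Assumed bounds, taken from the text above: Python >= 3.8 is required,
# and PyTorch had no Python 3.13 support as of June 14, 2025.
MIN_VERSION = (3, 8)
FIRST_UNSUPPORTED = (3, 13)


def is_suitable(version=None):
    """Return True if (major, minor) falls in the range [3.8, 3.13)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < FIRST_UNSUPPORTED


if __name__ == "__main__":
    verdict = "suitable" if is_suitable() else "not suitable"
    print(sys.version.split()[0], verdict)
```

The upper bound is a snapshot and will go stale once PyTorch adds support for newer Python releases.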
python3 --version
Python 3.12.3
Create a virtual environment. I used Python's venv for this.
cd
python3 -m venv venv_demucs
source ./venv_demucs/bin/activate
python3 -m pip install --upgrade pip
Install the demucs PyPI package in the virtual environment.
python3 -m pip install -U demucs
On macOS I hit a numpy version error triggered in /torch/nn/modules/transformer.py. This error did not reproduce on Ubuntu 22.04.5 with numpy version 2.2.26.
demucs
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.3.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
On macOS I downgraded numpy in the virtual environment to version 1.26.4.
python3 -m pip install "numpy<2"
Run from the command line in the virtual environment.
cd
source ./venv_demucs/bin/activate
demucs <path_to_music_file>
I tested on a mono, single-track file with a 48kHz sampling rate, a random file from my downloads folder. The run took 4m13s.
(venv_demucs) $ demucs Downloads/fast_car_48k.wav
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /Users/guynicholson/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:01<00:00, 62.8MB/s]
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /Users/guynicholson/separated/htdemucs
Separating track Downloads/fast_car_48k.wav
100%|██████████████████████████████████████████████| 298.34999999999997/298.34999999999997 [04:13<00:00, 1.18seconds/s]
(venv_demucs) $
An 80MB model file was downloaded to the cache in the above step.
After the run I found four generated WAV files in the separated/htdemucs sub-folder. Each file has a 44.1kHz sampling rate.
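To confirm that a run produced all four files, the expected paths can be listed and checked from Python. This is a sketch under one stated assumption: that each input file gets its own sub-folder named after the file, inside separated/htdemucs. The post only confirms the separated/htdemucs level, and the helper names are hypothetical.

```python
from pathlib import Path

# The four sources named at the top of this post.
STEMS = ("vocals", "drums", "bass", "other")


def expected_stem_paths(track, out_root="separated", model="htdemucs"):
    """List the WAV paths a run is expected to write for one input file.

    Assumption: each input gets a sub-folder named after the file stem.
    """
    folder = Path(out_root) / model / Path(track).stem
    return [folder / f"{stem}.wav" for stem in STEMS]


def missing_stems(track, **kwargs):
    """Return the expected files that are not on disk."""
    return [p for p in expected_stem_paths(track, **kwargs) if not p.exists()]
```

Calling `missing_stems("Downloads/fast_car_48k.wav")` from the directory where demucs was run should return an empty list if the layout assumption holds.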

I was impressed on my first listen.
- All tracks have the expected separation. There are some quiet artifacts such as modulation and bleed-through from other tracks. Reverberation timbre can sound modulated.
- The vocals track preserves the singing timbre and artist identity.
The --two-stems=vocals option allows separating vocals from the rest of the
accompaniment (i.e., "karaoke" mode). vocals can be changed to any source in the
selected model. Before running this I renamed the sub-folder generated in run1.
cd
source ./venv_demucs/bin/activate
demucs Downloads/fast_car_48k.wav --two-stems=vocals
The run took 3m49s.
(venv_demucs) $ demucs Downloads/fast_car_48k.wav --two-stems=vocals
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /Users/guynicholson/separated/htdemucs
Separating track Downloads/fast_car_48k.wav
100%|██████████████████████████████████████████████| 298.34999999999997/298.34999999999997 [03:49<00:00, 1.30seconds/s]
(venv_demucs) $
I was impressed on my first listen.
- All tracks have the expected separation.
- The no_vocals track has a very quiet, slightly ghostly sounding vocal.
- The vocals track sounded the same as in run1.
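Both runs above can also be launched from a Python script, which helps when separating many files. The sketch below only wraps the demucs command line with the two options that appear in this post (`--two-stems` and `-n`). The function names are hypothetical, and it assumes the `demucs` executable from the virtual environment is on the PATH.

```python
import shlex
import subprocess


def build_demucs_cmd(track, two_stems=None, model=None):
    """Build the argument list for the demucs CLI calls used in this post."""
    cmd = ["demucs", track]
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")
    if model:
        cmd += ["-n", model]
    return cmd


def run_demucs(track, **options):
    """Run demucs and raise CalledProcessError on a non-zero exit code."""
    cmd = build_demucs_cmd(track, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Example (needs the virtual environment to be active):
# run_demucs("Downloads/fast_car_48k.wav")                      # run 1
# run_demucs("Downloads/fast_car_48k.wav", two_stems="vocals")  # run 2
```

Keeping the command builder separate from the runner makes it possible to inspect or log the exact command without starting a multi-minute separation.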
I ran these on a MacBook Pro 2020 (Intel) with a quad-core i5 and 16GB RAM. The python3.12 process used all four CPU cores, up to ~380%, and memory usage was in the range 1 to 1.3 GB.
Running the same separation task on a workstation GPU (NVIDIA RTX A2000, Ampere) took just 13 seconds, or >20x real-time. GPU memory usage for the Python 3.10 process was ~900MB.
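The real-time figures can be reproduced from the numbers already in this post: the progress bars report a track length of about 298.35 seconds, and the wall-clock times were 4m13s, 3m49s and 13s. The short calculation below is only that arithmetic; the helper names are hypothetical.

```python
AUDIO_SECONDS = 298.35  # track length shown in the progress bars


def to_seconds(minutes, seconds):
    """Convert a minutes/seconds wall-clock time to seconds."""
    return 60 * minutes + seconds


def realtime_factor(wall_seconds, audio_seconds=AUDIO_SECONDS):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_seconds


runs = {
    "CPU, four stems (4m13s)": to_seconds(4, 13),
    "CPU, two stems (3m49s)": to_seconds(3, 49),
    "GPU (13s)": 13,
}
for name, wall in runs.items():
    print(f"{name}: {realtime_factor(wall):.2f}x real-time")
# CPU, four stems (4m13s): 1.18x real-time
# CPU, two stems (3m49s): 1.30x real-time
# GPU (13s): 22.95x real-time
```

The two CPU values match the seconds/s rates printed by the progress bars, and the GPU value is consistent with the ">20x real-time" estimate.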
- Try demucs on Ubuntu 22.04.5 LTS. It generates the same four WAV files.
- Try htdemucs_ft model.
- Dataset labelling using Python scripting on GPU.
- Can the model support streaming audio in chunks?
