Audio Source Separation with Demucs
Setup
I'll be using yt-dlp to download audio from YouTube and ffmpeg for audio transcoding, so make sure both are installed before following along. Check each tool's documentation for installation instructions.
You might also want to run this inside a virtual environment or a Conda environment.
# Note: `audiosep` is the env name
conda create -n audiosep python==3.10.11
conda activate audiosep
Install Demucs 4 using pip
python -m pip install demucs==4.0.0
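Before moving on, it can help to confirm that everything is actually reachable from the active environment. The snippet below is a minimal sketch, not part of Demucs: `missing_requirements` is a hypothetical helper name, and it assumes the tools are exposed on PATH as `yt-dlp` and `ffmpeg`.

```python
import importlib.util
import shutil


def missing_requirements(tools=("yt-dlp", "ffmpeg"), modules=("demucs",)):
    """Return the command-line tools and Python modules that can't be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Everything needed for this guide is installed.")
```

If anything is reported as missing, install it (or activate the `audiosep` environment) before continuing.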
Download the Music
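We're using yt-dlp to download the audio from YouTube and convert it to a .wav file. Once it's finished, we'll have an audio file called music.wav in the current directory.
In this guide we give Demucs its input as a PCM .wav file. On its own, -f bestaudio/best only selects the best audio stream and saves it in its original container, so we add -x --audio-format wav to have yt-dlp convert it with ffmpeg.
yt-dlp -f bestaudio/best -x --audio-format wav --output "music.%(ext)s" "https://www.youtube.com/watch?v=jQaGTqR68xQ"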
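Since the file extension alone doesn't guarantee the contents, a quick check that music.wav really is PCM audio can save some confusion later. This is a minimal sketch using Python's standard `wave` module; `describe_wav` is a hypothetical helper name, and the script assumes the file sits in the current directory.

```python
import wave


def describe_wav(path):
    """Return basic facts about a PCM .wav file; wave.Error is raised if it isn't one."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    try:
        print(describe_wav("music.wav"))
    except (wave.Error, EOFError, OSError) as exc:
        # Raised when the file is missing or is not a PCM .wav file.
        print("music.wav could not be read as a PCM .wav file:", exc)
```

If the file fails this check, it was most likely saved in another container (for example WebM or M4A) under a .wav name.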
Separation
We're using the model called htdemucs_ft (Hybrid Transformer Demucs, Fine-Tuned), a fine-tuned version of the regular htdemucs model. It takes longer to run, but it can give somewhat better separation.
python -m demucs.separate \
-n htdemucs_ft \
--two-stems vocals \
--out output/ \
music.wav
--two-stems vocals is used because we only want two outputs: the vocals and everything else (non-vocals). Omit the flag if you want all four stems that Demucs can separate: drums, bass, vocals, and other.
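To make the effect of the flag concrete, here is a rough sketch that builds the same command in Python and lists the files we'd expect afterwards. The helper names `demucs_command` and `expected_outputs` are hypothetical, and the folder layout (`<out>/<model>/<track name>/<stem>.wav`, with the complement named `no_<stem>`) is an assumption based on the output paths used in this guide.

```python
import sys
from pathlib import Path

# The four stems Demucs separates when --two-stems is not given.
ALL_STEMS = ("drums", "bass", "other", "vocals")


def demucs_command(track, model="htdemucs_ft", out_dir="output", two_stems=None):
    """Build the argument list for `python -m demucs.separate`."""
    cmd = [sys.executable, "-m", "demucs.separate", "-n", model, "--out", out_dir]
    if two_stems is not None:
        cmd += ["--two-stems", two_stems]
    return cmd + [track]


def expected_outputs(track, model="htdemucs_ft", out_dir="output", two_stems=None):
    """List the .wav files we expect Demucs to write (assumed layout, see above)."""
    if two_stems is None:
        names = ALL_STEMS
    else:
        names = (two_stems, f"no_{two_stems}")
    folder = Path(out_dir) / model / Path(track).stem
    return [folder / f"{name}.wav" for name in names]


# To actually run the separation:
#   import subprocess
#   subprocess.run(demucs_command("music.wav", two_stems="vocals"), check=True)
```

With `two_stems="vocals"` this lists `vocals.wav` and `no_vocals.wav`; with `two_stems=None` it lists one file per stem.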
Once the command finishes, we should have an ./output/ folder containing the separated audio files. Those files are in .wav format, so you might want to transcode them to .mp3 using ffmpeg:
ffmpeg -y -i ./output/htdemucs_ft/music/no_vocals.wav no_vocals.mp3
ffmpeg -y -i ./output/htdemucs_ft/music/vocals.wav vocals.mp3
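If you separate into more than two stems, typing one ffmpeg command per file gets tedious. The sketch below loops over every .wav file in a folder and builds the same kind of command as above. `transcode_commands` and `transcode_all` are hypothetical helper names, and it assumes ffmpeg is on PATH.

```python
import subprocess
from pathlib import Path


def transcode_commands(stem_dir, dest_dir="."):
    """Build one ffmpeg command per .wav file found in stem_dir."""
    commands = []
    for wav_path in sorted(Path(stem_dir).glob("*.wav")):
        mp3_path = Path(dest_dir) / (wav_path.stem + ".mp3")
        # -y overwrites an existing output file, as in the commands above.
        commands.append(["ffmpeg", "-y", "-i", str(wav_path), str(mp3_path)])
    return commands


def transcode_all(stem_dir, dest_dir="."):
    """Run every command; check=True raises an error if ffmpeg fails."""
    for command in transcode_commands(stem_dir, dest_dir):
        subprocess.run(command, check=True)
```

For the folder from this guide, that would be `transcode_all("./output/htdemucs_ft/music")`.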
Preview
Original Audio
Vocals Only
Non-vocals Only (Karaoke Version)
2023 © Seanghay Yath