Audio Source Separation with Demucs
I'll be using yt-dlp to download audio from YouTube and ffmpeg for audio transcoding, so make sure you've installed both to follow along. Check each tool's documentation for installation instructions.
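Before going further, a small shell function can confirm that the tools are on your PATH. This is a minimal sketch: the name check_deps is hypothetical (it is not part of yt-dlp or ffmpeg), and it relies only on the POSIX `command -v` builtin.

```sh
#!/bin/sh
# Hypothetical helper: check that each required tool is on PATH.
# Prints one "missing: <tool>" line per absent tool and returns 1 if any are absent.
check_deps() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_deps yt-dlp ffmpeg
```

For example, `check_deps yt-dlp ffmpeg` prints a `missing:` line for each tool it cannot find and returns a non-zero status, so it can gate the rest of a script.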
You may also want to run this in a virtual environment or a Conda environment:
# Note: `audiosep` is the env name
conda create -n audiosep python==3.10.11
conda activate audiosep
Install Demucs 4 with pip:
python -m pip install demucs==4.0.0
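To double-check that the pinned version landed in the active environment, you can ask pip. The demucs_version function below is a hypothetical helper; it only parses the `Version:` line that `python -m pip show demucs` prints, and prints nothing if Demucs is not installed.

```sh
#!/bin/sh
# Hypothetical helper: print the installed Demucs version (empty if not installed).
# Parses the "Version: X.Y.Z" line from `python -m pip show demucs`.
demucs_version() {
  python -m pip show demucs 2>/dev/null | awk -F': ' '/^Version:/ { print $2 }'
}

# Example: [ "$(demucs_version)" = "4.0.0" ] && echo "Demucs 4.0.0 is installed"
```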
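Demucs expects the input audio in a format it can read, and .wav is a safe choice, so we'll use yt-dlp to download a .wav audio file from YouTube. The -x --audio-format wav options make yt-dlp convert the download to WAV with ffmpeg (without them the file would only be named .wav), and "%(ext)s" lets yt-dlp fill in the extension. Once it's finished, we'll have an audio file called music.wav in the current directory.
yt-dlp -f bestaudio/best -x --audio-format wav --output "music.%(ext)s" "https://www.youtube.com/watch?v=jQaGTqR68xQ"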
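If you want to repeat the download for other videos, the step can be wrapped in a function. This is a sketch with a hypothetical name (download_wav); it assumes yt-dlp's -x/--audio-format wav options, which pass the downloaded stream to ffmpeg for conversion, and the %(ext)s output template, which lets yt-dlp supply the final .wav extension.

```sh
#!/bin/sh
# Hypothetical helper: download a YouTube video's audio and convert it to WAV.
# -x / --audio-format wav make yt-dlp transcode with ffmpeg (ffmpeg must be installed);
# the "%(ext)s" template lets yt-dlp fill in the final ".wav" extension.
download_wav() {
  url=$1
  name=${2:-music}   # output basename; defaults to "music", giving music.wav
  yt-dlp -f bestaudio/best -x --audio-format wav --output "$name.%(ext)s" "$url"
}

# Example:
# download_wav "https://www.youtube.com/watch?v=jQaGTqR68xQ" music
```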
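We're using the fine-tuned model htdemucs_ft (Hybrid Transformer Demucs, fine-tuned). According to the Demucs README, it is a fine-tuned version of the default htdemucs model that may separate slightly better, at the cost of taking about four times longer to run.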
# Split music.wav into vocals and no_vocals using the fine-tuned model
python -m demucs.separate \
  -n htdemucs_ft \
  --two-stems vocals \
  --out output/ \
  music.wav
--two-stems=vocals is used because we only want to split the audio into vocals and everything else (no_vocals). Omit it to separate all four stems Demucs supports: drums, bass, vocals, and other.
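The separation step and the folder layout it produces can be captured in two small functions. Both names (separate_track, stem_paths) are hypothetical, and stem_paths assumes the layout seen in this guide, output/htdemucs_ft/<track>/<stem>.wav, where <track> is the input file name without its extension.

```sh
#!/bin/sh
# Hypothetical wrapper around the Demucs CLI used in this guide.
# separate_track INPUT [STEM]: with STEM (e.g. vocals) it passes --two-stems STEM;
# without it, Demucs writes all four stems.
separate_track() {
  input=$1
  stem=$2
  if [ -n "$stem" ]; then
    python -m demucs.separate -n htdemucs_ft --two-stems "$stem" --out output/ "$input"
  else
    python -m demucs.separate -n htdemucs_ft --out output/ "$input"
  fi
}

# stem_paths INPUT [STEM]: print the WAV files expected under output/htdemucs_ft/<track>/
# (assumes the default naming seen above, e.g. output/htdemucs_ft/music/no_vocals.wav).
stem_paths() {
  track=$(basename "$1")
  track=${track%.*}               # strip the extension: music.wav -> music
  if [ -n "$2" ]; then
    set -- "$2" "no_$2"           # two-stem mode: STEM and no_STEM
  else
    set -- drums bass other vocals
  fi
  for s in "$@"; do
    echo "output/htdemucs_ft/$track/$s.wav"
  done
}
```

For instance, `stem_paths music.wav vocals` prints output/htdemucs_ft/music/vocals.wav and output/htdemucs_ft/music/no_vocals.wav, which is handy for checking that a run produced what you expect.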
After the script finishes, we should have an ./output/ folder containing the separated audio files. Those files are in .wav format, so you might want to transcode them to .mp3 with ffmpeg:
ffmpeg -y -i ./output/htdemucs_ft/music/no_vocals.wav no_vocals.mp3
ffmpeg -y -i ./output/htdemucs_ft/music/vocals.wav vocals.mp3
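When there are more stems (for example after a four-stem run), converting them one by one gets tedious. Below is a minimal loop using the same ffmpeg invocation as above; wav_dir_to_mp3 is a hypothetical name, and its default folder assumes this guide's music.wav example.

```sh
#!/bin/sh
# Hypothetical helper: transcode every WAV in a Demucs track folder to MP3,
# writing <stem>.mp3 into the current directory (same naming as the commands above).
wav_dir_to_mp3() {
  dir=${1:-./output/htdemucs_ft/music}
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue          # no matches: the glob stays literal, so skip it
    name=$(basename "$wav" .wav)
    ffmpeg -y -i "$wav" "$name.mp3"
  done
}

# Example: wav_dir_to_mp3 ./output/htdemucs_ft/music
```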
[Audio sample: Non-vocals only (karaoke version)]
2023 © Seanghay Yath