This repository is an attempt to reimplement, reproduce, and unify various deep-learning-based methods for Music Source Separation.
This project was started as part of the requirements for the course Media Computing in Practice at the University of Tokyo, under the guidance of Prof. Yusuke Matsui.
This is a work in progress: the current results are decent but not as good as those reported in the papers, so please take them with a pinch of salt. We will continue to improve the quality of separation.
Install the libsndfile and soundstretch libraries using your package manager, for example:

```bash
sudo apt-get install libsndfile1 soundstretch
```
If you use Anaconda or Miniconda, you can quickly create an environment using the provided environment YAML files.

For GPU machines:

```bash
conda env create --name <envname> --file=environment-cuda.yml
```

For CPU-only machines:

```bash
conda env create --name <envname> --file=environment-cpu.yml
```

After creating the environment, activate it as follows:

```bash
conda activate <envname>
```
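As a quick sanity check that libsndfile is visible from inside the new environment, you can import the python-soundfile package (assuming it is among the environment's dependencies):

```python
# python-soundfile wraps libsndfile, so a successful import confirms the
# system library was found (assumes soundfile is installed in the env).
import soundfile as sf

print("libsndfile version:", sf.__libsndfile_version__)
```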
Currently, the D3Net vocals model has been uploaded to Hugging Face, and you can run vocals-accompaniment separation using that model with the `separate.py` script. Invoke the separation as follows:
```bash
python separate.py \
    -c configs/d3net/eval.yaml \
    -i path/to/song.wav
```
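Where the separated stems are written is determined by the eval config; as an illustration only, you can inspect a resulting stem with python-soundfile (the output path below is a hypothetical example):

```python
# Load a separated stem and print its shape and sample rate.
# "vocals.wav" is a hypothetical output path; check the eval config.
import soundfile as sf

audio, sr = sf.read("vocals.wav")
print(f"shape {audio.shape}, sample rate {sr} Hz")
```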
Currently, only `.wav` files are supported on Windows. You can use the following command to convert an `.mp3` file to `.wav` within the conda environment created above:

```bash
ffmpeg -i song.mp3 song.wav
```

On Linux, you can use `.mp3` files directly, without conversion.
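If you have many files to convert, a small Python wrapper around the same ffmpeg call can batch the conversion (a sketch; the `songs/` directory is just an example):

```python
# Convert every .mp3 in a directory to .wav by shelling out to ffmpeg.
import subprocess
from pathlib import Path

for mp3 in Path("songs").glob("*.mp3"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mp3), str(mp3.with_suffix(".wav"))],
        check=True,
    )
```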
If you would like to train the models yourself, please follow the procedure below.
iSeparate currently supports the MUSDB18 dataset, which is distributed in the Native Instruments STEMS format. However, it is easier to work with decoded `.wav` files; to create them, run the `prepare_dataset.py` script.
If you would like to download a small 7-second version of the dataset for testing the code, run:

```bash
python prepare_dataset.py \
    --root data/MUSDB18-sample \
    --wav-root data/MUSDB18-sample-wav \
    --filelists-dir filelists/musdb-sample \
    --download-sample \
    --keep-wav-only \
    --make-symlink
```
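After the script finishes, you can sanity-check the decoded data with a short script like the one below. It assumes the usual MUSDB18 layout of `train/` and `test/` splits with one folder of stem wavs per track; adjust the paths if your layout differs:

```python
# Count the extracted tracks per split in the decoded sample dataset.
from pathlib import Path

root = Path("data/MUSDB18-sample-wav")  # --wav-root from the command above
for split in ("train", "test"):
    tracks = [p for p in (root / split).iterdir() if p.is_dir()]
    print(f"{split}: {len(tracks)} tracks")
```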
If you would like to download the full dataset for training, run:

```bash
python prepare_dataset.py \
    --root data/MUSDB18 \
    --wav-root data/MUSDB18-wav \
    --filelists-dir filelists/musdb \
    --keep-wav-only \
    --make-symlink
```
The `prepare_dataset.py` script downloads the data in STEMS format to the directory specified by `--root` and then extracts the wav files into the directory specified by `--wav-root`. If you want to delete the STEMS and keep only the wav files, use the `--keep-wav-only` option. The `--make-symlink` option creates a symbolic link from the wav directory to the `data/MUSDB18-wav` directory. If you prefer, you can instead edit the config files in the `configs` directory to point to your dataset directory.
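If you are unsure which keys to edit, you can load a config and print it first. This is a sketch that assumes the configs are plain YAML and that PyYAML is installed; the key names vary per method:

```python
# Print a config to locate the dataset-path keys before editing it.
import yaml

with open("configs/d3net/eval.yaml") as f:  # example config from above
    cfg = yaml.safe_load(f)
print(yaml.safe_dump(cfg, default_flow_style=False))
```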
NVIDIA GPUs are required for training. These models require quite a lot of VRAM; you can change the `batch_size` parameter in the configs to suit your hardware.
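To see how much VRAM each card has before picking a `batch_size`, you can query PyTorch (this assumes the CUDA build of PyTorch, which the CUDA environment file should provide):

```python
# Report the total memory of each visible NVIDIA GPU.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i} ({props.name}): {props.total_memory / 1024**3:.1f} GiB")
```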
Add the `--debug` flag at the end of the command if you just want to do a debug run (it trains on one batch, runs validation, and then cleans up after itself).
To train on a single GPU:

```bash
python train.py --config-file configs/<method>/<config-name.yaml>
```
To train on multiple GPUs with DistributedDataParallel:

```bash
python -m torch.distributed.run \
    --nproc_per_node=4 train.py \
    --config-file configs/<method>/<config-name.yaml>
```
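For reference, `torch.distributed.run` launches one process per GPU and passes each process its rank via environment variables such as `LOCAL_RANK`; a training script picks these up roughly as in the simplified sketch below (an illustration of the general pattern, not the repo's actual `train.py`):

```python
# Minimal DDP bootstrap, as expected by torch.distributed.run (simplified).
import os

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")     # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])  # set by torch.distributed.run
torch.cuda.set_device(local_rank)
# ... build your model, then wrap it for synchronized gradients:
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```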
If you would like to add a new method and train it on the MUSDB18 dataset, follow these steps:

- create a model package: `models/awesome-method`
- implement your model
- add a `separate.py` file and implement the `load_models` and `separate` functions (a skeleton sketch follows this list)
- add the model to `model_switcher.py`
- create and/or add your custom loss functions to `losses/loss_switcher.py`
- create config files following the examples in the `configs` directory
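A skeleton of the `separate.py` module for a new method might look like the sketch below. The function signatures, config keys, and placeholder model are assumptions for illustration; mirror the conventions of an existing method package (e.g., the D3Net one) rather than this sketch.

```python
# models/awesome-method/separate.py -- hypothetical skeleton of the two
# functions a new method package needs to expose. Signatures and config
# keys are assumptions; copy the conventions of an existing package.
import torch


class AwesomeModel(torch.nn.Module):
    """Placeholder for your actual separation network."""

    def __init__(self, num_sources=2):
        super().__init__()
        self.num_sources = num_sources

    def forward(self, mix):  # mix: (batch, channels, samples)
        # Stand-in behaviour: repeat the mixture once per source.
        return mix.unsqueeze(1).repeat(1, self.num_sources, 1, 1)


def load_models(config, device="cpu"):
    """Build the model described by `config` and load its weights."""
    model = AwesomeModel(**config.get("model_args", {}))
    if "checkpoint" in config:  # hypothetical config key
        state = torch.load(config["checkpoint"], map_location=device)
        model.load_state_dict(state)
    return model.to(device).eval()


def separate(model, mixture):
    """Split a (channels, samples) mixture tensor into source estimates."""
    with torch.no_grad():
        return model(mixture.unsqueeze(0)).squeeze(0)
```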