This repo is a self-learning project: music tracks and their vocals are converted to corresponding spectrograms, and those spectrograms are then used to train a vocal separation AI.
| Music | Vocal | AI Output |
| --- | --- | --- |
| Music Listen | Vocal Listen | AI Output Listen |
| Music Listen | Vocal Listen | AI Output Listen |
- I used the DSD100 dataset to get the music and vocal tracks.
- The music and vocal tracks are split into 5-second parts and converted to spectrograms. The dataset I created consists of these spectrograms.
- I used the provided repo to convert music to spectrograms and spectrograms back to music.
- I used the pix2pix-tensorflow implementation to train the model.
1 - The music and vocal tracks are split into 5-second parts.
2 - The 5-second parts are converted to spectrogram images. I changed these values to get 255x256 images:
  - Pixels per second: 51
  - Bandwidth: 205
3 - A 1-pixel row is appended at the end of the height (not at the start) to get 256x256 images.
4 - The parts that contain no vocals are removed from the dataset by discarding images whose pixel values are all 0.
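Steps 3 and 4 can be sketched with NumPy, assuming the spectrogram images are loaded as 2-D arrays; the function names and the synthetic data below are mine, not part of the original pipeline.

```python
import numpy as np

def pad_to_256(spec):
    """Append a 1-pixel row at the end of the height (not the start),
    turning a 255x256 spectrogram image into a 256x256 one."""
    assert spec.shape == (255, 256)
    pad_row = np.zeros((1, spec.shape[1]), dtype=spec.dtype)
    return np.vstack([spec, pad_row])

def is_silent(spec):
    """A spectrogram whose pixels are all 0 contains no vocals
    and is removed from the dataset."""
    return not np.any(spec)

# Synthetic example: one spectrogram with energy, one silent one.
vocal = np.zeros((255, 256), dtype=np.uint8)
vocal[40:80, :] = 120
silent = np.zeros((255, 256), dtype=np.uint8)

print(pad_to_256(vocal).shape)  # (256, 256)
print(is_silent(silent))        # True
print(is_silent(vocal))         # False
```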
- I trained for 10 epochs with the pix2pix implementation.
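For reference, a training run with the pix2pix-tensorflow implementation typically looks like the command below; the directory names are placeholders, and this is a sketch of a usual invocation rather than the exact command used here.

```shell
# Hypothetical pix2pix-tensorflow training invocation;
# dataset/train and model_train are placeholder paths.
python pix2pix.py \
  --mode train \
  --input_dir dataset/train \
  --output_dir model_train \
  --max_epochs 10 \
  --which_direction AtoB
```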
I will update this part:
- pix2pixHD.
- The transparency problem of spectrograms.