
melspectrum007/audio-visual-speech-enhancement


Visual Speech Enhancement

Implementation of the method described in the paper: Visual Speech Enhancement by Aviv Gabbay, Asaph Shamir and Shmuel Peleg.

Speech Enhancement Demo

Usage

Dependencies

Getting started

Given an audio-visual dataset with the following directory structure:

├── speaker-1
|   ├── audio
|   |   ├── f1.wav
|   |   └── f2.wav
|   └── video
|       ├── f1.mp4
|       └── f2.mp4
├── speaker-2
|   ├── audio
|   |   ├── f1.wav
|   |   └── f2.wav
|   └── video
|       ├── f1.mp4
|       └── f2.mp4
...

and a noise directory containing audio files (*.wav) of noise samples, perform the following steps.
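The expected layout above can be sketched programmatically. The snippet below builds an empty skeleton of that tree under a temporary directory; the speaker IDs and file names (`speaker-1`, `f1`, ...) follow the example above and are placeholders, not names required by the tool.

```python
# Sketch: build the dataset layout shown above under a temporary directory.
# Speaker IDs and file names are placeholders taken from the example tree.
import tempfile
from pathlib import Path

def make_dataset_skeleton(root: Path, speakers=("speaker-1", "speaker-2")) -> None:
    """Create empty audio/video file pairs in the layout the preprocessor expects."""
    for speaker in speakers:
        audio_dir = root / speaker / "audio"
        video_dir = root / speaker / "video"
        audio_dir.mkdir(parents=True, exist_ok=True)
        video_dir.mkdir(parents=True, exist_ok=True)
        for name in ("f1", "f2"):
            (audio_dir / f"{name}.wav").touch()  # empty stand-in for a real recording
            (video_dir / f"{name}.mp4").touch()  # empty stand-in for a real video

root = Path(tempfile.mkdtemp())
make_dataset_skeleton(root)
print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
```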

Preprocess the training, validation, and test datasets separately:

speech_enhancer.py --base_dir <output-dir-path> preprocess
    --data_name <preprocessed-data-name>
    --dataset_dir <dataset-dir-path>
    --noise_dirs <noise-dir-path> ...
    [--speakers <speaker-id> ...]
    [--ignored_speakers <speaker-id> ...] 
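Conceptually, this stage pairs each clean speech file with a noise sample to produce noisy training inputs. As a rough illustration (an assumption about the general technique, not the repository's actual implementation), additive mixing at a target signal-to-noise ratio can be sketched as:

```python
# Sketch (assumption): additive mixing of noise into clean speech at a target
# SNR -- an illustration of what preprocessing does conceptually, not the
# repository's actual implementation.
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s, 440 Hz tone
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]                    # uniform noise
mixed = mix_at_snr(speech, noise, snr_db=0.0)  # equal speech and noise power
```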

Then, train the model by:

speech_enhancer.py --base_dir <output-dir-path> train
    --model <model-name>
    --train_data_names <preprocessed-training-data-name> ...
    --validation_data_names <preprocessed-validation-data-name> ...
    [--gpus <num-of-gpus>]

Finally, enhance the test noisy speech samples by:

speech_enhancer.py --base_dir <output-dir-path> predict
    --model <model-name>
    --data_name <preprocessed-test-data-name>
    [--gpus <num-of-gpus>]
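Putting the three stages together, a hypothetical end-to-end run might look like the following. Every path and name here (`/out`, `/data/...`, `av-enhancer`, `train-set`, etc.) is a placeholder chosen for illustration, not a value mandated by the tool.

```shell
# Hypothetical end-to-end workflow; all paths and names are placeholders.
speech_enhancer.py --base_dir /out preprocess \
    --data_name train-set \
    --dataset_dir /data/train \
    --noise_dirs /data/noise

speech_enhancer.py --base_dir /out preprocess \
    --data_name val-set \
    --dataset_dir /data/val \
    --noise_dirs /data/noise

speech_enhancer.py --base_dir /out train \
    --model av-enhancer \
    --train_data_names train-set \
    --validation_data_names val-set

speech_enhancer.py --base_dir /out predict \
    --model av-enhancer \
    --data_name test-set
```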

Citing

If you find this project useful for your research, please cite:

@article{gabbay2017visual,
  title={Visual Speech Enhancement},
  author={Gabbay, Aviv and Shamir, Asaph and Peleg, Shmuel},
  journal={arXiv preprint arXiv:1711.08789},
  year={2017}
}
