SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022

audio-visual-ssl

This repo applies two types of vector quantization, Gumbel-Softmax and k-means (both adapted from fairseq.modules).

  • If you want to change the type of vector quantization, modify the config file config/speechclip_c/train_flickr.yaml (a sketch of how the two quantizers could be selected from such a config follows this list).
  • If you want to run validation or testing only, add the --eval or --test flag in egs/run_speechclip_c.sh (example invocations follow the run command below).
  • If you want to resume training from a specific checkpoint, add the --ckpt your_checkpoint_path flag in egs/run_speechclip_c.sh.
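The snippet below is a minimal sketch, not the repository's actual code, of how the two quantizer types from fairseq.modules might be chosen from a config entry. The config keys (vq_type, codebook_size, groups, vq_dim) are hypothetical stand-ins for whatever config/speechclip_c/train_flickr.yaml actually defines; check the yaml for the real names and values.

# Minimal sketch (assumed config keys) of selecting a quantizer from config.
from fairseq.modules import GumbelVectorQuantizer, KmeansVectorQuantizer

def build_quantizer(cfg, input_dim):
    """Build a Gumbel-Softmax or k-means vector quantizer from a config dict."""
    if cfg["vq_type"] == "gumbel":
        return GumbelVectorQuantizer(
            dim=input_dim,
            num_vars=cfg["codebook_size"],  # entries per codebook group
            temp=(2.0, 0.5, 0.999995),      # (start, end, decay) of the Gumbel temperature anneal
            groups=cfg["groups"],
            combine_groups=False,
            vq_dim=cfg["vq_dim"],
            time_first=True,                # inputs shaped (batch, time, dim)
        )
    elif cfg["vq_type"] == "kmeans":
        return KmeansVectorQuantizer(
            dim=input_dim,
            num_vars=cfg["codebook_size"],
            groups=cfg["groups"],
            combine_groups=False,
            vq_dim=cfg["vq_dim"],
            time_first=True,
            gamma=0.25,                     # weight of the commitment term
        )
    raise ValueError(f"unknown vq_type: {cfg['vq_type']}")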

To run cascaded SpeechCLIP:

bash egs/run_speechclip_c.sh
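
For example, with the flags described above (the checkpoint path is a placeholder):

bash egs/run_speechclip_c.sh --eval                         # validation only
bash egs/run_speechclip_c.sh --test                         # testing only
bash egs/run_speechclip_c.sh --ckpt your_checkpoint_path    # resume training from a checkpoint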

Contribute

Please run the autoformatter (see audio-visual-ssl/dev-support/) before opening a PR.
