Stars
Robust Speech Recognition via Large-Scale Weak Supervision
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Machine Learning Engineering Open Book
A Collection of Variational Autoencoders (VAE) in PyTorch.
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch
torch-optimizer -- collection of optimizers for PyTorch
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Audio generation using diffusion models, in PyTorch.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
An unofficial style guide and best practices summary for PyTorch
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Official implementation of "Implicit Neural Representations with Periodic Activation Functions"
Automatically create Faiss kNN indices with optimal similarity search parameters.
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Collection of audio-focused loss functions in PyTorch