Stars
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
logWMSE, an audio quality metric that supports digital silence as a target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.
Automatically create Faiss kNN indices with optimal similarity search parameters.
A library for efficient similarity search and clustering of dense vectors.
AQUA-Tk = Audio QUality Assessment-Toolkit. (In development)
A browser extension that enhances search engines with ChatGPT
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Machine Learning Engineering Open Book
An Open Source text-to-speech system built by inverting Whisper.
neverix / musicgen_trainer
Forked from Sciumo/musicgen_trainer
Simple trainer for musicgen/audiocraft
Pitch Estimating Neural Networks (PENN)
A Python toolbox for speech feature extraction
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Implementation of AudioLM, a SOTA language modeling approach to audio generation out of Google Research, in PyTorch
Audio generation using diffusion models, in PyTorch.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Robust Speech Recognition via Large-Scale Weak Supervision
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Implementation of DiffWave and SaShiMi audio generation models
Large, modern dataset for speech recognition
Collection of audio-focused loss functions in PyTorch
Structured state space sequence models
Performant and accurate speech recognition built on PyTorch
A library for speech data augmentation in the time domain
Library for Textless Spoken Language Processing