- AI4Bharat
- Chennai
- @_iunravel
Starred repositories
The open-source code for the SimpleSpeech series
A reproduction of the training process for Moshi
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Efficient Triton Kernels for LLM Training
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
The official implementation of EmoSphere-TTS
This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation"
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The official implementation of PeriodWave and PeriodWave-Turbo
Helpful tools and examples for working with flex-attention
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Run PyTorch LLMs locally on servers, desktop and mobile
Provides Gradio custom components that make diarization-based audio labeling easier and faster.
Official inference repo for FLUX.1 models
Dataset for lightly supervised training using the LibriVox audiobook recordings. https://librivox.org/
Python implementation of performance metrics in Loizou's Speech Enhancement book
[ACL 2024] PyTorch code for the paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"