Ashwin Sankar (AshwinSankar17)

AI researcher | AI4Bharat | Working in Speech and Language | Interested in Multi-modal AI research.

30 followers · 54 following

AI4Bharat
Chennai
@_iunravel

Starred repositories

yangdongchao / SimpleSpeech

The open source code for SimpleSpeech series

Python 89 6 Updated Aug 19, 2024

yangdongchao / Open-Training-Moshi

A reproduction of the training process for Moshi

Python 43 3 Updated Sep 20, 2024

haidog-yaqub / EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

52 Updated Sep 21, 2024

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 2,974 152 Updated Sep 20, 2024

kyutai-labs / moshi

Python 3,646 262 Updated Sep 20, 2024

yl4579 / StyleTTS-ZS

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

94 1 Updated Sep 18, 2024

PolyAI-LDN / pheme

Python 244 22 Updated Mar 15, 2024

nii-yamagishilab / mos-finetune-ssl

Python 73 18 Updated Jun 14, 2023

supertone-inc / super-monotonic-align

Python 112 8 Updated Sep 19, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 1,690 83 Updated Sep 12, 2024

microsoft / MS-SNSD

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…

HTML 476 145 Updated Jul 1, 2024

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 661 38 Updated Sep 21, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,543 244 Updated Sep 14, 2024

probabilists / azula

Diffusion models in PyTorch

Python 83 3 Updated Sep 9, 2024

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 551 21 Updated Sep 17, 2024

Lightning-AI / LitServe

Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.

Python 2,114 129 Updated Sep 20, 2024

Choddeok / EmoSphere-TTS

The official implementation of EmoSphere-TTS

Python 59 6 Updated Aug 5, 2024

Yuer867 / EMO-Disentanger

This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation"

Python 37 1 Updated Sep 17, 2024