AshwinSankar17's starred repositories

The open-source code for the SimpleSpeech series

Python 89 6 Updated Aug 19, 2024

A reproduction of the training process for Moshi

Python 43 3 Updated Sep 20, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

52 Updated Sep 21, 2024

Efficient Triton Kernels for LLM Training

Python 2,974 152 Updated Sep 20, 2024
Python 3,646 262 Updated Sep 20, 2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

94 1 Updated Sep 18, 2024
Python 244 22 Updated Mar 15, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 1,690 83 Updated Sep 12, 2024

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…

HTML 476 145 Updated Jul 1, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 661 38 Updated Sep 21, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,543 244 Updated Sep 14, 2024

Diffusion models in PyTorch

Python 83 3 Updated Sep 9, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 551 21 Updated Sep 17, 2024

Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.

Python 2,114 129 Updated Sep 20, 2024

The official implementation of EmoSphere-TTS

Python 59 6 Updated Aug 5, 2024

This is the official repository of the ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation"

Python 37 1 Updated Sep 17, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 82,325 22,142 Updated Sep 21, 2024

The official implementation of PeriodWave and PeriodWave-Turbo

111 7 Updated Aug 19, 2024

Helpful tools and examples for working with flex-attention

Python 351 14 Updated Aug 17, 2024
Python 32 1 Updated Sep 1, 2024
Jupyter Notebook 4 Updated Jul 23, 2024

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,488 689 Updated Sep 21, 2024

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,135 196 Updated Sep 21, 2024

Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.

Svelte 34 7 Updated Sep 17, 2024

Official inference repo for FLUX.1 models

Python 13,880 984 Updated Sep 13, 2024

A dataset for lightly supervised training using LibriVox audiobook recordings (https://librivox.org/).

Python 473 76 Updated Jul 11, 2023

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python 377 84 Updated Jun 12, 2024

[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

Python 32 2 Updated Aug 13, 2024