Stars
Speech, Language, Audio, Music Processing with Large Language Model
AirLLM: 70B-model inference on a single 4GB GPU
Retrieval and Retrieval-augmented LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Resources on LLM applications built with the RAG pattern
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Enhancing Translation with RAG-Powered Large Language Models
Whisper realtime streaming for long speech-to-text transcription and translation
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
Just 1 minute of voice data can train a good TTS model! (few-shot voice cloning)
✨✨Latest Advances on Multimodal Large Language Models
Speaker adaptation, ASR, personality
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
An awesome spoken language identification (LID) repository (work in progress)
Faster distil-whisper transcription with CTranslate2
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Notebooks and various random fun projects
[ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Open-source Mandarin biased-word dataset
Accepted-paper lists from several conferences (including AI, ML, and robotics)