Stars
🧑🚀 Summary of the world's best LLM resources.
Retrieval and Retrieval-augmented LLMs
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
Continuously updated list of cutting-edge papers on video moment localization, temporal language grounding, and video moment retrieval.
Repository for the MM '23 accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
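A Java implementation of the BERT tokenizer.
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, technical WeChat official accounts)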
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Pre-trained Chinese ELECTRA (Chinese ELECTRA pre-trained models)
An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
Chinese GPT2: pre-training and fine-tuning framework for text generation
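A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front end is integrated with Django, and RESTful APIs for the NLP and KG components have already been packaged.
DeepLab v3+ model in PyTorch. Supports different backbones.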
Code for "Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization"
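Multiple implementations of abstractive text summarization, using Google Colab
Chinese and English sensitive words, language detection, domestic and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment scores, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, comprehensive company name list, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …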
KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation