Stars
A paper list about large language models and multimodal models (Diffusion, VLM). From foundations to applications. It is only used to record papers for my personal needs.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A plug-and-play library for parameter-efficient tuning (Delta Tuning)
Chinese translation of the "Deep Learning Tuning Playbook"
The awesome agents in the era of large language models
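This project performs fast encoding detection and conversion on large numbers of text files, to support data cleaning for the MNBVC corpus project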
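MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.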
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Datasets for evaluating smart contract security analysis tools (continuously updated)
Consolidated Ground Truth (CGT) for Weaknesses of Ethereum Smart Contracts
SmartBugs: A Framework to Analyze Ethereum Smart Contracts
This repository contains 47,398 smart contracts extracted from the Ethereum network.
Firefly Chinese LLaMA-2 large model; supports incremental pretraining of large models such as Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, and Bloom
A Gradio web UI for Large Language Models.
This repository implements the Chain-of-Verification paper by Meta AI
FastAPI, LangChain, JavaScript: streaming response with a handwriting effect
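This project fine-tunes ChatGLM3-6B in multiple ways so that the model has potential for real-world deployment (including but not limited to customer service, chat, and games)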
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Pretrained BERT model for cybersecurity text, trained to capture cybersecurity knowledge
This project is a tutorial on large-model application development for beginner developers. Read online: https://datawhalechina.github.io/llm-universe/
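"A Guide to Consuming Open-Source Large Models" (《开源大模型食用指南》): quickly deploy open-source large models in a Linux environment; a deployment tutorial better suited to beginners in China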