Ancient_Books

Ancient_Books - A Large Language Model for Interpreting Ancient Chinese Texts

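Open-source work is not easy. If this project helps you, please give it a star ⭐ in the top-right corner.

Your stars ⭐ are our greatest encouragement. Stars ⭐, PRs, and Issues are all welcome.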

📖 Table of Contents

Ancient_Books - A Large Language Model for Interpreting Ancient Chinese Texts

🔄 Architecture Diagram

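📝 Introduction

The ancient-text interpretation model, released by the AI Association of the School of Artificial Intelligence at Gansu University of Political Science and Law (甘肃政法大学) together with 知源书院, is a study aid designed to help users understand and appreciate classical Chinese literature and culture. It supports classical poetry appreciation, Classical Chinese translation, idiom (chengyu) explanation, annotation of the Analects (《论语》), and interpretation of the Hundred Family Surnames (《百家姓》). These features help users grasp the essence of classical poems, historical documents, idioms and their origins, and surname culture, making the model a useful assistant for researchers, students, and anyone interested in ancient Chinese culture.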

🛠️ Usage

Quick Start

1. Download the model

Refer to the model download instructions:

pip install modelscope

from modelscope import snapshot_download
model_dir = snapshot_download('CFYuan/Ancient_Books')

Alternatively, use download_model.py, which supports both the 7B model and the int4-quantized 7B model:

python download_model.py
python download_hf.py
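After a download finishes, it can be useful to confirm that the directory looks complete before launching the web demo. The following is a minimal sketch, not part of the project: the function name `missing_model_files` and the list of expected files are assumptions based on the usual layout of a HuggingFace-format checkpoint (a `config.json`, tokenizer files, and `.safetensors` or `.bin` weight shards).

```python
from pathlib import Path

# Assumption: a HuggingFace-format checkpoint directory contains config.json,
# at least one file whose name starts with "tokenizer", and at least one weight shard.
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def missing_model_files(model_dir):
    """Return a list of problems found in model_dir (an empty list means it looks complete)."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    names = [p.name for p in root.iterdir() if p.is_file()]
    problems = []
    if "config.json" not in names:
        problems.append("config.json is missing")
    if not any(name.startswith("tokenizer") for name in names):
        problems.append("no tokenizer file found")
    if not any(name.endswith(WEIGHT_SUFFIXES) for name in names):
        problems.append("no weight file (.safetensors or .bin) found")
    return problems
```

Calling `missing_model_files(model_dir)` with the path returned by `snapshot_download` and printing the result gives a quick sanity check. It only inspects file names and does not verify that the weights themselves are intact.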

2. Set up the environment

git clone https://github.com/2001926342/Ancient_Books
cd Ancient_Books

pip install -r requirements.txt

3. Run locally

streamlit run web.py --server.port 7860

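🧾 Data Sources

The project currently uses the open-source datasets listed below, supplemented with additional data that we collected using web crawlers:

Classical Chinese: https://huggingface.co/datasets/RUCAIBox/Erya-dataset/tree/main

Classical poetry: https://github.com/chinese-poetry/chinese-poetry

Classical Chinese to Modern Chinese parallel corpus: https://github.com/NiuTrans/Classical-Modern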
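To illustrate how parallel Classical/Modern pairs from these corpora might be turned into fine-tuning data, here is a hedged sketch. It assumes the single-turn conversation layout documented by XTuner for custom datasets (a list of objects, each with a `conversation` list of `system` / `input` / `output` turns). The README does not state the actual schema of the project's training file, and the system prompt, the instruction wording, and the function names below are all hypothetical.

```python
import json

# Hypothetical system prompt; the project's real prompt is not given in this README.
SYSTEM_PROMPT = "你是一位古籍解读助手。"


def to_record(classical, modern, system=SYSTEM_PROMPT):
    """Wrap one Classical/Modern sentence pair as a single-turn conversation record."""
    return {
        "conversation": [
            {
                "system": system,
                # Hypothetical instruction wording for the translation task.
                "input": f"请将下面的文言文翻译成现代汉语：{classical}",
                "output": modern,
            }
        ]
    }


def build_dataset(pairs):
    """Convert (classical, modern) pairs into records, skipping pairs with an empty side."""
    return [
        to_record(classical.strip(), modern.strip())
        for classical, modern in pairs
        if classical.strip() and modern.strip()
    ]


if __name__ == "__main__":
    sample_pairs = [("学而时习之，不亦说乎？", "学习并且时常温习，不也很愉快吗？")]
    # ensure_ascii=False keeps the Chinese text readable in the output file.
    print(json.dumps(build_dataset(sample_pairs), ensure_ascii=False, indent=2))
```

Keeping one pair per record makes it easy to sample or filter the data later without breaking multi-turn context.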

🧑‍💻 Fine-tuning Guide

This project is trained with xtuner, fine-tuning internlm2-chat-7b.

1. List all built-in configs and copy the one to start from

xtuner list-cfg
cd /group_share/Ancient_Books/config
xtuner copy-cfg internlm2_chat_7b_qlora_oasst1_e3 .

2. Download the model

mkdir -p /group_share/Ancient_Books/model

from modelscope import snapshot_download

# Download the base model to be fine-tuned (internlm2-chat-7b) into the project's model directory
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b', cache_dir='/group_share/Ancient_Books/model')

3. Edit the config file

# Point the model to a local path
- pretrained_model_name_or_path = 'internlm/internlm2-chat-7b'
+ pretrained_model_name_or_path = '/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b'

# Point the training dataset to a local path
- data_path = 'timdettmers/openassistant-guanaco'
+ data_path = '/group_share/Ancient_Books/dataset/data/sampled_data.json'
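The two substitutions above can also be applied programmatically, which helps when the config is regenerated often. This is a minimal sketch under the assumption that both settings appear as simple top-level `name = 'value'` assignments in the copied config; the helper names `set_assignment` and `patch_config` are hypothetical and not part of xtuner.

```python
import re


def set_assignment(source, name, value):
    """Replace the first top-level `name = ...` line in a config source string."""
    pattern = re.compile(rf"^{re.escape(name)}\s*=\s*.*$", flags=re.MULTILINE)
    # A callable replacement avoids backslash processing in the new value.
    new_source, count = pattern.subn(lambda _m: f"{name} = {value!r}", source, count=1)
    if count == 0:
        raise KeyError(f"{name} not found in config")
    return new_source


def patch_config(source, model_path, data_path):
    """Point the config at a local base model and a local training dataset."""
    source = set_assignment(source, "pretrained_model_name_or_path", model_path)
    return set_assignment(source, "data_path", data_path)
```

A typical use is to read the copied config with `Path(...).read_text()`, pass the text through `patch_config`, and write it back. Only the assignment lines are touched, so other references to `data_path` further down the config keep working.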

4. Start fine-tuning

xtuner train /group_share/Ancient_Books/config/internlm2_chat_7b_qlora_oasst1_e3_copy.py

Or use the pre-configured file:

xtuner train /group_share/Ancient_Books/config/internlm2_chat_7b_qlora_ancient_e3.py

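5. Convert the PTH checkpoint to a HuggingFace model

# Run from the config directory so that the relative paths below resolve
cd /group_share/Ancient_Books/config
mkdir -p /group_share/Ancient_Books/config/hf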
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
xtuner convert pth_to_hf ./internlm2_chat_7b_qlora_ancient_e3.py \
                         ./work_dirs/internlm2_chat_7b_qlora_ancient_e3/epoch_3.pth \
                         ./hf

6. Merge the HuggingFace adapter into the base LLM

# Location of the original (base) model weights; this must be the same model that was fine-tuned
export NAME_OR_PATH_TO_LLM=/group_share/Ancient_Books/model/Shanghai_AI_Laboratory/internlm2-chat-7b
# Location of the adapter weights in Hugging Face format
export NAME_OR_PATH_TO_ADAPTER=/group_share/Ancient_Books/config/hf
# Location where the merged weights will be saved
mkdir -p /group_share/Ancient_Books/config/work_dirs/hf_merge
export SAVE_PATH=/group_share/Ancient_Books/config/work_dirs/hf_merge

# Run the merge
xtuner convert merge \
    $NAME_OR_PATH_TO_LLM \
    $NAME_OR_PATH_TO_ADAPTER \
    $SAVE_PATH \
    --max-shard-size 2GB

🧑‍💻 RAG Guide

1. Build the dataset

cd /group_share/Ancient_Books/dataset
python gen_dataset.py
python sample_dataset.py

cd /group_share/Ancient_Books/RAG
python create_db.py
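The README does not describe what create_db.py stores or how web_RAG.py queries it, so the following is only a conceptual sketch of the retrieval step in a RAG pipeline. It deliberately swaps in a simple technique, cosine similarity over character bigrams, in place of whatever embedding model and vector store the project actually uses. All function names and the prompt layout are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Count character bigrams, ignoring whitespace (a crude stand-in for an embedding)."""
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + 2]) for i in range(len(chars) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    q = bigrams(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bigrams(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, passages, top_k=1):
    """Prepend the retrieved passages to the question (hypothetical prompt layout)."""
    context = "\n".join(retrieve(query, passages, top_k))
    return f"参考资料：\n{context}\n\n问题：{query}"
```

The point of the sketch is the data flow: score stored passages against the question, keep the best matches, and place them in the prompt ahead of the question. A real deployment would replace `bigrams` and `cosine` with an embedding model and a vector database lookup.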

2. Demo

python web_RAG.py

🧑‍💻 LMDeploy Model Quantization

1. Run 4-bit quantization

lmdeploy lite auto_awq \
    /group_share/Ancient_Books/model/Ancient_Books \
    --calib-dataset 'ptb' \
    --calib-samples 128 \
    --calib-seqlen 1024 \
    --w-bits 4 \
    --w-group-size 128 \
    --work-dir /group_share/Ancient_Books/Ancient_Books_int4
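To get a feel for what `--w-bits 4` and `--w-group-size 128` mean for memory, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions, not a measurement: it uses a nominal 7e9 parameters, pretends every weight is quantized (in practice some layers such as embeddings usually stay in higher precision), and assumes one 16-bit scale plus one 4-bit zero point per group. The function names are hypothetical.

```python
GIB = 2 ** 30  # bytes per GiB


def fp16_weight_gib(n_params):
    """Size of the weights stored as 16-bit floats (2 bytes per parameter)."""
    return n_params * 2 / GIB


def quantized_weight_gib(n_params, w_bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Approximate size of group-quantized weights.

    Each weight costs w_bits, plus its share of the per-group scale and zero point.
    The scale_bits and zero_bits defaults are assumptions about the storage layout.
    """
    bits_per_weight = w_bits + (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    n = 7e9  # nominal parameter count for a "7B" model
    print(f"fp16: {fp16_weight_gib(n):.2f} GiB")
    print(f"int4: {quantized_weight_gib(n):.2f} GiB")
```

With group size 128 the per-group metadata adds only a small fraction of a bit per weight, so the weights shrink by a little under a factor of four compared with fp16. Activation memory and the KV cache are not included in this estimate.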

2. High-performance deployment with LMDeploy

lmdeploy chat /group_share/Ancient_Books/Ancient_Books_int4 --model-name internlm2
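Besides the interactive CLI, LMDeploy also offers a Python `pipeline` API that can be used for batch inference. The sketch below assumes a recent LMDeploy release in which `pipeline` and `TurbomindEngineConfig` are importable from the top-level package. The prompt wording and the helper names `build_prompts` and `translate_batch` are hypothetical, and the model directory is whatever was passed to `--work-dir` during quantization.

```python
def build_prompts(passages):
    """Wrap each Classical Chinese passage in a translation instruction (hypothetical wording)."""
    return [f"请将下面的文言文翻译成现代汉语：{p}" for p in passages]


def translate_batch(model_dir, passages):
    """Run a batch of translation prompts through an AWQ-quantized model."""
    # Imported lazily so that build_prompts can be used without lmdeploy installed.
    from lmdeploy import TurbomindEngineConfig, pipeline

    # model_format="awq" tells the TurboMind backend that the weights are 4-bit AWQ.
    pipe = pipeline(model_dir, backend_config=TurbomindEngineConfig(model_format="awq"))
    responses = pipe(build_prompts(passages))
    return [response.text for response in responses]
```

For example, `translate_batch("/group_share/Ancient_Books/Ancient_Books_int4", ["学而时习之，不亦说乎？"])` would return one generated string per passage, provided a GPU and the quantized weights are available.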

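💕 Acknowledgements

Project Members

陈辅元 (Chen Fuyuan) - Project lead (Gansu University of Political Science and Law; Datawhale 鲸英 teaching assistant; responsible for model fine-tuning, data collection, RAG content curation, and project organization)
张世斌 (Zhang Shibin) - Project lead (Gansu University of Political Science and Law)
柴承清 (Chai Chengqing) (Gansu University of Political Science and Law; responsible for SDK development, Agent development (in progress), and model fine-tuning)
李智江 (Li Zhijiang) (Gansu University of Political Science and Law)
符银霞 (Fu Yinxia) (Gansu University of Political Science and Law)

Special Thanks

Thanks to Shanghai AI Laboratory for organizing the 书生·浦语 (InternLM) hands-on camp.

Thanks to OpenXLab for providing the compute used to deploy the project.

Thanks to 浦语小助手 (the InternLM assistant) for supporting the project.

Thanks to the 书生·浦语 (InternLM) LLM hands-on camp launched by Shanghai AI Laboratory for its valuable technical guidance and generous compute support!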
