DQA-Bench

【English | 中文】

DQA is the first comprehensive benchmark for database question answering (Q&A). Its dataset is built from data collected on the Internet and from a new generation method based on large language models (LLMs). We also propose a comprehensive LLM-based database Q&A testbed on DQA. The testbed is modular and scalable, with both basic and advanced components such as Question Classification Routing (QCR), Retrieval-Augmented Generation (RAG), Tool Invocation Generation (TIG), and Prompt Template Engineering (PTE). DQA also provides a complete evaluation pipeline, with diverse metrics and a standardized evaluation process intended to keep the evaluation comprehensive, accurate, and fair. We use DQA to evaluate the database Q&A capabilities of LLMs under the proposed testbed.

Dataset of Benchmark DQA

This section presents the DQA dataset, which is built from Internet data collection and an LLM-based generation method. The dataset contains more than 240,000 question-answer pairs in Chinese and English, covering almost all aspects of database knowledge. The directory contains data examples and download links for the full dataset. It is divided into two sub-directories, Chinese and English, each containing three parts: General Knowledge, Specific Product, and Specific Instance.

Testbed Demo

This section demonstrates the LLM-based database Q&A testbed. The testbed is modular and extensible, with a variety of basic and advanced components, and is designed so that different LLMs can be integrated with these components to handle real database Q&A scenarios. The directory contains the implementation, usage, and download link of the question classification model (Question_Classification_Model), as well as the implementation and usage of classification-based database question answering (Testbed_Backbone).
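The routing idea behind the testbed can be illustrated with a minimal Python sketch. It is not the code in Testbed_Backbone: the keyword classifier is a toy stand-in for the trained Question_Classification_Model, the mapping from question category to answering strategy is an assumption, and every function name, keyword, and prompt string here is hypothetical.

```python
from typing import Callable, Dict

# Question categories, named after the three dataset parts listed in this README.
GENERAL = "general_knowledge"
PRODUCT = "specific_product"
INSTANCE = "specific_instance"


def classify(question: str) -> str:
    """Toy keyword classifier (hypothetical stand-in for a trained model)."""
    q = question.lower()
    if any(k in q for k in ("my database", "my table", "slow query", "this instance")):
        return INSTANCE
    if any(k in q for k in ("postgresql", "mysql", "opengauss", "oracle")):
        return PRODUCT
    return GENERAL


def answer_general(question: str) -> str:
    # Prompt template only: the question goes to the LLM directly.
    return f"[LLM] Answer this database question: {question}"


def answer_product(question: str) -> str:
    # Retrieval-augmented: documentation retrieval is stubbed out here.
    docs = "<retrieved documentation snippets>"
    return f"[LLM+RAG] Context: {docs}\nQuestion: {question}"


def answer_instance(question: str) -> str:
    # Tool invocation: the database tools are stubbed out here.
    return f"[LLM+TIG] Tools available: <tool list>\nQuestion: {question}"


# Assumed mapping from category to answering strategy.
ROUTES: Dict[str, Callable[[str], str]] = {
    GENERAL: answer_general,
    PRODUCT: answer_product,
    INSTANCE: answer_instance,
}


def route(question: str) -> str:
    """Classify the question, then dispatch it to the matching handler."""
    return ROUTES[classify(question)](question)


if __name__ == "__main__":
    print(route("How do I enable WAL archiving in PostgreSQL?"))
```

Keeping the classifier and the handlers behind a plain dictionary means a new component can be added by registering one more entry, which is one simple way to get the modularity described above.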

Evaluation Code of Benchmark

This section contains the complete evaluation pipeline of DQA. The pipeline includes a variety of metrics and a standardized process to ensure comprehensive, accurate, and fair evaluation. It supports multiple mainstream LLMs and can be extended to additional models and metrics with simple changes. The directory provides the implementation and usage of the pipeline.
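One common way to make an evaluation pipeline extensible to new metrics is a small registry, sketched below in Python. This is an illustration only, not the repository's evaluation code: the registry, the function names, and the two metrics (exact match and token-level F1) are assumptions chosen for the example and are not necessarily the metrics DQA uses.

```python
from collections import Counter
from typing import Callable, Dict, List

Metric = Callable[[str, str], float]

# Hypothetical registry: metric name -> scoring function.
METRICS: Dict[str, Metric] = {}


def register_metric(name: str) -> Callable[[Metric], Metric]:
    """Decorator that adds a scoring function to the registry."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("exact_match")
def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the answers match after trimming and lowercasing, else 0.0.
    return float(prediction.strip().lower() == reference.strip().lower())


@register_metric("token_f1")
def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token-level precision and recall.
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Average every registered metric over paired predictions and references."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    return {
        name: sum(fn(p, r) for p, r in zip(predictions, references)) / len(predictions)
        for name, fn in METRICS.items()
    }


if __name__ == "__main__":
    print(evaluate(["an index speeds up lookups"], ["an index speeds up reads"]))
```

With this layout, adding a metric means writing one decorated function, and `evaluate` picks it up without further changes.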

Popular LLMs Response for DQA

This section provides the responses of multiple popular LLMs on DQA. This response dataset makes it possible to compare the performance of different LLMs on database Q&A tasks. The directory contains example model responses and download links for the complete response dataset. It is divided into two sub-directories, Chinese and English, each containing three parts: General Knowledge, Specific Product, and Specific Instance.

Experimental Results on DQA

This section presents the experimental results on DQA, including the question classification results of different methods and the answer evaluation results of different models, showing their strengths and weaknesses.

Additional Materials in Footnotes of Our Paper

This section provides the additional materials referenced in the footnotes of our paper, including additional data, methodological details, and other supplementary information such as the prompts used for data collection and experiments.

Click to view details.

Citation

If you find this project useful, please cite our paper (paper link) and star the repository.

@misc{zheng2024dqa,
      title={Revolutionizing Database Q\&A with Large Language Models: Comprehensive Benchmark and Evaluation},
      author={Yihang Zheng and Bo Li and Zhenghao Lin and Yi Luo and Xuanhe Zhou and Chen Lin and Jinsong Su and Guoliang Li and Shifu Li},
      year={2024},
      eprint={2409.04475},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}

