XMUDM/DQABench

DQA-Bench

DQA is the first comprehensive benchmark for database question answering. Its dataset is built by combining data collected from the Internet with an innovative generation method based on large language models (LLMs). We also propose a comprehensive LLM-based database Q&A testbed on DQA. The testbed is highly modular and scalable, with both basic and advanced components such as Question Classification Routing (QCR), Retrieval-Augmented Generation (RAG), Tool Invocation Generation (TIG), and Prompt Template Engineering (PTE). In addition, DQA provides a complete evaluation pipeline with diverse metrics and a standardized evaluation process to ensure comprehensiveness, accuracy, and fairness. We use DQA to comprehensively evaluate database Q&A capabilities under the proposed testbed.
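To make the idea of question classification routing concrete, here is a minimal sketch of how a classified question might be dispatched to different components. It is not the repository's implementation: the keyword classifier, the handler functions, the tool names, and the mapping of each category to a component are all assumptions made for illustration. Only the component names (QCR, RAG, TIG, PTE) and the three category names come from this README.

```python
from typing import Callable, Dict

# Hypothetical category labels, named after the three parts of the dataset.
GENERAL, PRODUCT, INSTANCE = "general", "product", "instance"


def classify(question: str) -> str:
    """Toy keyword classifier standing in for a trained classification model."""
    q = question.lower()
    if any(k in q for k in ("my database", "my instance", "slow query")):
        return INSTANCE
    if any(k in q for k in ("postgresql", "mysql", "oracle")):
        return PRODUCT
    return GENERAL


def answer_general(question: str) -> str:
    # PTE only: wrap the question in a fixed prompt template.
    return f"You are a database expert. Answer concisely.\nQuestion: {question}"


def answer_product(question: str) -> str:
    # RAG placeholder: a real system would retrieve product documentation here.
    retrieved = "<retrieved documentation snippets>"
    return f"Use the context to answer.\nContext: {retrieved}\nQuestion: {question}"


def answer_instance(question: str) -> str:
    # TIG placeholder: a real system would let the model call diagnostic tools.
    tools = ["run_explain", "read_metrics"]  # hypothetical tool names
    return f"You may call these tools: {', '.join(tools)}.\nQuestion: {question}"


# Assumed mapping from category to handler; adjust to the actual testbed.
HANDLERS: Dict[str, Callable[[str], str]] = {
    GENERAL: answer_general,
    PRODUCT: answer_product,
    INSTANCE: answer_instance,
}


def route(question: str) -> Dict[str, str]:
    """Classify the question, then build the prompt with the matching handler."""
    category = classify(question)
    return {"category": category, "prompt": HANDLERS[category](question)}
```

Calling `route("How do I enable WAL archiving in PostgreSQL?")` returns a dictionary whose `category` is `"product"` and whose `prompt` embeds the placeholder context. Keeping the handlers in a dictionary is one simple way to let new categories or components be added without touching the routing logic.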


Contents

This repository contains the following:

  • Dataset of Benchmark DQA

    This section presents the DQA dataset, which is built by combining data collected from the Internet with an innovative generation method based on large language models. The dataset contains more than 240,000 question-answer pairs in Chinese and English, covering almost all aspects of database knowledge. The directory contains data examples and download links for the full dataset. It is divided into a Chinese and an English sub-directory, each containing three parts: General Knowledge, Specific Product, and Specific Instance.

  • Testbed Demo

    This section demonstrates the LLM-based database question answering testbed. The testbed is highly modular and extensible, with a variety of basic and advanced components that different LLMs can be combined with to handle real database question answering scenarios. The directory contains the implementation, usage instructions, and download link for the question classification model (Question_Classification_Model), as well as the implementation and usage instructions for classification-based database question answering (Testbed_Backbone).

  • Evaluation Code of Benchmark

    This section contains the complete evaluation pipeline of DQA. The pipeline combines a variety of metrics with a standardized evaluation process to ensure the comprehensiveness, accuracy, and fairness of the evaluation. It supports multiple mainstream large language models and can be extended with little effort to cover more models and metrics. The directory provides the implementation and usage instructions for the evaluation pipeline.

  • Popular LLMs Response for DQA

    This section shows the responses of multiple popular large language models on DQA. Evaluating this response dataset allows a comprehensive comparison of how different LLMs perform on database question answering tasks. The directory contains example model responses and download links for the complete response dataset. It is divided into a Chinese and an English sub-directory, each containing three parts: General Knowledge, Specific Product, and Specific Instance.

  • Experimental Results on DQA

    This section presents the experimental results on DQA, including the question classification results of different methods and the answer evaluation results of different models, highlighting their respective strengths and weaknesses.

  • Additional Materials in Footnotes of Our Paper

    This section provides the supplementary material referenced in the footnotes of our paper, including additional data, methodological details, and other information such as the prompts used for data collection and experiments, to give readers fuller background and understanding.

Click to view details.
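As an illustration of how an evaluation pipeline can be extended with new metrics, the following minimal sketch uses a small metric registry. It is not the DQA evaluation code: the `register_metric` decorator, the two metrics (exact match and token-level F1), and the `evaluate` function are hypothetical stand-ins chosen only to show the extension pattern.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Tuple

Metric = Callable[[str, str], float]

# Registry of metric name -> scoring function (hypothetical design).
METRICS: Dict[str, Metric] = {}


def register_metric(name: str) -> Callable[[Metric], Metric]:
    """Decorator that adds a scoring function to the registry."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("exact_match")
def exact_match(prediction: str, reference: str) -> float:
    # 1.0 when the answers match after trimming and lowercasing, else 0.0.
    return float(prediction.strip().lower() == reference.strip().lower())


@register_metric("token_f1")
def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token precision and recall on whitespace tokens.
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average every registered metric over (prediction, reference) pairs."""
    return {
        name: mean(fn(pred, ref) for pred, ref in pairs)
        for name, fn in METRICS.items()
    }
```

Under this design, adding a metric only requires writing a new function decorated with `register_metric`; `evaluate` picks it up automatically. Supporting another model would amount to producing a new list of (prediction, reference) pairs from that model's responses.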

Citation

Feel free to cite us (paper link) and star us if you like this project.

@misc{zheng2024dqa,
      title={Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation},
      author={Yihang Zheng and Bo Li and Zhenghao Lin and Yi Luo and Xuanhe Zhou and Chen Lin and Jinsong Su and Guoliang Li and Shifu Li},
      year={2024},
      eprint={2409.04475},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}
