
A Refer-and-Ground Multimodal Large Language Model for Biomedicine

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

😮 Highlights

BiRD shows promising performance on bounding-box grounding and understanding in the biomedical field.

  • ✨ We construct the Med-GRIT-270k dataset. Large-scale biomedical image–mask pairs are transformed into multimodal conversations by leveraging ChatGPT in a novel process. It is the first dataset in biomedicine to integrate referring, grounding, and conversations.

  • ✨ We present the first Biomedical Refer-and-grounD Multimodal Large Language Model (BiRD). It is fine-tuned with multi-task instruction learning on self-generated data for the biomedical domain. This validates the effectiveness of multi-task instruction tuning and highlights best practices for adapting MLLMs to a specialized domain.

🛠️Installation

You can refer to the official documentation of PaddleMIX to initialize the virtual environment.
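
A minimal setup sketch is shown below, assuming a CUDA-enabled machine; the environment name, Python version, and install steps are placeholders, so follow the PaddleMIX documentation for the exact requirements.

# Create and activate a fresh environment (name and Python version are examples)
conda create -n bird python=3.10 -y
conda activate bird

# Install PaddlePaddle; pick the build matching your CUDA version per the official docs
pip install paddlepaddle-gpu

# Clone PaddleMIX and install it from source (see its README for the exact steps)
git clone https://github.com/PaddlePaddle/PaddleMIX.git
cd PaddleMIX
pip install -e .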

🗃️Dataset

To download the images, please refer to SAM-Med2D.

For the QA pairs, please fill out the following form to request the Med-GRIT-270k dataset: Google Form. We will send the dataset to you by email after your application is approved.

📀Train

We build this project on the PaddleMIX framework. You can fine-tune Qwen-VL with this command:

sh train.sh {GPU_ids} paddlemix/config/BiRD/sft_argument_stage2.json
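
For example, assuming the GPU IDs are passed as a comma-separated list (check train.sh for the exact format it expects), training on the first two GPUs might look like:

sh train.sh 0,1 paddlemix/config/BiRD/sft_argument_stage2.json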

You can also refer to the official documentation to fine-tune other multimodal large models.

🥭 Test

Step 1: inference

Run inference to generate the prediction JSONL file.

sh tests/models/BiRD/infer_all.sh
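
After the script finishes, you can sanity-check the predictions with standard tools; the output path below is hypothetical, so check infer_all.sh for where the JSONL file is actually written.

# Count predictions and inspect the first record (path is a placeholder)
wc -l outputs/predictions.jsonl
head -n 1 outputs/predictions.jsonl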

Step 2: calculate the metrics

Use the prediction JSONL file to calculate the metrics.

sh tests/models/BiRD/eval_all.sh

👍Acknowledgement

We thank the following excellent works: FERRET, PaddleMIX, and SAM-Med2D.

🪜Model Use

Intended Use

The data, code, and model checkpoints are intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper. The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision making purposes.

Primary Intended Use

The primary intended use is to support AI researchers reproducing and building on top of this work. BiRD and its associated models should be helpful for exploring various biomedical referring-and-grounding and visual question answering (VQA) research questions.

Out-of-Scope Use

Any deployed use case of the model, commercial or otherwise, is out of scope. Although we evaluated the models using a broad set of publicly available research benchmarks, the models and evaluations are intended for research use only and not intended for deployed use cases.

🔒License

  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview intended for non-commercial use only, subject to the Terms of Use of the data generated by OpenAI and the Terms of Use of SAM-Med2D-20M. Please contact us if you find any potential violation.

✏️Citation

If you find our paper and code useful in your research, please consider giving us a star and a citation.

@article{huang2024refer,
  title={A Refer-and-Ground Multimodal Large Language Model for Biomedicine},
  author={Huang, Xiaoshuang and Huang, Haifeng and Shen, Lingdong and Yang, Yehui and Shang, Fangxin and Liu, Junwei and Liu, Jia},
  journal={arXiv preprint arXiv:2406.18146},
  year={2024}
}
