
Blip2-Image-Captioning

For Mac (M1, M2, M3), Windows/Linux (CUDA), or CPU


Supported Formats

JPG, JPEG, PNG, BMP, GIF
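A minimal sketch of an extension filter for these formats (the constant and function names are assumptions for illustration; the repository may organize this differently):

from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def find_images(directory):
    """Return all files whose suffix is one of the supported image formats."""
    return [p for p in Path(directory).iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS]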

Installation

Create a Python virtual environment in the project directory!

Clone the repository

git clone https://github.com/MarkusR1805/blip2-image-captioning.git

Open a terminal in the cloned directory, e.g. /path/captioning/Blip2-Image-Captioning

python3.12 -m venv env
source env/bin/activate

Install the dependencies from requirements.txt

pip install --upgrade pip
pip install -r requirements.txt

or

pip install language-tool-python
pip install nltk
pip install pillow
pip install psutil
pip install torch
pip install transformers


Using the Model

Update the program regularly with

git pull

Start the program in the terminal with

python main.py

Salesforce blip2-opt-2.7b with ≈ 3,744,679,936 parameters

The model is approximately 15 GB in size. Either use the program as configured (the model is downloaded from Hugging Face), or store the model locally on your computer and change the path in main.py. In that case, adapt these lines of code:

#model_path = "/Volumes/SSD T7/Salesforce-blip2-opt-27b" # local path
model_path = "Salesforce/blip2-opt-2.7b" # Hugging Face path
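
For orientation, here is a minimal sketch of how such a model_path is typically used with the transformers BLIP-2 classes. This is an illustration under assumptions, not necessarily the exact code in main.py:

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

model_path = "Salesforce/blip2-opt-2.7b"  # or a local directory, see above

# Pick an available device: Apple Silicon (MPS), CUDA, or plain CPU.
device = "mps" if torch.backends.mps.is_available() else ("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device != "cpu" else torch.float32  # fp16 halves memory on GPU/MPS

processor = Blip2Processor.from_pretrained(model_path)
model = Blip2ForConditionalGeneration.from_pretrained(model_path, torch_dtype=dtype).to(device)

image = Image.open("image1.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(**inputs, max_new_tokens=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)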

Usage

Attention! All files with the suffix .txt in the directory are deleted without confirmation!
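
The cleanup presumably boils down to something like this sketch (hypothetical; check main.py before relying on details), which is why no unrelated .txt files should live in that directory:

from pathlib import Path

def delete_text_files(directory):
    """Delete every *.txt file in the directory -- no confirmation is asked."""
    for txt_file in Path(directory).glob("*.txt"):
        txt_file.unlink()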

In the terminal, change into the program directory, then start the program with "python main.py".

This application is used via the terminal; here I show it using a MacBook M3 as an example.

The program asks four questions in sequence:

  1. First question: the path to the directory containing the images.
  2. Second question: the path to ignore_list.txt (leave empty if no such file exists; the default is the program directory).
  3. Third question: the path to allowed_list.txt (leave empty if no such file exists; the default is the program directory).
  4. Fourth question: additional keywords (2-3 or more) placed at the very beginning of each image description (enter them separated by commas).
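
The prompt sequence could look roughly like this sketch (variable names and prompt texts are assumptions, not the literal code from main.py):

image_dir = input("Path to the image directory: ").strip()

ignore_list = input("Path to ignore_list.txt (leave empty for the default): ").strip() or "ignore_list.txt"
allowed_list = input("Path to allowed_list.txt (leave empty for the default): ").strip() or "allowed_list.txt"

# Comma-separated keywords, prepended to every image description.
extra_keywords = [k.strip() for k in input("Additional keywords: ").split(",") if k.strip()]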

The program creates a text file with the same name as each image, for example image1.png produces image1.txt. First, image descriptions are created for all images; then keywords are extracted from each description and placed in front of it, in addition to the keywords you entered at the beginning (see the sketch after the list below). The following files are also created:

  1. gesamt.txt: all image descriptions in one file, ideal for use as a wildcard
  2. extracted_words.txt: all keywords of all images
  3. t_extracted_words.txt: as in 2, but with the tokens added
  4. a CSV table with the image description and image path
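
As one plausible way to implement the keyword step (the repository's exact filtering via ignore_list.txt and allowed_list.txt may differ), keywords can be pulled out of a caption with nltk and prepended together with your own keywords:

import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)  # newer NLTK versions may also need "punkt_tab"
nltk.download("averaged_perceptron_tagger", quiet=True)

def extract_keywords(caption):
    """Keep nouns and adjectives as keyword candidates."""
    tokens = word_tokenize(caption.lower())
    return [word for word, tag in pos_tag(tokens) if tag.startswith(("NN", "JJ"))]

caption = "a black cat sitting on a wooden table"
extra = ["studio photo", "high detail"]  # keywords entered at program start
line = ", ".join(extra + extract_keywords(caption)) + ", " + caption
print(line)  # studio photo, high detail, black, cat, wooden, table, a black cat ...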


You can adapt the Python code and change the model's parameters. Just experiment with the changes; you can also use the larger BLIP-2 model, but it is about 33 GB in size and takes longer to process the images.
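
Continuing the loading sketch from the model section above, these are typical generate() parameters worth experimenting with (the values are arbitrary starting points, not the repository's defaults):

generated_ids = model.generate(
    **inputs,
    max_new_tokens=60,  # upper bound on caption length
    num_beams=5,  # beam search often gives more coherent captions
    repetition_penalty=1.3,  # discourage repeated phrases
)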
