Supported image formats: JPG, JPEG, PNG, BMP, GIF
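A directory scan matching these suffixes can be sketched as follows (the helper `find_images` is illustrative, not part of the repository):

```python
from pathlib import Path

# Suffixes from the list of supported formats above
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def find_images(directory: str) -> list[Path]:
    """Return all files in `directory` whose suffix is a supported image format."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```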
Clone the repository
git clone https://github.com/MarkusR1805/blip2-image-captioning.git
Open the terminal in the directory, e.g. /path/captioning/Blip2-Image-Captioning
python3.12 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
or
pip install language-tool-python
pip install nltk
pip install pillow
pip install psutil
pip install torch
pip install transformers
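After installing, you can check that all required packages resolve before running the programme; a minimal sketch (the helper `missing_packages` is my own, not part of the repository — note that pillow imports as `PIL` and language-tool-python as `language_tool_python`):

```python
from importlib.util import find_spec

# Import names corresponding to the pip packages installed above
REQUIRED = ["language_tool_python", "nltk", "PIL", "psutil", "torch", "transformers"]

def missing_packages(names=REQUIRED) -> list[str]:
    """Return the import names that cannot be resolved in this environment."""
    return [name for name in names if find_spec(name) is None]
```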
Update the programme regularly
git pull
Start the programme in the terminal with
python main.py
Salesforce blip2-opt-2.7b with ≈ 3,744,679,936 parameters
The model is approximately 15 GB in size. Either you use the programme as configured (the model is downloaded from Huggingface), or you store the model locally on your computer and change the path in "main.py". In that case you must adapt these lines of code:
#model_path = "/Volumes/SSD T7/Salesforce-blip2-opt-27b" # Local path
model_path = "Salesforce/blip2-opt-2.7b" # Huggingface path
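The switch between the two paths above can also be automated; a minimal sketch (the helper `resolve_model_path` is illustrative and not part of main.py):

```python
import os

def resolve_model_path(local_path: str, hub_id: str = "Salesforce/blip2-opt-2.7b") -> str:
    """Prefer a local copy of the model if the directory exists,
    otherwise fall back to the Huggingface Hub identifier."""
    return local_path if os.path.isdir(local_path) else hub_id

# The resolved value can then be passed to transformers, e.g.
# Blip2ForConditionalGeneration.from_pretrained(resolve_model_path("/Volumes/..."))
```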
Attention! All files with the suffix .txt in the directory will be deleted without confirmation!
You must be in the programme directory in the terminal, then start the programme with "python main.py"
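The deletion warning above corresponds to a cleanup step along these lines (a sketch assuming a non-recursive delete; the helper is illustrative, not the repository's actual code):

```python
from pathlib import Path

def delete_txt_files(directory: str) -> int:
    """Delete every .txt file directly inside `directory`; return how many were removed."""
    count = 0
    for p in Path(directory).glob("*.txt"):
        p.unlink()
        count += 1
    return count
```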
The programme asks four questions:
- First question: the path to the directory in which the images are located
- Second question: the path to ignore_list.txt (leave empty if no such file exists); default is the programme path
- Third question: the path to allowed_list.txt (leave empty if no such file exists); default is the programme path
- Fourth question: additional keywords (2-3 or more) placed at the very beginning of the image description (enter them separated by commas)
The programme creates text files with the same name as the image, for example image1.png = image1.txt. First, image descriptions are created for all images; then keywords are filtered out of each description and placed in front of it, in addition to the keywords you entered at the beginning. The following files are also created:
- gesamt.txt / All image descriptions in one file, ideal for use as a wildcard
- extracted_words.txt / all keywords of all images can be found here
- t_extracted_words.txt / same as extracted_words.txt, but with the tokens added
- a CSV table with image description and image path
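The output naming described above can be sketched as follows (file names gesamt.txt and image1.txt are from this README; the CSV file name `captions.csv` and the function itself are assumptions for illustration):

```python
from pathlib import Path
import csv

def write_outputs(captions: dict[str, str], out_dir: str) -> None:
    """Write one .txt per image, plus gesamt.txt and a CSV summary.

    `captions` maps an image file name (e.g. 'image1.png') to its description.
    """
    out = Path(out_dir)
    # One text file per image: image1.png -> image1.txt
    for image_path, caption in captions.items():
        (out / (Path(image_path).stem + ".txt")).write_text(caption)
    # gesamt.txt: all image descriptions in one file, one per line
    (out / "gesamt.txt").write_text("\n".join(captions.values()))
    # CSV table with image description and image path
    with open(out / "captions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["description", "image_path"])
        for image_path, caption in captions.items():
            writer.writerow([caption, image_path])
```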