Skip to content

The successful integration of Qwen2-VL-Instruct into the ComfyUI platform has enabled a smooth operation, supporting (but not limited to) text-based queries, video queries, single-image queries, and multi-image queries for generating captions or responses.

License

Notifications You must be signed in to change notification settings

IuvenisSapiens/ComfyUI_Qwen2-VL-Instruct

Repository files navigation

ComfyUI_Qwen2-VL-Instruct

This is an implementation of Qwen2-VL-Instruct by ComfyUI, which includes, but is not limited to, support for text-based queries, video queries, single-image queries, and multi-image queries to generate captions or responses.


Basic Workflow

  • Text-based Query: Users can submit textual queries to request information or generate descriptions. For instance, a user might input a description like "What is the meaning of life?"

Chat_with_text_workflow preview

  • Video Query: When a user uploads a video, the system can analyze the content and generate a detailed caption for each frame or a summary of the entire video. For example, "Generate a caption for the given video."

Chat_with_video_workflow preview

  • Single-Image Query: This workflow supports generating a caption for an individual image. A user could upload a photo and ask, "What does this image show?" resulting in a caption such as "A majestic lion pride relaxing on the savannah."

Chat_with_single_image_workflow preview

  • Multi-Image Query: For multiple images, the system can provide a collective description or a narrative that ties the images together. For example, "Create a story from the following series of images: one of a couple at a beach, another at a wedding ceremony, and the last one at a baby's christening."

Chat_with_multiple_images_workflow preview

Installation

  • Install from ComfyUI Manager (search for Qwen2)

  • Download or git clone this repository into the ComfyUI\custom_nodes\ directory and run:

pip install -r requirements.txt

Download Models

All the models will be downloaded automatically when running the workflow if they are not found in the ComfyUI\models\prompt_generator\ directory.

About

The successful integration of Qwen2-VL-Instruct into the ComfyUI platform has enabled a smooth operation, supporting (but not limited to) text-based queries, video queries, single-image queries, and multi-image queries for generating captions or responses.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published