minigpt-v2.github.io - MiniGPT-v2

Description: MiniGPT-v2

Tags: open-source, vision-language, minigpt-v2, minigpt-4

Example domain paragraphs

Large language models have shown remarkable capabilities as a general interface for various language-related applications. Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others. The challenge is to use a single model to perform diverse vision-language tasks effectively with simple multi-modal instructions. To address this issue, we introduce Mi

Model

MiniGPT-v2 consists of three components: a visual backbone, a linear projection layer, and a large language model.

The architecture of MiniGPT-v2.
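The three-component pipeline can be illustrated with a toy sketch in plain Python. This is not MiniGPT-v2's implementation, and several pieces are assumptions. The backbone only flattens image patches, standing in hypothetically for a real vision encoder. The projection is an ordinary affine map `y = W x + b`. The "language model" is any callable that accepts a sequence of embedding vectors. Placing the visual tokens before the text embeddings is also an assumed ordering. The sketch shows the role of the linear projection layer: it turns each visual feature into a vector with the same width as the language model's token embeddings, so that image and text tokens can share one input sequence.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def visual_backbone(image: Sequence[Sequence[float]], patch_size: int) -> List[Vector]:
    """Hypothetical stand-in for the visual backbone: cut a 2-D grayscale image
    into non-overlapping square patches and flatten each patch (row-major)
    into one feature vector. A real model would use a trained vision encoder."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    features = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            features.append([image[r][c]
                             for r in range(top, top + patch_size)
                             for c in range(left, left + patch_size)])
    return features


class LinearProjection:
    """Affine map y = W x + b from visual-feature width (len(x)) to the
    language model's embedding width (number of rows in W)."""

    def __init__(self, weight: List[Vector], bias: Vector):
        if len(weight) != len(bias):
            raise ValueError("weight rows must match bias length")
        self.weight = weight
        self.bias = bias

    def __call__(self, x: Vector) -> Vector:
        if len(x) != len(self.weight[0]):
            raise ValueError("feature width does not match weight columns")
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def forward(image: Sequence[Sequence[float]], patch_size: int,
            projection: LinearProjection, text_embeddings: Sequence[Vector],
            llm: Callable[[List[Vector]], object]) -> object:
    """Backbone, then projection, then language model. Assumption: projected
    visual tokens are prepended to the text-token embeddings."""
    visual_tokens = [projection(f) for f in visual_backbone(image, patch_size)]
    return llm(visual_tokens + [list(t) for t in text_embeddings])


# Usage: a 4x4 toy image, 2x2 patches (4-dim features) projected to width 3.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
proj = LinearProjection(weight=[[1.0, 0.0, 0.0, 0.0],
                                [0.0, 0.0, 0.0, 1.0],
                                [0.25] * 4],
                        bias=[0.0, 0.0, 0.0])
text = [[0.1, 0.2, 0.3]]  # one hypothetical 3-dim text-token embedding
seq = forward(image, 2, proj, text, llm=lambda s: s)  # identity "LLM" returns its input
print(len(seq), len(seq[0]))  # 4 visual tokens + 1 text token, each of width 3
```

Because every visual token leaves the projection with the same width as a text embedding, the language model needs no changes to accept image content; only the projection has to bridge the two feature spaces.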

Links to minigpt-v2.github.io (3)

xiaoqian-shen.github.io Xiaoqian Shen
xiangli.ac.cn About me (Curriculum Vitae) - Xiang Li
lx709.github.io About me (Curriculum Vitae) - Xiang Li