Description: MiniGPT-v2
Large language models have shown remarkable capabilities as a general interface for various language-related applications. Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others. The challenge is to use a single model to perform diverse vision-language tasks effectively with simple multi-modal instructions. To address this issue, we introduce Mi
Model
MiniGPT-v2 consists of three components: a visual backbone, a linear projection layer, and a large language model.
The architecture of MiniGPT-v2.
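The data flow through the three components can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the project's implementation: the function and class names (`visual_backbone`, `LinearProjection`, `embed_text`, `language_model`), the dimensions `VIS_DIM` and `LLM_DIM`, the token counts, and the choice to place visual tokens before the text tokens are all hypothetical, and each component is a stand-in that produces random or summary values rather than a trained network.

```python
import random

VIS_DIM = 6   # hypothetical width of the visual backbone's features
LLM_DIM = 8   # hypothetical width of the language model's embeddings


def visual_backbone(image, num_tokens=4):
    """Stand-in backbone: returns one VIS_DIM feature vector per visual token."""
    rng = random.Random(0)
    return [[rng.uniform(-1, 1) for _ in range(VIS_DIM)] for _ in range(num_tokens)]


class LinearProjection:
    """Affine map y = W x + b from backbone space into LLM embedding space."""

    def __init__(self, in_dim, out_dim, seed=1):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, tokens):
        return [
            [sum(w * x for w, x in zip(row, tok)) + b
             for row, b in zip(self.weight, self.bias)]
            for tok in tokens
        ]


def embed_text(words):
    """Stand-in for the language model's text embedding table."""
    rng = random.Random(2)
    return [[rng.uniform(-1, 1) for _ in range(LLM_DIM)] for _ in words]


def language_model(sequence):
    """Stand-in LLM: only checks and reports the shape of its input."""
    assert all(len(tok) == LLM_DIM for tok in sequence)
    return {"sequence_length": len(sequence), "embedding_dim": LLM_DIM}


image = None  # placeholder; the stand-in backbone ignores pixel values
instruction = ["describe", "this", "image"]

visual_tokens = visual_backbone(image)                            # 4 x VIS_DIM
projected = LinearProjection(VIS_DIM, LLM_DIM)(visual_tokens)     # 4 x LLM_DIM
sequence = projected + embed_text(instruction)                    # assumed order: image, then text
summary = language_model(sequence)
print(summary)
```

The point of the sketch is the role of the projection layer: the backbone and the language model use different feature widths, so the linear map is what lets visual tokens sit in the same sequence as text embeddings.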