Description: Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms use only text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on