visualsketchpad.github.io - Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Description: Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

sketch (689) multimodal (97) large language model (41)

Example domain paragraphs

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on

This means you are free to borrow the source code of this website; we just ask that you link back to this page in the footer. Please remember to remove the analytics code included in the header of the website, which you do not want on your own site.

Links to visualsketchpad.github.io (2)