SG3D: Task-oriented Sequential Grounding in 3D Scenes
Keywords: 3D scene understanding, vision and language, 3D visual grounding, grounded task planning
✶ indicates equal contribution
TL;DR We propose a new task, Task-oriented Sequential Grounding in 3D Scenes, and introduce SG3D, a large-scale dataset comprising 22,346 tasks with 112,236 steps across 4,895 real-world 3D scenes.
Grounding natural language in physical 3D environments is essential for the advancement of embodied artificial intelligence. Current datasets and models for 3D visual grounding predominantly focus on identifying and localizing objects from static, object-centric descriptions. These approaches do not adequately address the dynamic and sequential nature of task-oriented grounding necessary for practical applications. In this work, we propose a new task: Task-oriented Sequential Grounding in 3D scenes, wherein an agent must follow step-by-step task instructions and localize the target object referred to at each step within the scene.