Description: Instruction following policies can be autonomously improved by a foundation-model-powered data collection system and learning algorithm.
Tags: autonomous improvement, instruction following skills, scaled data collection
The standard paradigm for improving instruction following policies involves a human manually collecting additional robot data, labelling it with language instructions, and then finetuning the policy on this data. Can we instead leverage the policy's pre-existing capabilities to bootstrap a self-improvement process?
We propose a particular formulation of an autonomous improvement loop, which we call SOAR, that enables self-improvement of a multi-task language-conditioned policy. The idea is to (1) decouple language understanding from robotic control, and (2) use VLMs to help instantiate a complete autonomous improvement loop.
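To make the decoupling concrete, here is a minimal Python sketch of the two-module interface: a language-conditioned subgoal generator that proposes a goal image, and an image-goal-conditioned policy that acts toward it. The names (`SubgoalGenerator`, `GoalConditionedPolicy`, `language_conditioned_step`) and the stub behavior are illustrative assumptions, not SOAR's actual implementation.

```python
import numpy as np


class SubgoalGenerator:
    """Language-conditioned image subgoal generator (illustrative stub).

    In a SOAR-style system this could be an image-generation model that
    turns (current image, instruction) into a goal image; here it simply
    echoes the input image as a placeholder.
    """

    def generate(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # A real generator would synthesize an image depicting the
        # instruction completed; this stub returns the input unchanged.
        return image


class GoalConditionedPolicy:
    """Image-goal-conditioned low-level policy (illustrative stub)."""

    def act(self, image: np.ndarray, goal_image: np.ndarray) -> np.ndarray:
        # A real policy would map (observation, goal) to a robot action;
        # this stub outputs a zero action of a fixed dimension.
        return np.zeros(7)


def language_conditioned_step(obs_image, instruction, generator, policy):
    """Compose the two modules into a language-conditioned controller."""
    goal_image = generator.generate(obs_image, instruction)
    return policy.act(obs_image, goal_image)
```

In this composition, language understanding lives entirely in the subgoal generator, so the low-level policy can be trained on image goals alone, without any language annotations.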
SOAR first decouples a language-conditioned policy into an image-goal-conditioned policy and a language-conditioned image subgoal generator. With such a formulation, any autonomously collected data can be used for learning with an entirely self-supervised learning algorithm, namely hindsight-relabeled goal-conditioned learning.
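As one way to see why this is self-supervised, below is a minimal sketch of hindsight relabeling: an unlabeled, autonomously collected trajectory is converted into goal-conditioned training tuples by treating observations reached later in the same trajectory as goals. The uniform future-goal sampling and the `max_offset` horizon are assumptions for illustration; the exact relabeling scheme used by SOAR may differ.

```python
import random


def hindsight_relabel(trajectory, max_offset=50):
    """Turn an unlabeled trajectory into goal-conditioned training tuples.

    trajectory: list of (observation_image, action) pairs collected
    autonomously, with no language annotation. For each timestep, a future
    observation from the same trajectory is relabeled as the goal the policy
    was "trying" to reach, yielding (obs, goal, action) examples for
    supervised goal-conditioned learning.
    """
    examples = []
    for t, (obs, action) in enumerate(trajectory[:-1]):
        # Sample a goal uniformly from the future of this trajectory,
        # capped at max_offset steps ahead.
        horizon = min(len(trajectory) - 1, t + max_offset)
        goal_t = random.randint(t + 1, horizon)
        goal_obs = trajectory[goal_t][0]
        examples.append((obs, goal_obs, action))
    return examples
```

Because the goals come from the robot's own experience, every autonomously collected trajectory produces useful training data, regardless of whether the original instruction was completed.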