auto-improvement.github.io - Autonomous Improvement

Description: Instruction-following policies can be autonomously improved by a foundation-model-powered data collection system and learning algorithm.

Keywords: autonomous improvement, instruction following skills, scaled data collection

The standard paradigm for improving instruction-following policies involves a human manually collecting additional robot data, labelling it with language instructions, and then finetuning the policy on this data. Can we instead leverage the policy's pre-existing capabilities to bootstrap a self-improvement process?

We propose a particular formulation of an autonomous improvement loop, which we call SOAR, that enables self-improvement of a multi-task language-conditioned policy. The idea is to (1) decouple language understanding from robotic control, and (2) use VLMs to help instantiate a complete autonomous improvement loop, as sketched below.
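A minimal sketch of what such a loop might look like in Python follows. All component names and interfaces here (`vlm_propose_task`, `subgoal_generator`, `goal_policy`, the `env.reset()`/`env.step()` convention) are illustrative assumptions, not the project's actual API:

```python
def autonomous_improvement_loop(vlm_propose_task, subgoal_generator,
                                goal_policy, env, n_episodes=100):
    """One collect-then-learn pass with no human in the loop.

    All arguments are placeholder callables standing in for the real
    components: a VLM that proposes a language task for the current scene,
    a language-conditioned image subgoal generator, and an image-goal
    conditioned control policy.
    """
    replay_buffer = []
    for _ in range(n_episodes):
        obs = env.reset()
        instruction = vlm_propose_task(obs)            # VLM decides what to practice
        subgoal = subgoal_generator(obs, instruction)  # language -> image subgoal
        trajectory, done = [], False
        while not done:
            action = goal_policy(obs, subgoal)         # language-free control
            next_obs, done = env.step(action)
            trajectory.append((obs, action, next_obs))
            obs = next_obs
        # No success or language labels are stored: the learning step
        # relies on hindsight goal relabeling instead.
        replay_buffer.append(trajectory)
    return replay_buffer  # fed to a self-supervised goal-conditioned learner
```

Note the division of labor this sketch reflects: the VLM touches only the semantic side (choosing tasks and producing subgoals), while the policy that actually outputs actions never sees language.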

SOAR first decouples a language-conditioned policy into an image-goal-conditioned policy and a language-conditioned image subgoal generator. With such a formulation, any autonomously collected data can be used for learning with an entirely self-supervised learning algorithm, namely hindsight-relabeled goal-conditioned learning.
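For concreteness, below is a minimal sketch of hindsight goal relabeling; the function name and the `(obs, action, next_obs)` trajectory layout are assumptions for illustration, not the paper's implementation:

```python
import random

def hindsight_relabel(trajectory):
    """Convert one unlabeled trajectory into goal-conditioned training tuples.

    Each transition is paired with an observation the robot actually reached
    later in the same trajectory, which serves as its goal image, so the data
    is labeled entirely in hindsight with no human supervision.
    """
    examples = []
    for t, (obs, action, _) in enumerate(trajectory):
        # Sample a goal uniformly among the states reached from timestep t on.
        goal_idx = random.randint(t, len(trajectory) - 1)
        goal_image = trajectory[goal_idx][2]  # the next_obs reached at goal_idx
        examples.append({"obs": obs, "goal": goal_image, "action": action})
    return examples
```

Because every relabeled tuple is valid by construction (the goal really was reached), the policy can train on all autonomously collected data regardless of whether the original instruction was accomplished.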
