dynosaur-it.github.io - Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation




In Example 1, LLMs infer from the dataset name that the dataset is about anaphor agreement and include this information in the generated instruction. In Example 2, LLMs create a paraphrase-identification task by understanding the relationship between the fields "sentence1" and "sentence2" implied in the dataset description. Under the description-unaware setting, as in Example 3, tasks can still be generated based only on the names of the data fields.
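The metadata-to-instruction flow above can be sketched as a prompt builder. Everything here is illustrative: the function name, prompt wording, and field layout are assumptions, not Dynosaur's actual code.

```python
# Hypothetical sketch of building an LLM prompt from dataset metadata.
# In the description-aware setting the dataset description is included;
# in the description-unaware setting only the dataset name and the names
# of the data fields are available (as in Example 3 above).

def build_prompt(name, fields, description=None):
    """Assemble a task-generation prompt from dataset metadata."""
    lines = [f"Dataset name: {name}"]
    if description:  # description-aware setting
        lines.append(f"Description: {description}")
    lines.append("Data fields: " + ", ".join(fields))
    lines.append(
        "Propose an instruction-tuning task (instruction, input fields, "
        "output field) based on this metadata."
    )
    return "\n".join(lines)

# Description-unaware example: the task (here, paraphrase identification)
# must be inferred from the field names alone.
print(build_prompt("glue/mrpc", ["sentence1", "sentence2", "label"]))
```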

We first evaluate models trained with Dynosaur on Super-NI (a.k.a. NIV2) to examine their ability to solve NLP tasks. We fine-tune T5-3B and LLAMA-7B with different datasets and compare performance on Super-NI and User-Instruction-252. On Super-NI, both models fine-tuned with Dynosaur data outperform Alpaca, Instruction GPT-4, and Dolly, all of which are much more expensive to collect. In particular, training T5-3B with Dynosaur brings a 2.5-22 ROUGE-L improvement over these baselines.
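For readers unfamiliar with the metric, ROUGE-L scores a candidate against a reference by the length of their longest common subsequence (LCS). Below is a minimal sketch of the F-measure variant over whitespace tokens; the official evaluation script applies its own tokenization and normalization, so treat this only as an illustration.

```python
# Minimal ROUGE-L sketch: LCS-based F-measure over whitespace tokens.

def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """ROUGE-L F1 between two strings (0.0 to 1.0)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

# LCS is "the cat on the mat" (5 of 6 tokens on each side).
print(round(rouge_l("the cat sat on the mat", "the cat is on the mat"), 3))  # -> 0.833
```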

Dynosaur targets task solving and contains fewer instructions on user assistance (such as writing emails and organizing data). Even so, we notice that on User-Instruction-252, Dynosaur can be used as additional training data to achieve higher performance than training with either Alpaca or Instruction GPT-4 alone.
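Using Dynosaur as additional training data, as described above, amounts to mixing instruction datasets before fine-tuning. A minimal sketch, assuming each dataset is a list of instruction/input/output records (the record format and seed handling are assumptions, not the paper's pipeline):

```python
import random

def mix_datasets(datasets, seed=0):
    """Combine several instruction datasets (each a list of record dicts,
    e.g. {'instruction': ..., 'input': ..., 'output': ...}) into one
    shuffled training list. A fixed seed keeps the mix reproducible."""
    records = [r for dataset in datasets for r in dataset]
    random.Random(seed).shuffle(records)
    return records

# e.g. train = mix_datasets([alpaca_records, dynosaur_records])
```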
