Add FLAN and T0 finetuning data #486

StellaAthena · 2021-12-31T17:02:57Z

Is your feature request related to a problem? Please describe.
FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers report significant improvements in LM capabilities after finetuning on their datasets, which may prove useful to us. Additionally, I want to run experiments comparing the two methodologies.

Describe the solution you'd like
Process the data into a Megatron-compatible format and create a downloader for each dataset.

StellaAthena · 2021-12-31T17:06:12Z

@uSaiPrashanth is working on T0
@Vaibhavs10 is working on FLAN

uSaiPrashanth · 2022-01-05T16:45:06Z

Update: I am currently working on grabbing data from P3 and shaping it into a format accepted by NeoX. The plan is to concatenate the input and target of each prompt and save the result in JSONL format. After that, the data will be preprocessed with tools/preprocess_data.py and converted to a Megatron-compatible format.
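
A minimal sketch of the concatenate-and-save step described above, not the actual implementation. It assumes each P3 example is a mapping with string fields named `inputs_pretokenized` and `targets_pretokenized`, that input and target are joined with a single space, and that the preprocessing script reads the document from a `text` key. The function names, the field names, the separator, and the output key are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, Mapping


def to_jsonl_lines(
    examples: Iterable[Mapping[str, str]],
    input_key: str = "inputs_pretokenized",    # assumed field name
    target_key: str = "targets_pretokenized",  # assumed field name
    sep: str = " ",                            # assumed separator
) -> Iterator[str]:
    """Yield one JSON line per example, with input and target concatenated."""
    for ex in examples:
        # Trim stray whitespace at the join so the separator is the only gap.
        text = ex[input_key].rstrip() + sep + ex[target_key].strip()
        # "text" as the output key is an assumption about the downstream tool.
        yield json.dumps({"text": text}, ensure_ascii=False)


def write_jsonl(examples: Iterable[Mapping[str, str]], path: str) -> None:
    """Write the concatenated examples to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(examples):
            f.write(line + "\n")
```

The resulting file could then be passed to tools/preprocess_data.py. Whether a separator or an end-of-document token belongs between input and target is a design choice this sketch leaves open.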

StellaAthena added the feature request label Dec 31, 2021

StellaAthena assigned uSaiPrashanth Dec 31, 2021

StellaAthena assigned Vaibhavs10 Dec 31, 2021

StellaAthena added this to To do in 1T or BUST via automation Dec 31, 2021

StellaAthena moved this from To do to In progress in 1T or BUST Dec 31, 2021

StellaAthena linked a pull request Sep 18, 2022 that will close this issue

Multitask finetuning #676

Closed

StellaAthena closed this as completed Apr 23, 2023

1T or BUST automation moved this from In progress to Done Apr 23, 2023
