Skip to content

σ-GPT: A New Approach to Autoregressive Models

Notifications You must be signed in to change notification settings

idiap/sigma-gpt

Repository files navigation

$\sigma$-GPT: A New Approach to Autoregressive Models

Burst-sampling from a character-level σ-GPT model trained on Shakespeare. Burst-sampling from a character-level sigma-GPT

Project Overview

Welcome to the $\sigma$-GPT project! This repository houses the implementation of $\sigma$-GPT, a model capable of generating signals in any direction. It includes a burst-sampling scheme that generates sequences in sublinear time. The methods and experiments here correspond to those presented in our paper, $\sigma$-GPT: A New Approach to Autoregressive Models, accepted at ECML/PKDD 2024.

Twitter Thread arXiv Python PyTorch

Credits

This repository is based on the following codebases:

One-Liner

Detailed instructions are provided below, but here are some one-liners to get you started: It should work given that the environment is set up correctly. More details are provided in the following sections.

Install and text modeling on CPU:

git clone [email protected]:idiap/sigma-gpt.git
cd sigma-gpt
git submodule update --init --recursive
cd text
(cd nanoGPT/data/shakespeare_char/; python prepare.py)
python train.py nanoGPT/config/train_shakespeare_char.py --max_iters=20000 --device=cpu 

Then you can evaluate the model with:

python sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True

(remove the --device=cpu if you have a GPU, it should work with mps on recent Mac as well)

Installation

The implementation is adapted from two well-known codebases: picoclvr from François Fleuret and nanoGPT from Andrej Karpathy. We organized the code so that the two codebases are imported as submodules.

First, clone the repository:

git clone ...

Then set up the submodules:

git submodule update --init --recursive

The environment.yml file contains all the required dependencies. You should be able to create a working environment by running this command in the main folder:

conda env create -n sigma-gpt -f environment.yml

Usage

Text modelling

For testing the model with fast training and evaluation on CPU, you can use the Shakespeare dataset. The model is trained on the character level.

First, go to the text folder:

cd text

And prepare the dataset:

(cd nanoGPT/data/shakespeare_char/; python prepare.py)

Then you can train with (remove the --device=cpu if you have a GPU, it should work with mps on recent Mac as well)

python train.py nanoGPT/config/train_shakespeare_char.py --device=cpu

Evaluation:

python sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True

with verbose set to True you can see the generation on the terminal.

For larger pipelines, you can try:

Training:

python train.py
python sample.py

Non-NLP tasks

These commands train the model and output results for each epoch. main.py contains many arguments that can be adapted to change the dataset, model size, number of layers, results folder, etc.

First, go to the non-nlp folder:

cd non-nlp

Maze:

python main.py --task maze --training_strategy="shuffle"

Vertical:

python main.py --task vertical --training_strategy="shuffle"

N.B: the air traffic dataset is not publicly available yet, the procedure is ongoing, and the dataset will be available soon. Reach out to the author for more information. Once it is available it will be linked here.

Minimal GPU specs

This repository contains rather small models, running without problem in a few hours on modest GPUs.

License

This software is distributed under the LGPL-3.0 license. See the LICENSE file for more details.

Project status

This code was developed as a part of the Innosuisse MALAT: Machine Learning for Air Traffic project, which is a partnership between SkySoft ATM and the Idiap Research Institute.

Main research partner: Pr. François Fleuret (UNIGE)

Project manager: Didier Berling (SkySoft ATM)

Author: Arnaud Pannatier [email protected] (Idiap Research Institute).

For any questions/remarks about this work or this research, feel free to contact the author.

Citation

If you use this code in your research, please cite the following paper:

@misc{pannatier2024sigmagpts,
      title={{\sigma}-GPTs: A New Approach to Autoregressive Models},
      author={Arnaud Pannatier and Evann Courdier and François Fleuret},
      year={2024},
      eprint={2404.09562},
      archivePrefix={arXiv},
      primaryClass={id='cs.LG'}
}

About

σ-GPT: A New Approach to Autoregressive Models

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages