Reproducing the main results consists of three steps:

1. Creating serialized datasets from the tabular datasets
2. Training and evaluating TabLLM (using code from the [t-few project](https://github.com/r-three/t-few)) on the serialized datasets
3. Running the baseline models on the tabular datasets

We did not include the code to serialize and evaluate the private healthcare dataset due to privacy concerns. Code for some additional experiments is also not included. Feel free to contact us if you have any questions about these experiments.

## Preparing the Environment

First, create and activate a conda environment:

```
conda create -n tabllm python==3.8
conda activate tabllm
```

Next, install the necessary requirements.

```
conda install numpy scipy pandas scikit-learn
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install datasets transformers sentencepiece protobuf xgboost lightgbm tabpfn
```
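
Optionally, a quick sanity check that the core packages installed correctly; this snippet is our addition, not part of the original setup:

```
# Verify the pinned PyTorch build and that the key libraries import.
import torch
import transformers
import datasets

print(torch.__version__)          # expected: 1.10.1
print(torch.cuda.is_available())  # True if the CUDA 11.3 toolkit and a driver are found
```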

## 1. Creating Serialized Datasets

To create a textual serialization for one of the tabular datasets, execute the following script with additional optional arguments for a specific serialization type. This will create a folder with a Hugging Face dataset in `datasets_serialized`:

```
create_external_datasets.py --dataset (car|income|diabetes|heart|bank|blood|calhousing|creditg|jungle) (--list) (--list (--tabletotext|--t0serialization|--values|--permuted|--shuffled))
```
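
The resulting folder can be loaded like any other Hugging Face dataset saved to disk. A minimal sketch, assuming the script wrote a serialization for `heart` (the subfolder name is illustrative):

```
from datasets import load_from_disk

# Illustrative path: one subfolder per serialized dataset is assumed.
dataset = load_from_disk("datasets_serialized/heart")

# Inspect splits and features; the exact columns depend on the chosen serialization.
print(dataset)
```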

For the *Text GPT* serialization, we used a script that queries the GPT-3 API with each row entry encoded as a list, using the prompts given in the paper.
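
That script is not included here, so the following is only a hypothetical sketch of such a query, assuming the legacy `openai<1.0` Python client; the model name, prompt wording, and example row are illustrative, and the actual prompts are those given in the paper:

```
import os
import openai  # legacy openai<1.0 client assumed

openai.api_key = os.environ["OPENAI_API_KEY"]

# A row entry encoded as a list of "feature: value" strings (illustrative).
row = ["age: 63", "sex: male", "chest pain type: typical angina"]

response = openai.Completion.create(
    model="text-davinci-002",  # illustrative model choice
    prompt=f"Describe the following patient in plain English: {row}",
    max_tokens=128,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```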

We provide the *Text* serializations in `datasets_serialized`. The other serializations are omitted here due to size constraints. The *Text* serialization achieved the best results in our experiments.

## 2. Train and Evaluate TabLLM on Serialized Datasets

We used the codebase of the [t-few project](https://github.com/r-three/t-few) for our experiments. We made some small modifications to their code to enable experiments with our custom datasets and templates, and we included all changed files in the `t-few` folder. The script `few-shot-pretrained-100k.sh` runs all our TabLLM experiments for the different serializations and writes the results to an `exp_out` folder. For more information, please consult the original [t-few repository](https://github.com/r-three/t-few).

## 3. Running the Baseline Models
