
add a bunch of research log posts
leogao2 committed May 24, 2021
1 parent 50a87ea commit 64f9f2a
Showing 13 changed files with 448 additions and 0 deletions.
41 changes: 41 additions & 0 deletions content/research-log/activation-fns.md
@@ -0,0 +1,41 @@
---
title: "Activation function ablation"
date: 2021-05-24T14:00:00-06:00
draft: False
---
by Leo Gao

This was an ablation of activation functions on GPT-like models of ~100M params that I ran ages ago. Each model was run for 10k iters, which isn't very long. My original goal was to show that the activation function doesn't matter that much, but to do so I'd need a bunch more runs to estimate variance and show that the differences aren't statistically significant, and I don't plan on running a more exhaustive version of this experiment any time soon. So, I'm just dumping these results here in case anyone has any use for them. All the activation definitions are [here](https://github.com/EleutherAI/gpt-neo/blob/master/models/activations.py#L44).

| Name | Pile Validation BPB | LAMBADA acc (%) | LAMBADA ppl |
| --- | --- | --- | --- |
| softsign | 1.1485 | 34.3 | 81.32 |
| ReLU | 1.1482 | 34.3 | 82.01 |
| spike2 | 1.1480 | 34.4 | 83.13 |
| selu | 1.1485 | 34.5 | 83.32 |
| elish | 1.1492 | 33.9 | 84.04 |
| tanhexp | 1.1474 | 33.7 | 84.06 |
| sigmoid | 1.1484 | 33.9 | 85.20 |
| tanhshrink | 1.1483 | 33.9 | 85.42 |
| maxtanh | 1.1479 | 33.7 | 85.53 |
| roottanh | 1.1485 | 33.4 | 86.00 |
| softplusmone | 1.1488 | 34.1 | 86.21 |
| logsoftmax | 1.1492 | 34.2 | 86.29 |
| ELU | 1.1496 | 33.8 | 86.37 |
| Swish | 1.1482 | 33.7 | 86.42 |
| softmax | 1.1491 | 33.2 | 86.74 |
| square_relax | 1.1484 | 33.5 | 86.92 |
| lisht | 1.1500 | 33.8 | 87.17 |
| GELU | 1.1453 | 34.0 | 87.84 |
| abs | 1.1489 | 33.5 | 87.96 |
| tanh | 1.1481 | 33.2 | 89.28 |
| Mish | 1.1482 | 33.6 | 89.84 |
| triangle_relax | 1.1502 | 33.7 | 89.91 |
| seagull | 1.1487 | 33.3 | 90.08 |
| maxsig | 1.1480 | 33.3 | 90.23 |
| softplus | 1.1460 | 33.1 | 90.74 |
| minsin | 1.1498 | 33.3 | 91.18 |
| snake | 1.1484 | 33.1 | 91.93 |
| cosid | 1.1490 | 33.3 | 92.99 |
| spike | 1.1498 | 33.3 | 93.78 |
| bipolarsigmoid | 1.1513 | 32.8 | 96.73 |
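
For reference, here's a quick sketch of what a few of the less standard entries above look like, written from the published definitions; this is my reconstruction, and the linked `activations.py` is authoritative for the exact forms that actually ran (including the more exotic ones like spike and maxsig).

```python
import numpy as np

def softsign(x):
    # x / (1 + |x|): bounded and smooth, like a cheaper tanh
    return x / (1 + np.abs(x))

def lisht(x):
    # LiSHT: x * tanh(x), a symmetric, non-monotonic activation
    return x * np.tanh(x)

def tanhexp(x):
    # TanhExp: x * tanh(exp(x)); near-identity for large positive x
    return x * np.tanh(np.exp(x))

def snake(x, a=1.0):
    # Snake: x + sin^2(ax)/a, adds a periodic component to the identity
    return x + np.sin(a * x) ** 2 / a

def mish(x):
    # Mish: x * tanh(softplus(x)); log1p(exp(x)) is a numerically naive softplus
    return x * np.tanh(np.log1p(np.exp(x)))
```
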
35 changes: 35 additions & 0 deletions content/research-log/prompts-gpt-fewshot.md
@@ -0,0 +1,35 @@
---
title: "Trying different fewshot description prompting for GPT-3"
date: 2021-05-24T14:00:02-06:00
draft: False
---
by Leo Gao

[Adam Shimi](https://www.alignmentforum.org/users/adamshimi) suggested the idea of trying different fewshot prompts on GPT-3, and hopefully observing something that evidenced larger models being able to handle a wider variety of prompting. He also wrote up a bunch of prompts to try on SST.
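
To give a rough idea of the setup (the exact prompts are Adam's and live in the experiment code linked at the bottom; the format below is a hypothetical illustration), each prompt is a task description followed by a handful of labeled examples:

```python
# Hypothetical sketch of the prompt structure, not Adam's actual prompts.
description = "Classify the sentiment of each movie review as Positive or Negative."

examples = [
    ("an utterly charming and hilarious film", "Positive"),
    ("a bland, unremarkable waste of two hours", "Negative"),
]

def build_prompt(description, examples, query):
    parts = [description]
    for text, label in examples:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")  # the model completes the label
    return "\n\n".join(parts)

print(build_prompt(description, examples, "a gorgeous, witty, seductive movie"))
```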

Unfortunately, the results were kinda mixed: the GPT-2 models all did absolutely terribly and their results were basically useless, and performance wasn't monotonic with model size (neo-1.3B did better than neo-2.7B, and babbage did better than curie). Also, the variance generally *increased* with performance.

| Model | mean accuracy (%) | stddev of accuracy |
|--------------|---------------|--------------------|
| gpt3-ada | 51.9 | 0.0368 |
| gpt3-babbage | 69.4 | 0.0840 |
| gpt3-curie | 67.4 | 0.0807 |
| neo-1.3B | 63.0 | 0.0522 |
| neo-2.7B | 56.5 | 0.0684 |

However, there was one interesting and unexpected result: there's basically no correlation across models in which prompts do best. This is highly unexpected because *a priori* I'd expect models trained on the same/similar data to have similar preferences for what kinds of prompts work well, and that surely some prompts must be better than others in general.
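
Concretely, the check here is simple: take each model's vector of per-prompt accuracies and correlate the vectors pairwise across models. A minimal sketch, with made-up numbers standing in for the real per-prompt accuracies:

```python
import numpy as np
from scipy.stats import spearmanr

# Per-prompt SST accuracies for two models; one entry per prompt.
# These values are made up for illustration.
acc_model_a = np.array([0.71, 0.63, 0.75, 0.68, 0.66])
acc_model_b = np.array([0.65, 0.72, 0.64, 0.70, 0.67])

# Rank correlation across prompts: rho near zero means the two models
# don't agree on which prompts work best.
rho, p = spearmanr(acc_model_a, acc_model_b)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```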

Here's what that looks like plotted out. Each point in these plots is one prompt, and the axes are different models. The values are SST accuracy:

![](/images/research-log/fig_gpt2_pretrained-EleutherAI_gpt-neo-1.3B_gpt2_pretrained-EleutherAI_gpt-neo-2.7B.png)
![](/images/research-log/fig_gpt3_engine-ada_gpt2_pretrained-EleutherAI_gpt-neo-1.3B.png)
![](/images/research-log/fig_gpt3_engine-ada_gpt2_pretrained-EleutherAI_gpt-neo-2.7B.png)
![](/images/research-log/fig_gpt3_engine-ada_gpt3_engine-babbage.png)
![](/images/research-log/fig_gpt3_engine-ada_gpt3_engine-curie.png)
![](/images/research-log/fig_gpt3_engine-babbage_gpt2_pretrained-EleutherAI_gpt-neo-1.3B.png)
![](/images/research-log/fig_gpt3_engine-babbage_gpt2_pretrained-EleutherAI_gpt-neo-2.7B.png)
![](/images/research-log/fig_gpt3_engine-babbage_gpt3_engine-curie.png)
![](/images/research-log/fig_gpt3_engine-curie_gpt2_pretrained-EleutherAI_gpt-neo-1.3B.png)
![](/images/research-log/fig_gpt3_engine-curie_gpt2_pretrained-EleutherAI_gpt-neo-2.7B.png)

The code for the experiment is [here](https://gist.github.com/leogao2/d156d8e0f49ac83b239dde3819668b4b).
