
The projection head order needs to be relooked at #17

Open
Akshay1-6180 opened this issue Jan 30, 2024 · 2 comments

Comments

@Akshay1-6180

Going through these papers:

  1. https://arxiv.org/pdf/1603.05027.pdf
  2. https://arxiv.org/pdf/2302.06112.pdf
class ProjectionHead(nn.Module):
    def __init__(
        self,
        embedding_dim,
        projection_dim=CFG.projection_dim,
        dropout=CFG.dropout
    ):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)
    
    def forward(self, x):
        projected = self.projection(x)   # Linear: embedding_dim -> projection_dim
        x = self.gelu(projected)
        x = self.fc(x)
        x = self.dropout(x)
        x = x + projected                # residual add
        x = self.layer_norm(x)           # post-norm: LayerNorm applied after the residual add
        return x

I feel the order should instead be this:

class ProjectionHead(nn.Module):
    def __init__(
        self,
        embedding_dim,
        projection_dim=CFG.projection_dim,
        dropout=CFG.dropout
    ):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)
    
    def forward(self, x):
        projected = self.projection(x)   # Linear: embedding_dim -> projection_dim
        x = self.layer_norm(projected)   # pre-norm: LayerNorm before the activation
        x = self.gelu(x)
        x = self.dropout(x)
        x = self.fc(x)
        x = x + projected                # residual add around the normalized branch

        return x
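
For anyone comparing the two orderings, here is a minimal, self-contained sketch (not from the repo; the CFG values and the PostNormHead/PreNormHead names are hypothetical stand-ins) that checks both heads accept the same input and return the same shape:

# Minimal sketch: compare the original (post-norm) and proposed (pre-norm) orderings.
# CFG here is a hypothetical stub, not the repository's actual config.
import torch
import torch.nn as nn

class CFG:
    projection_dim = 256
    dropout = 0.1

class PostNormHead(nn.Module):
    # Original ordering: Linear -> GELU -> Linear -> Dropout -> residual add -> LayerNorm
    def __init__(self, embedding_dim, projection_dim=CFG.projection_dim, dropout=CFG.dropout):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)

    def forward(self, x):
        projected = self.projection(x)
        x = self.gelu(projected)
        x = self.fc(x)
        x = self.dropout(x)
        return self.layer_norm(x + projected)

class PreNormHead(PostNormHead):
    # Proposed ordering: Linear -> LayerNorm -> GELU -> Dropout -> Linear -> residual add
    def forward(self, x):
        projected = self.projection(x)
        x = self.layer_norm(projected)
        x = self.gelu(x)
        x = self.dropout(x)
        x = self.fc(x)
        return x + projected

if __name__ == "__main__":
    x = torch.randn(4, 768)  # e.g. a batch of 4 encoder embeddings
    for head in (PostNormHead(768), PreNormHead(768)):
        print(type(head).__name__, head(x).shape)  # both print torch.Size([4, 256])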
@GewelsJI

Do you know why GELU is used here? @Akshay1-6180

@Akshay1-6180
Author

Akshay1-6180 commented Feb 20, 2024

Based on experiments, GELU was found to have a significantly smoother gradient transition; it isn't abrupt or sharp like ReLU. If you look at both functions you'll see the difference.
Moreover, look at the GPT-2 code: they use GELU, and many other models I have encountered also use it, so I went with it.

https://github.com/openai/gpt-2/blob/master/src/model.py
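
A quick way to see the smoothness difference is this minimal sketch (not from the repo), which evaluates both activations' gradients near zero with autograd:

# Minimal sketch: compare ReLU and GELU gradients around zero.
# ReLU's derivative is a hard step (0 for x <= 0, 1 for x > 0), while GELU's
# derivative changes smoothly and is even slightly negative for small negative x.
import torch
import torch.nn.functional as F

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)

relu_grad = torch.autograd.grad(F.relu(x).sum(), x)[0]
gelu_grad = torch.autograd.grad(F.gelu(x).sum(), x)[0]

for xi, rg, gg in zip(x.detach(), relu_grad, gelu_grad):
    print(f"x={xi:+.2f}  dReLU/dx={rg:.3f}  dGELU/dx={gg:.3f}")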
