
Added ELECTRA as a thin wrapper around BERT #358

Merged · 11 commits · Apr 2, 2024

Conversation

KennethEnevoldsen
Contributor

Description

Added support for loading ELECTRA models from the HF Hub using a thin wrapper around BERTEncoder (required, as a few keys need to be mapped differently) and re-using the same config.

The tests are a bit extensive (loading three different models); I imagine you might want to create your own dummy ELECTRA models for testing.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.

StringTransformations.regex_sub((r"\.gamma$", ".weight"), backward=None),
StringTransformations.regex_sub((r"\.beta$", ".bias"), backward=None),
# Prefixes.
StringTransformations.remove_prefix("electra.", reversible=False),
Contributor Author

There are only two notable differences from BERT here: this line, and embeddings_project.weight and embeddings_project.bias.
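The key remapping above can be sketched in plain Python. This is a hypothetical, simplified stand-in for the quoted StringTransformations pipeline (the class names and helper are illustrative, not the actual curated-transformers API): each checkpoint key is run through a list of regex substitutions, renaming old-style LayerNorm parameters and stripping the `electra.` prefix.

```python
# Illustrative sketch of regex-based checkpoint-key remapping, assuming a
# plain dict of parameters; not the actual curated-transformers internals.
import re
from typing import Dict, List, Tuple

# Ordered (pattern, replacement) pairs, mirroring the quoted transformations.
KEY_MAP: List[Tuple[str, str]] = [
    (r"\.gamma$", ".weight"),  # old LayerNorm weight naming
    (r"\.beta$", ".bias"),     # old LayerNorm bias naming
    (r"^electra\.", ""),       # strip the checkpoint prefix
]


def remap_keys(params: Dict[str, object]) -> Dict[str, object]:
    """Apply every substitution in KEY_MAP to each parameter name."""
    remapped = {}
    for key, value in params.items():
        for pattern, repl in KEY_MAP:
            key = re.sub(pattern, repl, key)
        remapped[key] = value
    return remapped
```

With a remapping like this, the resulting state dict can be loaded by the shared BERT encoder implementation unchanged.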


HF_CONFIG_KEYS: List[Tuple[HFConfigKey, Optional[HFConfigKeyDefault]]] = [
(CommonHFKeys.ATTENTION_PROBS_DROPOUT_PROB, None),
(CommonHFKeys.EMBEDDING_SIZE, None),
Contributor Author

I had to add the embedding size here, which led to a conflict with the BERT model. That conflict is what motivated the thin wrapper class. It is definitely possible to avoid it by adding if/else logic in _config_from_hf.

This seemed like a reasonable compromise between not duplicating functionality and avoiding coupling, though I could imagine you would want ELECTRA to be either part of BERT or completely independent.
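To illustrate the config conflict: BERT checkpoints tie the embedding width to hidden_size, while ELECTRA configs carry a separate embedding_size (e.g. ELECTRA-small uses embedding_size 128 with hidden_size 256) and project embeddings up to hidden_size when the two differ. A minimal sketch of the if/else alternative mentioned above (the function and key names here are hypothetical, not the actual _config_from_hf):

```python
# Hypothetical sketch of ELECTRA-aware HF-config parsing; the internal key
# names are illustrative, not the real curated-transformers config schema.
from typing import Any, Dict


def config_from_hf(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    hidden_size = hf_config["hidden_size"]
    # BERT configs have no embedding_size key; fall back to hidden_size.
    embedding_size = hf_config.get("embedding_size", hidden_size)
    return {
        "hidden_width": hidden_size,
        "embedding_width": embedding_size,
        # The embeddings_project.{weight,bias} layer is only needed when
        # the embedding width differs from the hidden width.
        "needs_embedding_projection": embedding_size != hidden_size,
    }
```

This shows why a shared code path is tempting (BERT is just the degenerate case where both widths agree), and also why keeping it inside the BERT loader would couple the two models.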

Comment on lines +17 to +21
[
"jonfd/electra-small-nordic",
"Maltehb/aelaectra-danish-electra-small-cased",
"google/electra-small-discriminator",
],
Contributor Author

I checked using a variety of models, but I imagine you might want to replace this with a dummy ELECTRA model.

Comment on lines 11 to 18
@pytest.mark.skipif(not has_hf_transformers, reason="requires huggingface transformers")
@pytest.mark.parametrize(
"model_name", ["jonfd/electra-small-nordic", "Maltehb/aelaectra-danish-electra-small-cased", "google/electra-small-discriminator"]
)
def test_from_hf_hub_equals_hf_tokenizer(model_name: str, sample_texts):
compare_tokenizer_outputs_with_hf_tokenizer(
sample_texts, model_name, BERTTokenizer
)
Contributor Author

This test simply shows that the ELECTRA models can use the BERT tokenizer.
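The reason the BERT tokenizer can be reused is that ELECTRA ships a BERT-style WordPiece vocabulary. As a hedged illustration of the underlying algorithm (this is a textbook greedy longest-match WordPiece lookup, not the curated-transformers BERTTokenizer implementation):

```python
# Illustrative greedy longest-match WordPiece segmentation of a single word,
# assuming a BERT-style vocab where continuation pieces carry a "##" prefix.
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no segmentation exists for this word
        pieces.append(piece)
        start = end
    return pieces
```

Since both models consume the same vocab format and the same segmentation rule, a tokenizer-equivalence test against the HF tokenizer is all that is needed.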

@danieldk
Contributor

danieldk commented Jan 3, 2024

Awesome, thanks a lot! I hope to have some time over the coming days to review your PR.

Contributor

@danieldk left a comment

Thanks a lot! Added a small comment.

curated_transformers/tests/models/electra/test_encoder.py (outdated; resolved)
@KennethEnevoldsen
Contributor Author

Hi @danieldk I see that it now causes an error. Sadly I can't see the error on buildkite and when I run it locally I can't reproduce the error.

btw. I have also added a related PR over on spacy-curated-transformers (which I plan to finish up once this one is through)

@shadeMe
Collaborator

shadeMe commented Jan 8, 2024

Hi @danieldk I see that it now causes an error. Sadly I can't see the error on buildkite and when I run it locally I can't reproduce the error.

The BuildKite CI failure appears to be unrelated to the PR. We'll look into getting it fixed.

On a related note, I've now enabled the GitHub Actions CI for this PR, which seems to have unearthed an issue (formatting/isort).

@danieldk
Contributor

@KennethEnevoldsen would you still like to work on this PR? Otherwise, I can also do the last bits to push it over the finish line.

@KennethEnevoldsen
Contributor Author

Hi @danieldk I would love to. I have a deadline this week, but I can come back to it next week (potentially before if I find the time).

@KennethEnevoldsen
Contributor Author

@danieldk I have fixed the isort issues and formatted using black. I have run the tests locally as well and they pass (or are skipped).

@KennethEnevoldsen
Contributor Author

@danieldk checked the error, but it does not seem to be related to the PR (in tokenizer test on rotary embedding). Seems like there is an additional argument seq_len.

@danieldk
Contributor

danieldk commented Apr 2, 2024

@danieldk checked the error, but it does not seem to be related to the PR (in tokenizer test on rotary embedding). Seems like there is an additional argument seq_len.

I just fixed the errors in #367, merging in main into this PR should fix the test issues.

@danieldk
Contributor

danieldk commented Apr 2, 2024

Thanks a lot, looks good! I'll make a small test model in a bit to replace the tests.

@danieldk danieldk merged commit c180a12 into explosion:main Apr 2, 2024
7 of 8 checks passed