Tokenizer warnings in Dense Passage retriever update embeddings in Elasticsearch (ref tutorial 6) #237

Closed
usaraj opened this issue Jul 16, 2020 · 1 comment

usaraj commented Jul 16, 2020

Environment
Latest stable git branch
Colab
Following Tutorial 6

Question
Do we need to pass additional arguments, either to the retriever call or to the document_store.update_embeddings call, to avoid this warning from transformers.tokenization...?

When using the Dense Passage Retriever to store the embeddings in Elasticsearch with the command:

document_store.update_embeddings(retriever)

the warning pasted in the Additional context section below is logged.

Additional context
Output of Warning:

07/15/2020 23:42:29 - WARNING - transformers.tokenization_utils_base - Truncation was not explicitely activated but max_length is provided a specific value, please use truncation=True to explicitely truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.

tholor (Member) commented Jul 16, 2020

Hey @usaraj ,

Thanks for reporting this. The warning was introduced with the update to the latest transformers version (3.0.2). It doesn't affect any functionality, but I refactored the tokenization in DPR to avoid it (see #239).
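
For anyone still on a version without that refactoring: since the warning is harmless, one possible workaround is to raise the level of the logger named in the warning output. This is only a minimal sketch using Python's standard logging module; the helper name `silence_tokenizer_warnings` is made up for illustration and is not part of Haystack or transformers. The warning text itself recommends passing truncation=True to the tokenizer, but that call appears to happen inside the retriever rather than in user code.

```python
import logging

# Logger name copied from the warning output in this issue.
TOKENIZER_LOGGER = "transformers.tokenization_utils_base"


def silence_tokenizer_warnings(level: int = logging.ERROR) -> logging.Logger:
    """Raise the threshold of the tokenizer logger so WARNING records are dropped.

    Hypothetical helper for illustration; it only touches the standard
    logging configuration and changes nothing about tokenization itself.
    """
    logger = logging.getLogger(TOKENIZER_LOGGER)
    logger.setLevel(level)
    return logger


logger = silence_tokenizer_warnings()

# The call from the issue would then follow, unchanged:
# document_store.update_embeddings(retriever)
```

Errors from the same logger are still shown; only records below the chosen level are filtered out.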

tholor changed the title from "Dense Passage retriever update embeddings in Elasticsearch (ref tutorial 6)" to "Tokenizer warnings in Dense Passage retriever update embeddings in Elasticsearch (ref tutorial 6)" on Jul 16, 2020
tholor self-assigned this on Jul 16, 2020
tholor closed this as completed on Jul 16, 2020