Version mismatch at installation & encoding issue #223

Closed
Krak91 opened this issue Jul 13, 2020 · 7 comments
Labels
type:bug Something isn't working

@Krak91
Contributor

Krak91 commented Jul 13, 2020

I get the following error when trying to ask the model a question. My pipeline looks like this:

def pipeline(docs, questions):
    document_store = InMemoryDocumentStore()
    document_store.write_documents(docs)
    retriever = TfidfRetriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
    finder = Finder(reader, retriever)
    predictions = {}
    for question in questions:
        predictions[question] = finder.get_answers(question=question, top_k_retriever=5, top_k_reader=3)
    return predictions

Error message:
  File ".../model.py", line 23, in pipeline
    predictions[question] = finder.get_answers(question=question, top_k_retriever=5, top_k_reader=3)
  File "...venv\lib\site-packages\haystack\finder.py", line 57, in get_answers
    top_k=top_k_reader) # type: Dict[str, Any]
  File "...\venv\lib\site-packages\haystack\reader\farm.py", line 259, in predict
    if self._check_no_answer(ans):
  File "...\venv\lib\site-packages\haystack\reader\farm.py", line 416, in _check_no_answer
    assert d["answer"] == "is_impossible", f"Check for no answer is not working"
AssertionError: Check for no answer is not working

Expected behavior
Return model predictions


Krak91 added the type:bug (Something isn't working) label on Jul 13, 2020
@anirbansaha96
Contributor

Did you try doing this:

predictions = []  # a list of (question, answers) tuples instead of a dict
for question in questions:
    prediction = finder.get_answers(question=question, top_k_retriever=5, top_k_reader=3)
    predictions.append((question, prediction))

@Krak91
Contributor Author

Krak91 commented Jul 13, 2020

Yes, the error is still the same, raised at the "prediction = finder.get_answers(question=question, top_k_retriever=5, top_k_reader=3)" line.

@tholor
Member

tholor commented Jul 13, 2020

Hey @Krak91 ,

What versions of Haystack and FARM are you using?

The error makes me believe that you are using Haystack from latest master + FARM 0.4.6.
These two versions are not compatible yet (we'll update to FARM 0.4.6 shortly in #172).
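To answer the version question, the installed versions can be read from inside the virtual environment. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); it assumes the distribution names farm-haystack, farm and torch, which are the names pip reports in the installation error quoted later in this thread. Running pip show farm-haystack farm torch in a shell gives the same information.

```python
# Print the installed versions of the packages involved in this issue.
# Requires Python 3.8+ (importlib.metadata).
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("farm-haystack", "farm", "torch"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If this shows farm 0.4.6 next to a Haystack build from master, that is the incompatible combination described above.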

To resolve this, you can choose one of these options:

  • Last stable release: pip install farm-haystack==0.2.1
  • Latest code:
      git clone https://github.com/deepset-ai/haystack.git
      cd haystack
      pip install -e .
    (which will install FARM 0.4.5)

Let me know if this resolves your issue!

tholor self-assigned this on Jul 13, 2020
@Krak91
Contributor Author

Krak91 commented Jul 14, 2020

Hi, I've installed the right versions of FARM and Haystack and it worked. Thanks.

Some more information:
I've noticed that all the scripts are in the 'haystack' folder of the project. As I'm just trying to add its functionality to my existing project, I simply move that folder into my virtual environment so that the imports in the scripts stay valid. A few other things I've noticed:

  • During installation (either by cloning or via pip) I always get this error: "ERROR: Could not find a version that satisfies the requirement torch==1.4.0 (from farm==0.4.3->farm-haystack) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
    ERROR: No matching distribution found for torch==1.4.0 (from farm==0.4.3->farm-haystack)". The versions vary depending on the release. I have to install torch manually.
  • Whenever I install the package, I need to modify line 36 of haystack/indexing/utils (encoding='x', errors='ignore') to avoid errors when opening documents that contain unusual characters. I like the new functions for passing raw text to the document store (dicts of title and text) rather than a physical file path.

@tholor
Member

tholor commented Jul 15, 2020

> Always during installation (either by cloning or pip) it shows this error: "ERROR: Could not find a version that satisfies the requirement torch==1.4.0 (from farm==0.4.3->farm-haystack) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
> ERROR: No matching distribution found for torch==1.4.0 (from farm==0.4.3->farm-haystack)'. Versions vary depending on release. Need to go install torch manually.

This should be resolved in the latest version, as we upgraded to torch 1.5.1.

> I always need to modify haystack/indexing/utils line 36 whenever I install the package to avoid errors when trying to open documents with funny characters (encoding='x', errors='ignore'). I like the new functions for passing the document store raw text (dicts of title, text) rather than a physical file's path.

Can you please elaborate on what exactly you are changing there? Are you using utf-8 encoding? What errors come up that you want to ignore?
As encodings are always a big pain, we could think about adding an argument to convert_files_to_dicts() to specify the encoding, or at least set a better default there...
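An encoding argument of the kind suggested above could look roughly like the following sketch. read_text_files is a hypothetical stand-in, not Haystack's convert_files_to_dicts, and its *.txt filter and "name"/"text" keys are assumptions for illustration; it simply forwards encoding and errors to the built-in open(), defaulting to UTF-8 instead of the platform's locale encoding.

```python
from pathlib import Path
from typing import Dict, List


def read_text_files(
    dir_path: str, encoding: str = "utf-8", errors: str = "strict"
) -> List[Dict[str, str]]:
    """Hypothetical helper: read every .txt file in dir_path into a dict.

    encoding and errors are passed straight to open():
    errors="strict" raises UnicodeDecodeError on undecodable bytes,
    "ignore" silently drops them, and "replace" inserts U+FFFD instead.
    """
    docs = []
    for path in sorted(Path(dir_path).glob("*.txt")):
        with open(path, encoding=encoding, errors=errors) as doc:
            docs.append({"name": path.name, "text": doc.read()})
    return docs
```

Keeping errors="strict" as the default surfaces bad files loudly; callers who prefer the reporter's workaround can opt in with errors="ignore".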

@Krak91
Contributor Author

Krak91 commented Jul 15, 2020

I'm changing "with open(path) as doc:" to "with open(path, encoding='utf-8', errors='ignore') as doc:".
It's a Unicode error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 24072: character maps to <undefined>
The actual character in this case is:
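The 'charmap' codec in this message suggests the file was opened with a Windows code page such as cp1252, which open() falls back to when no encoding is given and the locale's preferred encoding is cp1252. Byte 0x90 has no mapping in cp1252, so decoding stops there. The sketch below reproduces this with an arbitrary character chosen for illustration (U+2010 HYPHEN, whose UTF-8 encoding ends in 0x90; not the reporter's actual character) and also shows what errors='ignore' and errors='replace' do with a byte that is invalid even as UTF-8.

```python
def try_decode(raw: bytes, encoding: str, errors: str = "strict") -> str:
    """Decode raw bytes, returning the text or a short description of the failure."""
    try:
        return raw.decode(encoding, errors)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError at position {exc.start}: {exc.reason}"


# U+2010 HYPHEN encodes to b"\xe2\x80\x90" in UTF-8; the last byte is 0x90.
utf8_bytes = "a\u2010b".encode("utf-8")
print(try_decode(utf8_bytes, "cp1252"))  # fails: 0x90 is undefined in cp1252
print(try_decode(utf8_bytes, "utf-8"))   # decodes correctly

# A stray 0x90 byte that is not valid UTF-8 either.
broken = b"a\x90b"
print(try_decode(broken, "utf-8"))             # invalid start byte
print(try_decode(broken, "utf-8", "ignore"))   # byte silently dropped
print(try_decode(broken, "utf-8", "replace"))  # U+FFFD placeholder kept
```

If the files are really UTF-8, passing encoding='utf-8' alone is enough; errors='ignore' only matters for genuinely malformed bytes, and it discards them without any trace.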

tholor changed the title from "finder.get_answers error" to "Version mismatch at installation & encoding issue" on Sep 4, 2020
tholor assigned tanaysoni and unassigned tholor on Sep 30, 2020
tholor added this to the #2 milestone on Oct 6, 2020
@tanaysoni
Contributor

Hi @Krak91, this issue should be resolved by #478. I'm closing the thread, but please feel free to post an update here if you still run into any encoding errors.


4 participants