Replace FARM import statements; add dependencies #1492

julian-risch · 2021-09-22T15:41:00Z

Proposed changes:

Replaced FARM import statements to use new, internal modules rather than FARM
Removed FARM dependency from requirements.txt
Added FARM dependencies (e.g. PyTorch) to haystack
Got rid of unused FARM dependencies:
- boto3
- dotmap
- fasttext
- sentencepiece
- Werkzeug==0.16.1
- flask
- flask-restplus
- flask-cors
Replaced FARM module names in logging statements, e.g., logging.getLogger('farm').setLevel(logging.WARNING) -> logging.getLogger('haystack.modeling').setLevel(logging.WARNING)
Added code from FARM
- InferenceProcessor because of usage in _RetribertEmbeddingEncoder
- TextClassificationProcessor because it's a parent class of InferenceProcessor
- haystack/modeling/evaluation/squad_evaluation.py because of usage in haystack/eval.py
Removed FARMRanker so that there is no need for TextPairClassificationProcessor. (SentenceTransformersRanker remains unchanged.)
Various small changes in type annotations, e.g., model_name_or_path is a str and not a Path. Let's discuss that.
_log_samples needs basket passed as a parameter
Fixed a small bug with a wrongly named parameter: "max_sample" instead of "max_samples"
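
The logger rename listed above can be illustrated with the standard library alone. This is a minimal sketch, assuming that modules under haystack.modeling obtain their loggers via logging.getLogger(__name__); the child logger name below is chosen for illustration and is not taken from the PR.

```python
import logging

# Before this PR the noisy modules logged under the "farm" namespace.
# After it, the same code logs under "haystack.modeling".
logging.getLogger("haystack.modeling").setLevel(logging.WARNING)

# Assumption: a module in the haystack.modeling package creates its logger
# with logging.getLogger(__name__), which makes it a child of
# "haystack.modeling" that inherits the level set above.
child = logging.getLogger("haystack.modeling.data_handler.processor")

# INFO is below WARNING, so INFO records are filtered for the child too.
info_enabled = child.isEnabledFor(logging.INFO)
warning_enabled = child.isEnabledFor(logging.WARNING)
```

Setting the level on the parent name is enough because loggers without an explicit level fall back to the nearest ancestor that has one.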

Limitations
Classifier node (FARMClassifier, which classifies documents, not queries) and the corresponding test won't work without FARM's TextClassificationHead. I suggest dropping the classifier node and re-implementing it later if there is a strong need.

Status (please check what you already did):

First draft (up for discussions & feedback)
Final code

Related to #1433

review-notebook-app · 2021-09-22T15:41:04Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

julian-risch · 2021-09-23T16:13:27Z

Not sure why the "Update Docstrings and Tutorials" step fails: https://github.com/deepset-ai/haystack/runs/3688713083?check_suite_focus=true#step:5:33
Maybe it's related to the deleted Classifier module and the deleted FARMRanker.

Timoeller

Looks good. I only found some minor things where I made inline comments.

How about we also rename haystack.reader.farm.FARMReader to something less farmy? I would be fine with creating an additional PR for this, so please create an issue if you do not want to tackle it here.

I am really unhappy that we have so much fairly useless code (InferenceProcessor and therefore TextClassificationProcessor and therefore tokenize_with_metadata and other legacy code that we should get rid of soon), but I agree it's better to merge this one quickly and work in follow-up PRs on making haystack more beautiful :)

Timoeller · 2021-09-28T13:08:00Z

haystack/modeling/data_handler/samples.py

@@ -71,7 +71,7 @@ class SampleBasket:
 is needed for tasks like question answering where the source text can generate multiple input - label
 pairs."""

- def __init__(self, id_internal: Union[int, str], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None):
+ def __init__(self, id_internal: Union[int, str, None], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None):


This should be Optional[Union[int, str]]
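
As a side note on the suggestion above, the two spellings are equivalent to the typing module, so the change is about readability rather than behavior. A minimal sketch follows; describe_id is a hypothetical helper for illustration and is not part of haystack.

```python
from typing import Optional, Union

# Optional[X] is shorthand for Union[X, None], and nested unions are
# flattened, so both spellings describe the same type.
explicit_union = Union[int, str, None]
optional_union = Optional[Union[int, str]]
same_type = explicit_union == optional_union


def describe_id(id_internal: Optional[Union[int, str]] = None) -> str:
    """Hypothetical helper (not part of haystack) showing the annotation in use."""
    if id_internal is None:
        return "no id"
    return f"id={id_internal}"
```

The Optional form puts "may be None" first, which is arguably easier to spot in a long signature.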

Timoeller · 2021-09-28T13:10:25Z

haystack/modeling/data_handler/processor.py

@@ -1185,7 +1525,7 @@ def write_squad_predictions(predictions, out_filename, predictions_filename=None
 json.dump(predictions_json, open(out_filename, "w"))
 logger.info(f"Written Squad predictions to: {out_filename}")

-def _read_dpr_json(file, max_samples=None, proxies=None, num_hard_negatives=1, num_positives=1, shuffle_negatives=True, shuffle_positives=False):
+def _read_dpr_json(file: str, max_samples: int = None, proxies: Any = None, num_hard_negatives: int = 1, num_positives: int = 1, shuffle_negatives: bool = True, shuffle_positives: bool = False):


How can those type annotations be correct, e.g. max_samples: int = None?
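
For context on the question above: an annotation of int with a default of None relies on an implicit Optional, which PEP 484 no longer recommends and which type checkers may reject depending on version and configuration. A minimal sketch of the explicit form, using a hypothetical take_samples function as a stand-in for a reader such as _read_dpr_json:

```python
from typing import List, Optional


def take_samples(items: List[str], max_samples: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in (not haystack code) for a reader with a sample cap."""
    # None means "no limit"; the annotation now says so explicitly.
    if max_samples is None:
        return list(items)
    return items[:max_samples]
```

With Optional[int], the None default and the annotation agree, so a checker does not need an implicit-Optional setting to accept the signature.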

Timoeller · 2021-09-28T13:20:26Z

haystack/preprocessor/utils.py

@@ -7,6 +7,7 @@
 from typing import Callable, Dict, List, Optional, Tuple, Union, Generator
 import json

+from haystack.modeling.data_handler.processor import http_get


I think we can remove this here.

Timoeller · 2021-09-28T13:21:09Z

test/benchmarks/retriever.py

@@ -15,7 +15,7 @@
 import traceback
 import os
 import requests
-from farm.file_utils import http_get
+from haystack.modeling.file_utils import http_get


Not needed in this script.

Timoeller · 2021-09-28T13:21:12Z

test/benchmarks/utils.py

@@ -9,7 +9,7 @@
 from haystack.reader.farm import FARMReader
 from haystack.reader.transformers import TransformersReader
 from haystack.utils import launch_milvus, launch_es, launch_opensearch
-from farm.file_utils import http_get
+from haystack.modeling.file_utils import http_get


file_utils doesn't exist; use from haystack.modeling.data_handler.processor import http_get instead.
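
A stale import path like the one flagged above can be caught before runtime with a small check. The sketch below is an illustration only: module_exists is a hypothetical helper, and standard-library modules are used as stand-ins because haystack may not be installed where this runs.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if the dotted module path can be located."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


# Stand-ins for the real paths: "json.decoder" exists, "json.file_utils" does not.
existing = module_exists("json.decoder")
missing_submodule = module_exists("json.file_utils")
missing_package = module_exists("no_such_package_xyz.file_utils")
```

In the PR's setting, the same call on the module path named in an import statement would show whether that path resolves in the installed package.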

julian-risch · 2021-09-28T13:58:54Z

> Looks good. I only found some minor things where I made inline comments.
>
> How about we also rename haystack.reader.farm.FARMReader to something less farmy? I would be fine with creating an additional PR for this, so please create an issue if you do not want to tackle it here.
>
> I am really unhappy that we have so much fairly useless code (InferenceProcessor and therefore TextClassificationProcessor and therefore tokenize_with_metadata and other legacy code that we should get rid of soon), but I agree it's better to merge this one quickly and work in follow-up PRs on making haystack more beautiful :)

I have created a new issue regarding the renaming of FARMReader: #1527. Feel free to add any ideas for a new name there.

Timoeller

LG

julian-risch and others added 2 commits September 22, 2021 17:37

Replace FARM import statements; add dependencies

158197a

Add latest docstring and tutorial changes

59cb039

julian-risch and others added 22 commits September 23, 2021 10:53

Add InferenceProc., TextCl.Proc., TextPairCl.Proc.

4d48086

Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/haystack into farm_merging_dependencies

566e5fa

Add latest docstring and tutorial changes

1dab474

Remove FARMRanker, add type annotations, rename max_sample

bfbbd38

Add sample_to_features_text for InferenceProc.

4d31195

Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/haystack into farm_merging_dependencies

d577149

Add latest docstring and tutorial changes

908f674

Fix type annotations: model_name_or_path is str not Path

5b77873

Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/haystack into farm_merging_dependencies

29c5a61

Add latest docstring and tutorial changes

a7760bb

Fix mypy errors: implement _create_dataset in TextCl.Proc.

23f26f7

Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/haystack into farm_merging_dependencies

114e5da

Correct formatting of comments

35a3cd8

Remove empty line to prevent line.strip()[0] == "#" IndexError

0d2aa5f

Add task_type "embeddings" in Inferencer

4cea167

Allow loading AdaptiveModel for embedding task

12460b0

Add SQuAD eval metrics; enable InferenceProc for embedding task

15583f3

Add baskets as param to log_samples

23e2d29

Handle empty basket list in log_samples

036d655

Remove unused dependencies

f0eb6ea

Remove FARMClassifier (doc classificer) due to ref to TextClassificationHead

5375ce9

Merge branch 'master' into farm_merging_dependencies

9def673

julian-risch marked this pull request as ready for review September 23, 2021 14:38

julian-risch changed the title ~~WIP: Replace FARM import statements; add dependencies~~ Replace FARM import statements; add dependencies Sep 23, 2021

julian-risch requested a review from Timoeller September 23, 2021 16:13

Remove FARMRanker and Classifier from doc generation scripts

5590cba

julian-risch and others added 3 commits September 23, 2021 18:22

Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/haystack into farm_merging_dependencies

7278005

Add latest docstring and tutorial changes

1711d95

Merge branch 'master' into farm_merging_dependencies: Test Refactoring

5306b14

This was referenced Sep 27, 2021

LFQA: Remove InferenceProcessor dependency from _RetribertEmbeddingEncoder #1507

Closed

Document Classification Node with Transformers instead of FARM #1508

Closed

FARM migration - update documentation #1397

Closed

Timoeller reviewed Sep 28, 2021


Fix import statements and type annotations

251e72f

julian-risch mentioned this pull request Sep 28, 2021

Rename FARMReader #1527

Closed

Timoeller self-requested a review September 28, 2021 14:19

Timoeller approved these changes Sep 28, 2021


julian-risch merged commit f9d2f78 into master Sep 28, 2021

julian-risch deleted the farm_merging_dependencies branch September 28, 2021 14:34
