Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace FARM import statements; add dependencies #1492

Merged
merged 29 commits into from
Sep 28, 2021

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented Sep 22, 2021

Proposed changes:

  • Replaced FARM import statements to use new, internal modules rather than FARM

  • Removed FARM dependency from requirements.txt

  • Added FARM dependencies (e.g. Pytorch) to haystack

  • Got rid of unused FARM dependencies:

    • boto3
    • dotmap
    • fasttext
    • sentencepiece
    • Werkzeug==0.16.1
    • flask
    • flask-restplus
    • flask-cors
  • Replaced FARM module names in logging statements, e.g., logging.getLogger('farm').setLevel(logging.WARNING) -> logging.getLogger('haystack.modeling').setLevel(logging.WARNING)

  • Added code from FARM

    • InferenceProcessor because of usage in _RetribertEmbeddingEncoder
    • TextClassificationProcessor because it's a parent class of InferenceProcessor
    • haystack/modeling/evaluation/squad_evaluation.py because of usage in haystack/eval.py
  • Removed FARMRanker so that there is no need for TextPairClassificationProcessor. (SentenceTransformersRanker remains unchanged.)

  • Various small changes in type annotations, e.g., model_name_or_path is str and not a Path. Let's dicsuss that.

  • _log_samples needs basket passed as parameter

  • fixed small bug with wrongly named parameter "max_sample" instead of "max_samples"

Limitations
Classifier node (FARMClassifier classifying documents not queries) and the corresponding test won't work without FARM's TextClassificationHead. I would suggest to drop the classifier node and re-implement at a later point if there is a strong need.

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code

Related to #1433

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

julian-risch and others added 22 commits September 23, 2021 10:53
@julian-risch julian-risch marked this pull request as ready for review September 23, 2021 14:38
@julian-risch julian-risch changed the title WIP: Replace FARM import statements; add dependencies Replace FARM import statements; add dependencies Sep 23, 2021
@julian-risch
Copy link
Member Author

Not sure why the "Update Docstrings and Tutorials" step fails: https://github.com/deepset-ai/haystack/runs/3688713083?check_suite_focus=true#step:5:33
Maybe it's related to the deleted Classifier module and the deleted FARMRanker.

Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I only found some minor things where I made inline comments.

How about we also rename the haystack.reader.farm.FARMReader into something less farmy? I would be fine to create an additional PR for this, so please create an issue if you do not want to tackle it here.

I am really unhappy that we have so much quite useless code (InferenceProcessor and therefore TextclassificationProcessor and therefore tokenize_with_metadata and other legacy code that we should get rid of soon) but I agree its better to merge this one quickly and work in follow up PRs on making haystack more beautiful :)

@@ -71,7 +71,7 @@ class SampleBasket:
is needed for tasks like question answering where the source text can generate multiple input - label
pairs."""

def __init__(self, id_internal: Union[int, str], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None):
def __init__(self, id_internal: Union[int, str, None], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Optional[Union[int, str]]

@@ -1185,7 +1525,7 @@ def write_squad_predictions(predictions, out_filename, predictions_filename=None
json.dump(predictions_json, open(out_filename, "w"))
logger.info(f"Written Squad predictions to: {out_filename}")

def _read_dpr_json(file, max_samples=None, proxies=None, num_hard_negatives=1, num_positives=1, shuffle_negatives=True, shuffle_positives=False):
def _read_dpr_json(file: str, max_samples: int = None, proxies: Any = None, num_hard_negatives: int = 1, num_positives: int = 1, shuffle_negatives: bool = True, shuffle_positives: bool = False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can those type annotations be correct max_samples : int = None?

@@ -7,6 +7,7 @@
from typing import Callable, Dict, List, Optional, Tuple, Union, Generator
import json

from haystack.modeling.data_handler.processor import http_get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this here.

@@ -15,7 +15,7 @@
import traceback
import os
import requests
from farm.file_utils import http_get
from haystack.modeling.file_utils import http_get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed in this script

@@ -9,7 +9,7 @@
from haystack.reader.farm import FARMReader
from haystack.reader.transformers import TransformersReader
from haystack.utils import launch_milvus, launch_es, launch_opensearch
from farm.file_utils import http_get
from haystack.modeling.file_utils import http_get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file_utils doesnt exist, use from haystack.modeling.data_handler.processor import http_get

@julian-risch julian-risch mentioned this pull request Sep 28, 2021
@julian-risch
Copy link
Member Author

Looks good. I only found some minor things where I made inline comments.

How about we also rename the haystack.reader.farm.FARMReader into something less farmy? I would be fine to create an additional PR for this, so please create an issue if you do not want to tackle it here.

I am really unhappy that we have so much quite useless code (InferenceProcessor and therefore TextclassificationProcessor and therefore tokenize_with_metadata and other legacy code that we should get rid of soon) but I agree its better to merge this one quickly and work in follow up PRs on making haystack more beautiful :)

I have created a new issue regarding the renaming of FARMReader: #1527 feel free to add any ideas for a new name there.

@Timoeller Timoeller self-requested a review September 28, 2021 14:19
Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants