-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace FARM import statements; add dependencies #1492
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
…stack into farm_merging_dependencies
…stack into farm_merging_dependencies
…stack into farm_merging_dependencies
…stack into farm_merging_dependencies
Not sure why the "Update Docstrings and Tutorials" step fails: https://github.com/deepset-ai/haystack/runs/3688713083?check_suite_focus=true#step:5:33 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I only found some minor things where I made inline comments.
How about we also rename the haystack.reader.farm.FARMReader into something less farmy? I would be fine to create an additional PR for this, so please create an issue if you do not want to tackle it here.
I am really unhappy that we have so much quite useless code (InferenceProcessor and therefore TextclassificationProcessor and therefore tokenize_with_metadata and other legacy code that we should get rid of soon) but I agree its better to merge this one quickly and work in follow up PRs on making haystack more beautiful :)
@@ -71,7 +71,7 @@ class SampleBasket: | |||
is needed for tasks like question answering where the source text can generate multiple input - label | |||
pairs.""" | |||
|
|||
def __init__(self, id_internal: Union[int, str], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None): | |||
def __init__(self, id_internal: Union[int, str, None], raw: dict, id_external: str = None, samples: Optional[List[Sample]] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be Optional[Union[int, str]]
@@ -1185,7 +1525,7 @@ def write_squad_predictions(predictions, out_filename, predictions_filename=None | |||
json.dump(predictions_json, open(out_filename, "w")) | |||
logger.info(f"Written Squad predictions to: {out_filename}") | |||
|
|||
def _read_dpr_json(file, max_samples=None, proxies=None, num_hard_negatives=1, num_positives=1, shuffle_negatives=True, shuffle_positives=False): | |||
def _read_dpr_json(file: str, max_samples: int = None, proxies: Any = None, num_hard_negatives: int = 1, num_positives: int = 1, shuffle_negatives: bool = True, shuffle_positives: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can those type annotations be correct max_samples : int = None
?
haystack/preprocessor/utils.py
Outdated
@@ -7,6 +7,7 @@ | |||
from typing import Callable, Dict, List, Optional, Tuple, Union, Generator | |||
import json | |||
|
|||
from haystack.modeling.data_handler.processor import http_get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this here.
test/benchmarks/retriever.py
Outdated
@@ -15,7 +15,7 @@ | |||
import traceback | |||
import os | |||
import requests | |||
from farm.file_utils import http_get | |||
from haystack.modeling.file_utils import http_get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed in this script
test/benchmarks/utils.py
Outdated
@@ -9,7 +9,7 @@ | |||
from haystack.reader.farm import FARMReader | |||
from haystack.reader.transformers import TransformersReader | |||
from haystack.utils import launch_milvus, launch_es, launch_opensearch | |||
from farm.file_utils import http_get | |||
from haystack.modeling.file_utils import http_get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_utils doesnt exist, use from haystack.modeling.data_handler.processor import http_get
I have created a new issue regarding the renaming of FARMReader: #1527 feel free to add any ideas for a new name there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
Proposed changes:
Replaced FARM import statements to use new, internal modules rather than FARM
Removed FARM dependency from requirements.txt
Added FARM dependencies (e.g. Pytorch) to haystack
Got rid of unused FARM dependencies:
Replaced FARM module names in logging statements, e.g.,
logging.getLogger('farm').setLevel(logging.WARNING)
->logging.getLogger('haystack.modeling').setLevel(logging.WARNING)
Added code from FARM
Removed FARMRanker so that there is no need for TextPairClassificationProcessor. (SentenceTransformersRanker remains unchanged.)
Various small changes in type annotations, e.g., model_name_or_path is str and not a Path. Let's dicsuss that.
_log_samples needs basket passed as parameter
fixed small bug with wrongly named parameter "max_sample" instead of "max_samples"
Limitations
Classifier node (FARMClassifier classifying documents not queries) and the corresponding test won't work without FARM's TextClassificationHead. I would suggest to drop the classifier node and re-implement at a later point if there is a strong need.
Status (please check what you already did):
Related to #1433