
Replace FARM import statements; add dependencies #1492

Merged: 29 commits merged from farm_merging_dependencies into master on Sep 28, 2021

Commits (29)
158197a
Replace FARM import statements; add dependencies
julian-risch Sep 22, 2021
59cb039
Add latest docstring and tutorial changes
github-actions[bot] Sep 22, 2021
4d48086
Add InferenceProc., TextCl.Proc., TextPairCl.Proc.
julian-risch Sep 23, 2021
566e5fa
Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/hay…
julian-risch Sep 23, 2021
1dab474
Add latest docstring and tutorial changes
github-actions[bot] Sep 23, 2021
bfbbd38
Remove FARMRanker, add type annotations, rename max_sample
julian-risch Sep 23, 2021
4d31195
Add sample_to_features_text for InferenceProc.
julian-risch Sep 23, 2021
d577149
Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/hay…
julian-risch Sep 23, 2021
908f674
Add latest docstring and tutorial changes
github-actions[bot] Sep 23, 2021
5b77873
Fix type annotations: model_name_or_path is str not Path
julian-risch Sep 23, 2021
29c5a61
Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/hay…
julian-risch Sep 23, 2021
a7760bb
Add latest docstring and tutorial changes
github-actions[bot] Sep 23, 2021
23f26f7
Fix mypy errors: implement _create_dataset in TextCl.Proc.
julian-risch Sep 23, 2021
114e5da
Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/hay…
julian-risch Sep 23, 2021
35a3cd8
Correct formatting of comments
julian-risch Sep 23, 2021
0d2aa5f
Remove empty line to prevent line.strip()[0] == "#" IndexError
julian-risch Sep 23, 2021
4cea167
Add task_type "embeddings" in Inferencer
julian-risch Sep 23, 2021
12460b0
Allow loading AdaptiveModel for embedding task
julian-risch Sep 23, 2021
15583f3
Add SQuAD eval metrics; enable InferenceProc for embedding task
julian-risch Sep 23, 2021
23e2d29
Add baskets as param to log_samples
julian-risch Sep 23, 2021
036d655
Handle empty basket list in log_samples
julian-risch Sep 23, 2021
f0eb6ea
Remove unused dependencies
julian-risch Sep 23, 2021
5375ce9
Remove FARMClassifier (doc classificer) due to ref to TextClassificat…
julian-risch Sep 23, 2021
9def673
Merge branch 'master' into farm_merging_dependencies
julian-risch Sep 23, 2021
5590cba
Remove FARMRanker and Classifier from doc generation scripts
julian-risch Sep 23, 2021
7278005
Merge branch 'farm_merging_dependencies' of github.com:deepset-ai/hay…
julian-risch Sep 23, 2021
1711d95
Add latest docstring and tutorial changes
github-actions[bot] Sep 23, 2021
5306b14
Merge branch 'master' into farm_merging_dependencies: Test Refactoring
julian-risch Sep 27, 2021
251e72f
Fix import statements and type annotations
julian-risch Sep 28, 2021
199 changes: 0 additions & 199 deletions docs/_src/api/api/classifier.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/_src/api/api/generate_docstrings.sh
@@ -17,5 +17,4 @@
pydoc-markdown pydoc-markdown-graph-retriever.yml
pydoc-markdown pydoc-markdown-evaluation.yml
pydoc-markdown pydoc-markdown-ranker.yml
pydoc-markdown pydoc-markdown-question-generator.yml
-pydoc-markdown pydoc-markdown-classifier.yml

18 changes: 0 additions & 18 deletions docs/_src/api/api/pydoc-markdown-classifier.yml

This file was deleted.

2 changes: 1 addition & 1 deletion docs/_src/api/api/pydoc-markdown-ranker.yml
@@ -1,7 +1,7 @@
loaders:
- type: python
  search_path: [../../../../haystack/ranker]
-  modules: ['base', 'farm']
+  modules: ['base', 'sentence_transformers']
  ignore_when_discovered: ['__init__']
processor:
- type: filter
117 changes: 19 additions & 98 deletions docs/_src/api/api/ranker.md
@@ -51,130 +51,51 @@ position in the ranking of documents the correct document is.
- `return_preds`: Whether to add predictions in the returned dictionary. If True, the returned dictionary
contains the keys "predictions" and "metrics".

<a name="farm"></a>
# Module farm
<a name="sentence_transformers"></a>
# Module sentence\_transformers

<a name="farm.FARMRanker"></a>
## FARMRanker Objects
<a name="sentence_transformers.SentenceTransformersRanker"></a>
## SentenceTransformersRanker Objects

```python
-class FARMRanker(BaseRanker)
+class SentenceTransformersRanker(BaseRanker)
```

-Transformer based model for Document Re-ranking using the TextPairClassifier of FARM framework (https://github.com/deepset-ai/FARM).
+Sentence Transformer based pre-trained Cross-Encoder model for Document Re-ranking (https://huggingface.co/cross-encoder).
Re-Ranking can be used on top of a retriever to boost the performance for document search. This is particularly useful if the retriever has a high recall but is bad in sorting the documents by relevance.
While the underlying model can vary (BERT, Roberta, DistilBERT, ...), the interface remains the same.
-FARMRanker handles Cross-Encoder models that internally use two logits and output the classifier's probability of label "1" as similarity score.
-This includes TextPairClassification models trained within FARM.
-In contrast, SentenceTransformersRanker handles Cross-Encoder models that use a single logit as similarity score.

+SentenceTransformerRanker handles Cross-Encoder models that use a single logit as similarity score.
+https://www.sbert.net/docs/pretrained-models/ce-msmarco.html#usage-with-transformers
+In contrast, FARMRanker handles Cross-Encoder models that internally use two logits and output the classifier's probability of label "1" as similarity score.
+This includes TextPairClassification models trained within FARM.

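The single-logit versus two-logit distinction drawn in the docstring above is easiest to see in code. Below is a minimal sketch (not part of the diff) that scores one query-passage pair with a single-logit Cross-Encoder via the `transformers` library, following the sbert.net usage notes linked above; the query and passage strings are made-up examples.

```python
# Minimal sketch: single-logit Cross-Encoder scoring, as the new docstring
# describes. Assumes the `transformers` and `torch` packages; the query and
# passage are illustrative examples only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-12-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

query = "How many people live in Berlin?"
passage = "Berlin has a population of around 3.7 million registered inhabitants."

inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 1): one logit per pair
score = logits.squeeze().item()       # higher score = more relevant

# A two-logit model (e.g. a FARM TextPairClassification head) would instead
# return logits of shape (1, 2), and the similarity score would be
# torch.softmax(logits, dim=1)[0, 1], the probability of label "1".
```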
-| With a FARMRanker, you can:
+| With a SentenceTransformersRanker, you can:
- directly get predictions via predict()
- fine-tune the model on TextPair data via train()

Usage example:
...
retriever = ElasticsearchRetriever(document_store=document_store)
-ranker = FARMRanker(model_name_or_path="deepset/gbert-base-germandpr-reranking")
+ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])

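For completeness, a self-contained version of the usage example in the docstring above. The import paths and the result's field names are assumptions, as both moved between Haystack releases around the time of this PR; the pipeline wiring itself mirrors the docstring.

```python
# Hedged, self-contained version of the docstring's usage example.
# Import paths are assumptions (they changed across Haystack releases).
from haystack.document_store import ElasticsearchDocumentStore  # assumed path
from haystack.retriever import ElasticsearchRetriever           # assumed path
from haystack.ranker import SentenceTransformersRanker          # assumed path
from haystack.pipeline import Pipeline                          # assumed path

document_store = ElasticsearchDocumentStore(host="localhost", index="document")
retriever = ElasticsearchRetriever(document_store=document_store)
ranker = SentenceTransformersRanker(
    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2"
)

p = Pipeline()
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])

result = p.run(query="How many people live in Berlin?")
for doc in result["documents"]:
    # Document field names varied by version (.text vs .content).
    print(doc.score, doc.text[:80])
```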
<a name="farm.FARMRanker.__init__"></a>
<a name="sentence_transformers.SentenceTransformersRanker.__init__"></a>
#### \_\_init\_\_

```python
-| __init__(model_name_or_path: Union[str, Path], model_version: Optional[str] = None, batch_size: int = 50, use_gpu: bool = True, top_k: int = 10, num_processes: Optional[int] = None, max_seq_len: int = 256, progress_bar: bool = True)
+| __init__(model_name_or_path: Union[str, Path], model_version: Optional[str] = None, top_k: int = 10)
```

**Arguments**:

-- `model_name_or_path`: Directory of a saved model or the name of a public model e.g. 'bert-base-cased',
-'deepset/bert-base-cased-squad2', 'deepset/bert-base-cased-squad2', 'distilbert-base-uncased-distilled-squad'.
-See https://huggingface.co/models for full list of available models.
+- `model_name_or_path`: Directory of a saved model or the name of a public model e.g.
+'cross-encoder/ms-marco-MiniLM-L-12-v2'.
+See https://huggingface.co/cross-encoder for full list of available models
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
-- `batch_size`: Number of samples the model receives in one batch for inference.
-Memory consumption is much lower in inference mode. Recommendation: Increase the batch size
-to a value so only a single batch is used.
-- `use_gpu`: Whether to use GPU (if available)
- `top_k`: The maximum number of documents to return
-- `num_processes`: The number of processes for `multiprocessing.Pool`. Set to value of 0 to disable
-multiprocessing. Set to None to let Inferencer determine optimum number. If you
-want to debug the Language Model, you might need to disable multiprocessing!
-- `max_seq_len`: Max sequence length of one input text for the model
-- `progress_bar`: Whether to show a tqdm progress bar or not.
-Can be helpful to disable in production deployments to keep the logs clean.

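Spelled out as a call, the trimmed-down constructor now takes only the three arguments kept above; a hedged sketch, with illustrative values:

```python
# Constructing the ranker per the remaining arguments; model_version may be
# a tag, branch, or commit hash on the HF hub (None uses the default).
ranker = SentenceTransformersRanker(
    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2",
    model_version=None,  # e.g. "main", or a specific commit hash
    top_k=10,            # maximum number of documents to return
)
```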
<a name="farm.FARMRanker.train"></a>
#### train

```python
| train(data_dir: str, train_filename: str, dev_filename: Optional[str] = None, test_filename: Optional[str] = None, use_gpu: Optional[bool] = None, batch_size: int = 10, n_epochs: int = 2, learning_rate: float = 1e-5, max_seq_len: Optional[int] = None, warmup_proportion: float = 0.2, dev_split: float = 0, evaluate_every: int = 300, save_dir: Optional[str] = None, num_processes: Optional[int] = None, use_amp: str = None)
```

Fine-tune a model on a TextPairClassification dataset. Options:

- Take a plain language model (e.g. `bert-base-cased`) and train it for TextPairClassification
- Take a TextPairClassification model and fine-tune it for your domain

**Arguments**:

- `data_dir`: Path to directory containing your training data
- `train_filename`: Filename of training data
- `dev_filename`: Filename of dev / eval data
- `test_filename`: Filename of test data
- `dev_split`: Instead of specifying a dev_filename, you can also specify a ratio (e.g. 0.1) here
that gets split off from training data for eval.
- `use_gpu`: Whether to use GPU (if available)
- `batch_size`: Number of samples the model receives in one batch for training
- `n_epochs`: Number of iterations on the whole training data set
- `learning_rate`: Learning rate of the optimizer
- `max_seq_len`: Maximum text length (in tokens). Everything longer gets cut down.
- `warmup_proportion`: Proportion of training steps until maximum learning rate is reached.
Until that point LR is increasing linearly. After that it's decreasing again linearly.
Options for different schedules are available in FARM.
- `evaluate_every`: Evaluate the model every X steps on the hold-out eval dataset
- `save_dir`: Path to store the final model
- `num_processes`: The number of processes for `multiprocessing.Pool` during preprocessing.
Set to value of 1 to disable multiprocessing. When set to 1, you cannot split away a dev set from train set.
Set to None to use all CPU cores minus one.
- `use_amp`: Optimization level of NVIDIA's automatic mixed precision (AMP). The higher the level, the faster the model.
Available options:
None (Don't use AMP)
"O0" (Normal FP32 training)
"O1" (Mixed Precision => Recommended)
"O2" (Almost FP16)
"O3" (Pure FP16).
See details on: https://nvidia.github.io/apex/amp.html

**Returns**:

None

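Since the entire train() section disappears along with FARMRanker, here is a hedged sketch of how the removed API, as documented above, was invoked; `data_dir` and the filenames are hypothetical placeholders.

```python
# Sketch of the removed FARMRanker.train() call, using only parameters from
# the deleted documentation above; paths and filenames are hypothetical.
ranker = FARMRanker(model_name_or_path="bert-base-cased")
ranker.train(
    data_dir="data/text_pair",    # hypothetical training-data directory
    train_filename="train.tsv",   # hypothetical TextPairClassification file
    dev_split=0.1,                # split 10% off the training data for eval
    batch_size=10,
    n_epochs=2,
    learning_rate=1e-5,
    warmup_proportion=0.2,        # linear LR warm-up, then linear decay
    evaluate_every=300,           # eval on the hold-out set every 300 steps
    save_dir="saved_models/farm_ranker",  # hypothetical output directory
)
```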
<a name="farm.FARMRanker.update_parameters"></a>
#### update\_parameters

```python
| update_parameters(max_seq_len: Optional[int] = None)
```

Hot update parameters of a loaded Ranker. It may not to be safe when processing concurrent requests.

<a name="farm.FARMRanker.save"></a>
#### save

```python
| save(directory: Path)
```

Saves the Ranker model so that it can be reused at a later point in time.

**Arguments**:

- `directory`: Directory where the Ranker model should be saved

<a name="farm.FARMRanker.predict_batch"></a>
<a name="sentence_transformers.SentenceTransformersRanker.predict_batch"></a>
#### predict\_batch

```python
@@ -195,7 +116,7 @@ Returns list of dictionary of query and list of document sorted by (desc.) simil

List of dictionaries containing query and ranked list of Document

<a name="farm.FARMRanker.predict"></a>
<a name="sentence_transformers.SentenceTransformersRanker.predict"></a>
#### predict

```python