This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Add transformers document classifier #175

Merged
merged 8 commits into from
Oct 1, 2021
Create classifier.mdx
julian-risch committed Oct 1, 2021
commit d1de84a2cc4a8b1e0f2e0585cd13e380e135faf2
43 changes: 43 additions & 0 deletions docs/latest/components/classifier.mdx
@@ -0,0 +1,43 @@
# Classifier

The Classifier node is a transformer-based classification model whose predictions can be attached to retrieved documents as metadata.
For example, with a sentiment model you can label each document as either positive or negative in sentiment.
Thanks to a tight integration with the Hugging Face model hub, you can load any classification model by supplying its model name.

![image](/img/classifier.png)

<div className="max-w-xl bg-yellow-light-theme border-l-8 border-yellow-dark-theme px-6 pt-6 pb-4 my-4 rounded-md dark:bg-yellow-900">

Note that the Classifier is different from the Query Classifier.
While the Query Classifier categorizes incoming queries in order to route them to different parts of the pipeline,
the Classifier is used to create classification labels that can be attached to retrieved documents as metadata.

</div>
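
To make "attaching classification labels as metadata" concrete, here is a minimal, self-contained sketch. It does not use Haystack: `Doc`, `toy_sentiment_model`, `attach_labels`, and the `"classification"` metadata key are all hypothetical names chosen for illustration, and the keyword-based scorer merely stands in for a transformer model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document class)."""
    text: str
    meta: Dict[str, object] = field(default_factory=dict)


def toy_sentiment_model(text: str) -> Tuple[str, float]:
    """Toy keyword-based classifier standing in for a transformer model."""
    positive_words = {"great", "good", "excellent", "love"}
    negative_words = {"bad", "awful", "boring", "hate"}
    tokens = [token.strip(".,!?") for token in text.lower().split()]
    pos = sum(token in positive_words for token in tokens)
    neg = sum(token in negative_words for token in tokens)
    label = "positive" if pos >= neg else "negative"
    total = pos + neg
    # Share of matched keywords that support the chosen label (0.5 if none matched)
    score = max(pos, neg) / total if total else 0.5
    return label, score


def attach_labels(
    documents: List[Doc], model: Callable[[str], Tuple[str, float]]
) -> List[Doc]:
    """Run the model on each document and store the prediction in its metadata."""
    for doc in documents:
        label, score = model(doc.text)
        # "classification" is an assumed key name, used here for illustration only
        doc.meta["classification"] = {"label": label, "score": score}
    return documents


docs = [
    Doc("A great film with an excellent cast."),
    Doc("Boring plot and awful pacing."),
]
labeled = attach_labels(docs, toy_sentiment_model)
for doc in labeled:
    print(doc.meta["classification"]["label"], "-", doc.text)
```

The documents themselves pass through unchanged apart from the added metadata entry, which is the key difference from a query-routing component: nothing is filtered or redirected, only annotated.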

## Usage

Initialize it as follows:

``` python
from haystack.classifier import FARMClassifier

classifier_model = 'textattack/bert-base-uncased-imdb'
classifier = FARMClassifier(model_name_or_path=classifier_model)
```

It can be slotted into a pipeline as follows:

``` python
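from haystack.pipeline import Pipeline

# `retriever` is assumed to be an already initialized Retriever node
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=classifier, name="Classifier", inputs=["Retriever"])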
```

It can also be run in isolation:

``` python
# doc1, doc2, doc3 are assumed to be existing Document objects
documents = classifier.predict(
    query="",
    documents=[doc1, doc2, doc3, ...]
)
```