Commit

Update and rename classifier.mdx to document_classifier.mdx
julian-risch committed Sep 30, 2021
1 parent 1ada823 commit f2ab8a3
Showing 2 changed files with 53 additions and 43 deletions.
43 changes: 0 additions & 43 deletions docs/latest/components/classifier.mdx

This file was deleted.

53 changes: 53 additions & 0 deletions docs/latest/components/document_classifier.mdx
@@ -0,0 +1,53 @@
# Document Classifier

The TransformersDocumentClassifier Node is a transformer-based classification model that makes predictions you can attach to retrieved documents as metadata.
For example, with a sentiment model you can label each document as positive or negative.
Because it is tightly integrated with the Hugging Face model hub, you can load a classification model from the hub by supplying its model name.
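The underlying idea is to run a classifier over each document's text and store the prediction in that document's metadata. The sketch below is not Haystack code: `Doc`, `toy_sentiment`, and `attach_predictions` are hypothetical names, a keyword count stands in for the transformer model, and the `"classification"` meta key is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document: text plus a metadata dict."""
    content: str
    meta: Dict[str, object] = field(default_factory=dict)


def toy_sentiment(text: str) -> Dict[str, object]:
    """Hypothetical keyword scorer standing in for a transformer sentiment model."""
    positive = {"great", "good", "love", "excellent"}
    negative = {"bad", "awful", "hate", "terrible"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return {"label": "positive" if score >= 0 else "negative"}


def attach_predictions(docs: List[Doc], classify: Callable[[str], Dict[str, object]]) -> List[Doc]:
    """Run the classifier on each document's text and store the result in its metadata."""
    for doc in docs:
        doc.meta["classification"] = classify(doc.content)
    return docs


docs = attach_predictions(
    [Doc("I love this product, it is excellent"), Doc("Awful service and bad food")],
    toy_sentiment,
)
for doc in docs:
    print(doc.meta["classification"]["label"], "|", doc.content)
```

Because the prediction is stored on the document itself, any later step can read the label without running the model again.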

![image](/img/classifier.png)

<div className="max-w-xl bg-yellow-light-theme border-l-8 border-yellow-dark-theme px-6 pt-6 pb-4 my-4 rounded-md dark:bg-yellow-900">

Note that the Document Classifier is different from the Query Classifier.
The Query Classifier categorizes incoming queries to route them to different branches of the pipeline,
whereas the Document Classifier creates classification labels that are attached to retrieved documents as metadata.

</div>

## Usage

Initialize it as follows:

``` python
from haystack.document_classifier import TransformersDocumentClassifier

doc_classifier_model = 'bhadresh-savani/distilbert-base-uncased-emotion'
doc_classifier = TransformersDocumentClassifier(model_name_or_path=doc_classifier_model)
```

Alternatively, if you can't find a model that has been trained for your exact classification task, you can use zero-shot classification with a natural language inference (NLI) model and your own list of labels:

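``` python
from haystack.document_classifier import TransformersDocumentClassifier

# An NLI model scores how well each candidate label fits the document text
doc_classifier_model = 'cross-encoder/nli-distilroberta-base'
doc_classifier = TransformersDocumentClassifier(
    model_name_or_path=doc_classifier_model,
    task="zero-shot-classification",
    labels=["negative", "positive"]
)
```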
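Zero-shot classification with an NLI model pairs the document text (the premise) with one hypothesis per candidate label, built from a template such as "This example is {}.", and uses the model's entailment score for each pair. In the single-label setting of the Hugging Face zero-shot pipeline, these entailment logits are normalized across labels with a softmax. The sketch below illustrates only that scoring scheme: `entailment_logit` is a hypothetical word-overlap stub standing in for a real NLI model, and `zero_shot_scores` is likewise an illustrative name, not a Haystack or Transformers function.

```python
import math
from typing import Dict, List


def entailment_logit(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: counts hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    return float(sum(word in premise_words for word in hypothesis_words))


def zero_shot_scores(text: str, labels: List[str], template: str = "This example is {}.") -> Dict[str, float]:
    """Score each candidate label by entailment, then softmax across labels (single-label mode)."""
    logits = [entailment_logit(text, template.format(label)) for label in labels]
    max_logit = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


scores = zero_shot_scores("the delivery was late and the experience was negative", ["negative", "positive"])
print(max(scores, key=scores.get), scores)
```

With a real NLI model in place of the stub, the structure stays the same: each label becomes its own hypothesis, which is why the label list can be changed without retraining.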


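To add it to a pipeline, place it after a Retriever so that it classifies the retrieved documents:

``` python
from haystack import Pipeline

# `retriever` is assumed to be an already initialized Retriever (e.g. an ElasticsearchRetriever)
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
# The classifier labels the documents returned by the Retriever
pipeline.add_node(component=doc_classifier, name="DocClassifier", inputs=["Retriever"])
```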

It can also be run in isolation:

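``` python
# doc1, doc2, doc3, ... are Haystack Document objects to classify;
# predict() returns the documents with their predictions stored in the metadata
documents = doc_classifier.predict(
    documents=[doc1, doc2, doc3]
)
```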
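A common next step is to act on the predicted labels, for example by keeping only the positive documents. The sketch below uses plain dicts instead of Haystack `Document` objects so that it runs without Haystack installed. It assumes the label is stored under `meta["classification"]["label"]`; treat that key layout as an assumption and inspect `doc.meta` in your Haystack version to confirm it. `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(documents: List[dict]) -> Dict[str, List[dict]]:
    """Bucket classified documents by predicted label (assumes meta["classification"]["label"])."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in documents:
        # Documents without a prediction fall into an "unlabeled" bucket
        label = doc["meta"].get("classification", {}).get("label", "unlabeled")
        groups[label].append(doc)
    return dict(groups)


# Hypothetical classifier output, mimicking documents returned by predict()
classified = [
    {"content": "Great battery life", "meta": {"classification": {"label": "positive"}}},
    {"content": "Screen cracked in a week", "meta": {"classification": {"label": "negative"}}},
    {"content": "Fast shipping", "meta": {"classification": {"label": "positive"}}},
    {"content": "No prediction yet", "meta": {}},
]
groups = group_by_label(classified)
for label, docs in groups.items():
    print(label, [d["content"] for d in docs])
```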
