This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Add transformers document classifier #175

Merged
merged 8 commits into from
Oct 1, 2021

Update and rename classifier.mdx to document_classifier.mdx
julian-risch committed Sep 30, 2021
commit f2ab8a3fe1b108fc73fa6b844b019fc1d78b6a31
43 changes: 0 additions & 43 deletions docs/latest/components/classifier.mdx

This file was deleted.

53 changes: 53 additions & 0 deletions docs/latest/components/document_classifier.mdx
@@ -0,0 +1,53 @@
# Document Classifier

The `TransformersDocumentClassifier` node is a transformer-based classification model whose predictions can be attached to retrieved documents as metadata.
For example, with a sentiment model you can label each document as either positive or negative.
Because the node integrates with the Hugging Face model hub, you can load any classification model by supplying its model name.

![image](/img/classifier.png)
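
To make "attached as metadata" concrete, here is a minimal sketch in plain Python. It is not the Haystack implementation: documents are plain dicts, the "model" is a toy keyword lookup, and the metadata key `classification` as well as the helper names `toy_sentiment` and `attach_labels` are assumptions made for illustration.

``` python
from typing import Dict, List

# Toy vocabulary standing in for a real sentiment model (assumption).
NEGATIVE_WORDS = {"bad", "terrible", "awful"}


def toy_sentiment(text: str) -> Dict[str, object]:
    """Return a label and a made-up confidence score for one text."""
    words = set(text.lower().split())
    if words & NEGATIVE_WORDS:
        return {"label": "negative", "score": 0.9}
    return {"label": "positive", "score": 0.6}


def attach_labels(documents: List[dict]) -> List[dict]:
    """Store each prediction in the document's metadata under a hypothetical key."""
    for doc in documents:
        doc.setdefault("meta", {})["classification"] = toy_sentiment(doc["text"])
    return documents


docs = attach_labels([
    {"text": "The service was terrible"},
    {"text": "A lovely afternoon"},
])
labels = [d["meta"]["classification"]["label"] for d in docs]
```

The document text itself is left unchanged; only its metadata gains the prediction, so downstream nodes can read or ignore it.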

<div className="max-w-xl bg-yellow-light-theme border-l-8 border-yellow-dark-theme px-6 pt-6 pb-4 my-4 rounded-md dark:bg-yellow-900">

Note that the Document Classifier is different from the Query Classifier.
While the Query Classifier categorizes incoming queries in order to route them to different parts of the pipeline,
the Document Classifier is used to create classification labels that can be attached to retrieved documents as metadata.

</div>
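
The difference between the two classifiers can be sketched with two toy functions. This is an illustration only: `route_query`, `label_documents`, the branch names, and the length-based labels are all hypothetical and do not correspond to Haystack classes or outputs.

``` python
from typing import List


def route_query(query: str) -> str:
    """Query-classifier style: choose ONE pipeline branch for the incoming query."""
    return "question_branch" if query.strip().endswith("?") else "keyword_branch"


def label_documents(texts: List[str]) -> List[dict]:
    """Document-classifier style: keep every document and add a label to its metadata."""
    return [
        {"text": t, "meta": {"label": "long" if len(t.split()) > 5 else "short"}}
        for t in texts
    ]


branch = route_query("Who wrote Hamlet?")
labelled = label_documents([
    "Hamlet is a tragedy",
    "A much longer passage about the history of the play",
])
```

The first function decides where a query goes next, while the second passes every document through and only enriches it.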

## Usage

Initialize it as follows:

``` python
from haystack.document_classifier import TransformersDocumentClassifier

doc_classifier_model = 'bhadresh-savani/distilbert-base-uncased-emotion'
doc_classifier = TransformersDocumentClassifier(model_name_or_path=doc_classifier_model)
```

Alternatively, if you can't find a classification model that has been pre-trained for your exact classification task, you can use zero-shot classification with a custom list of labels as follows:

``` python
from haystack.document_classifier import TransformersDocumentClassifier

doc_classifier_model = 'cross-encoder/nli-distilroberta-base'
doc_classifier = TransformersDocumentClassifier(
    model_name_or_path=doc_classifier_model,
    task="zero-shot-classification",
    labels=["negative", "positive"],  # custom candidate labels
)
```


It is slotted into a pipeline as follows:

``` python
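from haystack import Pipeline

# `retriever` is assumed to be a retriever node that has already been initialized.
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=doc_classifier, name="DocClassifier", inputs=["Retriever"])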
```

It can also be run in isolation:

``` python
documents = doc_classifier.predict(
    documents=[doc1, doc2, doc3, ...]
)
```
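
Once documents carry predictions, a common next step is to group or filter them by label. The sketch below assumes each returned document exposes a `meta` dict with the prediction stored under a `classification` key holding `label` and `score`; that layout, the `FakeDocument` stand-in, and the `group_by_label` helper are assumptions for illustration, so check the actual structure of your returned documents before relying on it.

``` python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a returned document; only `text` and `meta` are modelled."""
    text: str
    meta: dict = field(default_factory=dict)


def group_by_label(documents: List[FakeDocument], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Collect document texts per predicted label, skipping low-confidence predictions."""
    grouped = defaultdict(list)
    for doc in documents:
        prediction = doc.meta.get("classification")  # assumed key name
        if prediction and prediction["score"] >= min_score:
            grouped[prediction["label"]].append(doc.text)
    return dict(grouped)


returned = [
    FakeDocument("I loved it", {"classification": {"label": "joy", "score": 0.97}}),
    FakeDocument("Not sure what to think", {"classification": {"label": "surprise", "score": 0.41}}),
    FakeDocument("Best day ever", {"classification": {"label": "joy", "score": 0.88}}),
]
grouped = group_by_label(returned, min_score=0.5)
```

The `min_score` threshold is a design choice for this sketch: it drops predictions the model was unsure about instead of treating every label as equally reliable.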