This repository has been archived by the owner on Oct 20, 2022. It is now read-only.
Merge pull request #175 from deepset-ai/transformers-document-classifier
Add transformers document classifier
Showing 4 changed files with 60 additions and 48 deletions.
@@ -0,0 +1,53 @@
# Document Classifier

The TransformersDocumentClassifier node is a transformer-based classification model whose predictions can be attached to retrieved documents as metadata.
For example, by using a sentiment model, you can label each document as either positive or negative in sentiment.
Through a tight integration with the HuggingFace model hub, you can load any classification model by supplying its model name.

![image](/img/classifier.png)

<div className="max-w-xl bg-yellow-light-theme border-l-8 border-yellow-dark-theme px-6 pt-6 pb-4 my-4 rounded-md dark:bg-yellow-900">

Note that the Document Classifier is different from the Query Classifier.
While the Query Classifier categorizes incoming queries in order to route them to different parts of the pipeline,
the Document Classifier is used to create classification labels that can be attached to retrieved documents as metadata.

</div>
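
The distinction in the note can be illustrated with a small sketch. It does not use Haystack: `Doc`, `route_query`, and `label_documents` are hypothetical stand-ins, and the keyword rules merely take the place of real models. The point is only that one classifier returns a routing decision for the query, while the other writes a label into each document's metadata.

``` python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the Haystack class)."""
    text: str
    meta: dict = field(default_factory=dict)


def route_query(query: str) -> str:
    """Toy query classifier: picks a pipeline branch for the incoming query."""
    return "question_branch" if query.strip().endswith("?") else "keyword_branch"


def label_documents(docs: List[Doc]) -> List[Doc]:
    """Toy document classifier: attaches a label to each document's metadata."""
    for doc in docs:
        label = "positive" if "great" in doc.text.lower() else "negative"
        doc.meta["classification"] = {"label": label}
    return docs


# The query classifier decides where the query goes ...
branch = route_query("Which phone has a great camera?")
# ... while the document classifier annotates what the retriever returned.
docs = label_documents([Doc("Great camera, great battery."), Doc("The screen broke.")])
labels = [d.meta["classification"]["label"] for d in docs]
print(branch, labels)
```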

## Usage

Initialize the Document Classifier as follows:

``` python
from haystack.document_classifier import TransformersDocumentClassifier

# Name of a classification model on the HuggingFace model hub
doc_classifier_model = 'bhadresh-savani/distilbert-base-uncased-emotion'
doc_classifier = TransformersDocumentClassifier(model_name_or_path=doc_classifier_model)
```

Alternatively, if you can't find a classification model that has been pre-trained for your exact classification task, you can use zero-shot classification with a custom list of labels and a Natural Language Inference (NLI) model as follows:

``` python
# An NLI model is used here instead of a task-specific classifier
doc_classifier_model = 'cross-encoder/nli-distilroberta-base'
doc_classifier = TransformersDocumentClassifier(
    model_name_or_path=doc_classifier_model,
    task="zero-shot-classification",
    labels=["negative", "positive"],  # custom candidate labels
)
```
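
To make the idea behind zero-shot classification more concrete, here is a minimal sketch of how an NLI model can be turned into a classifier: each candidate label is rewritten as a hypothesis sentence, the model scores how strongly the text entails that hypothesis, and the scores are normalized across labels. This is an illustration under assumptions, not the library's implementation: `toy_entailment_score` is a hypothetical keyword-based stand-in for a real NLI model, and the hypothesis template is only one common choice.

``` python
import math
from typing import Callable, Dict, List


def zero_shot_classify(
    text: str,
    labels: List[str],
    entailment_score: Callable[[str, str], float],
    hypothesis_template: str = "This example is {}.",
) -> Dict[str, float]:
    """Score each label by how strongly the text entails its hypothesis."""
    logits = [entailment_score(text, hypothesis_template.format(label)) for label in labels]
    # Softmax over the per-label entailment scores so they sum to 1.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: keyword cues, not a transformer."""
    cues = {"positive": ("love", "great"), "negative": ("hate", "broke")}
    for label, words in cues.items():
        if label in hypothesis:
            return float(sum(premise.lower().count(w) for w in words))
    return 0.0


scores = zero_shot_classify(
    "I love this great phone", ["negative", "positive"], toy_entailment_score
)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

Because the labels only appear inside the hypothesis sentence, they can be changed freely without retraining anything, which is what makes the approach useful when no pre-trained classifier fits the task.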

The Document Classifier is slotted into a pipeline as follows:
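
``` python
# Import path assumed to match the Haystack version used in the snippets above
from haystack.pipeline import Pipeline

# `retriever` is assumed to be an already initialized retriever node
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=doc_classifier, name='DocClassifier', inputs=['Retriever'])
```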

The Document Classifier can also be run in isolation:

``` python
# doc1, doc2, doc3, ... are placeholders for your own documents
documents = doc_classifier.predict(
    documents=[doc1, doc2, doc3, ...]
)
```
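
Once documents carry classification metadata, downstream code can use it, for example to keep only documents with a given label. The sketch below is self-contained and does not call Haystack. It assumes the prediction is stored under a `classification` key in each document's metadata with `label` and `score` entries; check this against the Haystack version you use. `Doc`, `filter_by_label`, and the sample values are hypothetical and made up for illustration.

``` python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a classified document (not the Haystack class)."""
    text: str
    meta: dict = field(default_factory=dict)


def filter_by_label(docs: List[Doc], label: str, min_score: float = 0.5) -> List[Doc]:
    """Keep documents whose predicted label matches and whose score is high enough."""
    kept = []
    for doc in docs:
        # Assumed metadata layout: {"classification": {"label": ..., "score": ...}}
        prediction = doc.meta.get("classification", {})
        if prediction.get("label") == label and prediction.get("score", 0.0) >= min_score:
            kept.append(doc)
    return kept


# Made-up predictions, shaped like the assumed metadata layout
classified = [
    Doc("Loved it", {"classification": {"label": "joy", "score": 0.97}}),
    Doc("So so", {"classification": {"label": "joy", "score": 0.41}}),
    Doc("Terrible", {"classification": {"label": "anger", "score": 0.88}}),
]
joyful = filter_by_label(classified, "joy")
print([d.text for d in joyful])
```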
2759c9e