TuanaCelik/unstructuredio-haystack

Unstructured Haystack

Unstructured Connectors for Haystack

This is an example Haystack 2.0 integration for Unstructured.io connectors. Contributions are welcome 🚀

The current version has 3 available Unstructured connectors:

  • Discord: UnstructuredDiscordConnector
  • GitHub: UnstructuredGitHubConnector
  • Google Drive: UnstructuredGoogleDriveConnector

How to use in a Haystack 2.0 Pipeline

For example, you can write documents fetched from Discord using the UnstructuredDiscordConnector:

from haystack.preview import Pipeline
from haystack.preview.components.writers import DocumentWriter
from unstructured_haystack import UnstructuredDiscordConnector
from chroma_haystack import ChromaDocumentStore

# Chroma is used in-memory so we use the same instances in the two pipelines below
document_store = ChromaDocumentStore()
connector = UnstructuredDiscordConnector(api_key="UNSTRUCTURED_API_KEY", discord_token="DISCORD_TOKEN")

indexing = Pipeline()
indexing.add_component("connector", connector)
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("connector.documents", "writer.documents")
indexing.run({"connector": {"channels": "993539071815200889", "period": 3, "output_dir": "discord-example"}})
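
The dictionary passed to indexing.run() is keyed by the name given in add_component() ("connector"), and the inner dictionary holds that component's run inputs. As a minimal sketch of that shape, the helper below builds the same dictionary without importing Haystack. The name build_run_inputs is hypothetical and is not part of Haystack or unstructured_haystack.

```python
from typing import Any, Dict


def build_run_inputs(component_name: str, **component_inputs: Any) -> Dict[str, Dict[str, Any]]:
    """Nest keyword arguments under a pipeline component name.

    Hypothetical helper for illustration only: it mirrors the shape of the
    dictionary passed to indexing.run() in the example above.
    """
    if not component_name:
        raise ValueError("component_name must be a non-empty string")
    return {component_name: dict(component_inputs)}


# Same values as the Discord example above.
run_inputs = build_run_inputs(
    "connector",
    channels="993539071815200889",
    period=3,
    output_dir="discord-example",
)
print(run_inputs)
```

The result can then be passed as indexing.run(run_inputs), which keeps the run parameters in one place if they come from configuration rather than literals.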