This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Docs V0.10 #164

Merged: 7 commits, Sep 23, 2021
Document new converters
brandenchan committed Sep 20, 2021
commit 98f00125132094a98336e4ac4e3dd03e2a6dafb8
33 changes: 32 additions & 1 deletion docs/latest/components/preprocessing.mdx
@@ -50,6 +50,17 @@ Please refer to [the API docs](/reference/file-converters) to see which converte
valid_languages=["de","en"])
</code>
<code>doc = converter.convert(file_path=file, meta=None)</code>
<code>
# Alternatively, if your PDF consists of scanned images, Haystack can OCR it using Tesseract under the hood.
</code>
<code>
from haystack.file_converter import PDFToTextOCRConverter
</code>
<code>
converter = PDFToTextOCRConverter(remove_numeric_tables=False,
valid_languages=["deu","eng"])
</code>
<code>doc = converter.convert(file_path=file, meta=None)</code>
</pre>
),
},
@@ -71,7 +82,7 @@ Please refer to [the API docs](/reference/file-converters) to see which converte
content: (
<div>
<p>
Haystack also has a `convert_files_to_dicts()` utility function that
will convert all txt or pdf files in a given folder into this
dictionary format.
</p>
@@ -84,6 +95,26 @@ Please refer to [the API docs](/reference/file-converters) to see which converte
</div>
),
},
{
title: "Image",
content: (
<div>
<p>
Haystack supports extraction of text from images using OCR.
</p>
<pre>
<code>
from haystack.file_converter import ImageToTextConverter
</code>
<code>
converter = ImageToTextConverter(remove_numeric_tables=True,
valid_languages=["deu","eng"])
</code>
<code>doc = converter.convert(file_path=file, meta=None)</code>
</pre>
</div>
),
},
]}
/>
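
The converters in this diff take two styles of language code: two-letter codes such as `"de"` and `"en"` in the first PDF example, and three-letter Tesseract codes such as `"deu"` and `"eng"` in the `PDFToTextOCRConverter` example. A minimal sketch of translating between the two is shown below. The helper `to_tesseract_langs` and the `ISO_TO_TESSERACT` table are hypothetical and not part of Haystack; the table covers only a few languages for illustration.

```python
# Hypothetical helper, NOT part of Haystack: maps two-letter ISO 639-1 codes
# to the three-letter codes used by Tesseract language packs.
ISO_TO_TESSERACT = {"de": "deu", "en": "eng", "fr": "fra", "es": "spa"}


def to_tesseract_langs(codes):
    """Return Tesseract-style codes, leaving three-letter codes untouched."""
    result = []
    for code in codes:
        if len(code) == 3:
            # Assume the caller already passed a Tesseract-style code.
            result.append(code)
        elif code in ISO_TO_TESSERACT:
            result.append(ISO_TO_TESSERACT[code])
        else:
            raise ValueError(f"No Tesseract code known for {code!r}")
    return result
```

The result could then be passed as `valid_languages` to an OCR-based converter, assuming the matching Tesseract language packs are installed.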
