[Examples] OCR Ray Datasets example (ray-project#25930)
This is a simple example that shows how to do OCR with Ray Datasets. It includes:

- How to upload and download the dataset to and from S3
- How to run OCR on the dataset with tesseract
- How to use actors to keep around and re-use a spaCy context for doing NLP on the data
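
The last point, keeping an expensive context alive and re-using it across batches, can be illustrated without Ray at all. The sketch below is plain Python and is not the code from this commit: `TextTagger`, its stopword set (a stand-in for a loaded spaCy pipeline), and the local `map_batches` helper are all hypothetical names introduced for illustration. In Ray Datasets the analogous step is passing a callable class to `map_batches` with an actor-based compute strategy, so that each actor constructs the class once.

```python
from typing import Callable, Dict, List


class TextTagger:
    """Callable class that builds its expensive context once in __init__.

    Hypothetical stand-in: a real pipeline would load a spaCy model here.
    """

    def __init__(self) -> None:
        # Assumption: this small set plays the role of a costly NLP context.
        self.stopwords = {"the", "a", "of", "and"}
        self.calls = 0  # counts how many batches re-used the same context

    def __call__(self, batch: List[str]) -> List[Dict[str, object]]:
        self.calls += 1
        out: List[Dict[str, object]] = []
        for text in batch:
            tokens = [t for t in text.lower().split() if t not in self.stopwords]
            out.append({"text": text, "tokens": tokens})
        return out


def map_batches(
    rows: List[str],
    fn: Callable[[List[str]], List[Dict[str, object]]],
    batch_size: int,
) -> List[Dict[str, object]]:
    """Local helper (not Ray's API): apply fn to consecutive slices of rows."""
    results: List[Dict[str, object]] = []
    for start in range(0, len(rows), batch_size):
        results.extend(fn(rows[start:start + batch_size]))
    return results


# Pretend these strings came out of the OCR step.
ocr_text = ["The quick brown fox", "A tale of two cities", "Ray and spaCy"]

tagger = TextTagger()  # context is built exactly once
processed = map_batches(ocr_text, tagger, batch_size=2)
```

Because `tagger` is constructed once and called per batch, the setup cost is paid a single time no matter how many batches flow through it, which is the property the actor-based approach relies on.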

Co-authored-by: Clark Zinzow <[email protected]>
pcmoritz and clarkzinzow committed Jul 6, 2022
1 parent ea47d97 commit 1ba8c8c
Showing 4 changed files with 556 additions and 0 deletions.
2 changes: 2 additions & 0 deletions doc/source/_toc.yml
@@ -52,6 +52,8 @@ parts:
title: Processing the NYC taxi dataset
- file: data/examples/big_data_ingestion
title: Large-scale ML Ingest
- file: data/examples/ocr_example
title: Scaling OCR with Ray Datasets
- file: data/faq
- file: data/package-ref
- file: data/integrations
9 changes: 9 additions & 0 deletions doc/source/data/examples/index.rst
@@ -30,6 +30,15 @@ soon!).
:type: ref
:text: Processing NYC taxi data using Ray Datasets
:classes: btn-link btn-block stretched-link
---
:img-top: /images/ocr.jpg

+++
.. link-button:: ocr_example
:type: ref
:text: Optical character recognition using Ray Datasets
:classes: btn-link btn-block stretched-link


Scaling Out Datasets Workloads
------------------------------