Commit

Fix transformers version bugs
VikParuchuri committed Jun 30, 2024
1 parent dae479a commit 4cf1d08
Showing 7 changed files with 457 additions and 429 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -50,7 +50,7 @@ There's a hosted API for marker available [here](https://www.datalab.to/):

- Supports PDFs, word documents, and powerpoints
- 1/4th the price of leading cloud-based competitors
-- Uses [modal](https://modal.com/) for high reliability without latency spikes
+- Leverages [Modal](https://modal.com/) for high reliability without latency spikes

# Community

1 change: 1 addition & 0 deletions convert.py
@@ -1,5 +1,6 @@
import os

+os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" # For some reason, transformers decided to use .isin for a simple op, which is not supported on MPS
os.environ["IN_STREAMLIT"] = "true" # Avoid multiprocessing inside surya
os.environ["PDFTEXT_CPU_WORKERS"] = "1" # Avoid multiprocessing inside pdftext

3 changes: 2 additions & 1 deletion convert_single.py
@@ -1,7 +1,8 @@
import pypdfium2 # Needs to be at the top to avoid warnings
-import argparse
import os
+os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" # For some reason, transformers decided to use .isin for a simple op, which is not supported on MPS

+import argparse
from marker.convert import convert_single_pdf
from marker.logger import configure_logging
from marker.models import load_all_models
4 changes: 4 additions & 0 deletions marker/convert.py
@@ -1,6 +1,10 @@
import warnings
warnings.filterwarnings("ignore", category=UserWarning) # Filter torch pytree user warnings

+import os
+os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" # For some reason, transformers decided to use .isin for a simple op, which is not supported on MPS
+
+
import pypdfium2 as pdfium # Needs to be at the top to avoid warnings
from PIL import Image

4 changes: 4 additions & 0 deletions marker/models.py
@@ -1,3 +1,7 @@
+import os
+os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" # For some reason, transformers decided to use .isin for a simple op, which is not supported on MPS
+
+
from marker.postprocessors.editor import load_editing_model
from surya.model.detection import segformer
from texify.model.model import load_model as load_texify_model
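
Note on the pattern repeated in the hunks above: each one sets `PYTORCH_ENABLE_MPS_FALLBACK` at the top of the module, ahead of the imports that pull in torch. As far as I know, PyTorch reads this variable when the library loads, so setting it after the first `import torch` is generally expected to have no effect. A minimal sketch of the same idea as a helper follows. The name `enable_mps_fallback` is hypothetical and not part of marker, and using `setdefault` (rather than the plain assignment in the commit) is my own assumption so that a value already exported by the user is respected.

```python
import os
import sys
import warnings


def enable_mps_fallback() -> bool:
    """Request CPU fallback for ops that the MPS backend does not implement.

    Hypothetical helper, not part of marker. Returns True when the variable
    was set before torch was imported, False when torch was already loaded
    (in which case the setting is likely too late to matter).
    """
    torch_already_loaded = "torch" in sys.modules
    if torch_already_loaded:
        warnings.warn(
            "torch is already imported; PYTORCH_ENABLE_MPS_FALLBACK may be ignored"
        )
    # setdefault keeps any value the user exported explicitly (assumption:
    # the commit itself overwrites the value unconditionally).
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return not torch_already_loaded


# Call this before any import that transitively imports torch.
enable_mps_fallback()
```

Calling the helper first thing in an entry point would mirror what the commit does by hand in `convert.py`, `convert_single.py`, `marker/convert.py`, and `marker/models.py`.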
866 changes: 442 additions & 424 deletions poetry.lock

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "marker-pdf"
-version = "0.2.14"
+version = "0.2.15"
description = "Convert PDF to markdown with high speed and accuracy."
authors = ["Vik Paruchuri <[email protected]>"]
readme = "README.md"
@@ -30,9 +30,9 @@ torch = "^2.2.2" # Issue with torch 2.3.0 and vision models - https://github.com
tqdm = "^4.66.1"
tabulate = "^0.9.0"
ftfy = "^6.1.1"
-texify = "^0.1.9"
+texify = "^0.1.10"
rapidfuzz = "^3.8.1"
-surya-ocr = "^0.4.12"
+surya-ocr = "^0.4.14"
filetype = "^1.2.0"
regex = "^2024.4.28"
pdftext = "^0.3.10"
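
Note on the caret constraints in this hunk: under Poetry's documented caret rule, an update may not change the leftmost non-zero component, so raising the minimum from `^0.1.9` to `^0.1.10` moves only the lower bound and still excludes `0.2.0`. The sketch below works that rule out for plain `X.Y.Z` versions. `caret_bounds` is a hypothetical illustration, not Poetry's own implementation, and it ignores pre-release and build tags.

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint.

    Hypothetical helper for illustration; handles numeric X, X.Y and X.Y.Z only.
    """
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Leftmost non-zero component; if every given component is zero,
    # the last one that was written is the one allowed to be bumped.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]
    upper += [0] * (3 - len(upper))
    lower = parts + [0] * (3 - len(parts))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


for name, spec in [("texify", "^0.1.10"), ("surya-ocr", "^0.4.14"), ("torch", "^2.2.2")]:
    low, high = caret_bounds(spec)
    print(f"{name}: >={low},<{high}")
```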
