Add support for Latex/Math/Equations #531

Vilhelm-Ian · 2023-10-03T05:13:08Z

Describe your problem:

While studying math from PDFs, it would be nice to be able to copy equations.

Solution you'd like to see:

Train the model on math equations.

Alternatives you considered:

No response

Additional information or remarks:

No response

dynobo · 2023-10-03T12:28:34Z

Hi @Vilhelm-Ian, thanks for your feature request!

TL;DR

I would love to see this integrated into NormCap, but due to its complexity, I probably won't have enough time to work on it on my own. I'm definitely open to contributions here, though.

Some background

Tesseract, the OCR framework I use in NormCap, initially had some support for detecting equations. But its results were quite weak, so that support was abandoned. I doubt that it is feasible now to train a Tesseract model for decent math detection.

But it would definitely be possible to integrate an additional OCR framework optimized for LaTeX/equations into NormCap. Some open-source frameworks deliver quite promising results, e.g. pix2text or LaTeX-OCR.

However, the difficulty is finding one that satisfies NormCap's non-functional requirements:

Feasible packaging for all systems/platforms (macOS/Linux/Windows, x64/M1)
Few dependencies (in terms of numbers and file size)
100% offline (except maybe for model downloading)

Unfortunately, this probably rules out all torch- or tensorflow-based solutions, as packaging and dependencies are likely to be a nightmare. With online services also ruled out, I'm not aware of any framework that satisfies those requirements. However, in theory it should be possible to convert a torch/tensorflow model into a framework-agnostic format like ONNX and use a much leaner runtime for inference. I'm just not aware of any maintained project that does this.
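
To make the ONNX idea a bit more concrete, here is a rough Python sketch of how an optional math backend could sit next to Tesseract. It is only an illustration under stated assumptions: the function names, the backend labels `"math-onnx"` and `"tesseract"`, and the existence of an already exported model file are all hypothetical and not part of NormCap. The `onnxruntime` calls (`InferenceSession`, `get_inputs`, `run`) are the only real library API used, and the package is imported lazily so it stays an optional dependency.

```python
from importlib.util import find_spec
from typing import Any


def onnxruntime_installed() -> bool:
    """Return True if the optional onnxruntime package can be found."""
    return find_spec("onnxruntime") is not None


def choose_backend(wants_math: bool, math_available: bool) -> str:
    """Pick a (hypothetical) backend name.

    Falls back to Tesseract whenever math OCR is not requested
    or the optional runtime is not available.
    """
    return "math-onnx" if wants_math and math_available else "tesseract"


def run_math_model(model_path: str, pixels: Any) -> list:
    """Run one forward pass of an exported ONNX model.

    Assumes `model_path` points to a model that was converted to ONNX
    beforehand and `pixels` is an array already preprocessed into the
    shape and dtype that model expects.
    """
    import onnxruntime as ort  # lazy import keeps the dependency optional

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: pixels})


if __name__ == "__main__":
    backend = choose_backend(wants_math=True, math_available=onnxruntime_installed())
    print(f"Selected backend: {backend}")
```

A single forward pass is probably not enough for a real integration: LaTeX OCR models typically generate their output token by token, so a decoding loop and a tokenizer would likely be needed on top of this.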

Those are just some initial thoughts. I'm interested to read opinions from others! 🙂

Vilhelm-Ian added the enhancement (New feature or request) and triage (Needs confirmation and prioritization) labels on Oct 3, 2023

dynobo added the help wanted (Looking for contributors to work on this issue) and ocr labels and removed the triage (Needs confirmation and prioritization) label on Oct 3, 2023

dynobo changed the title from ~~Latex support~~ to Add support for Latex/Math/Equations on Oct 3, 2023
