Support Excel files #258

Closed
ImadSaddik opened this issue Nov 6, 2024 · 11 comments · Fixed by #334
Labels
enhancement New feature or request

Comments

@ImadSaddik

Hello,

First of all, thank you for open-sourcing this fantastic project. It already offers a lot in its current state. I have a feature request: would it be possible to add support for Excel files in the near future?

I believe this would make the library even more complete. While there are some areas that could use improvement, I’m confident things will keep getting better over time. I’d love to hear your thoughts on this, and perhaps you're already considering Excel file support.

Thanks again,
SADDIK Imad

@ImadSaddik added the enhancement (New feature or request) label on Nov 6, 2024
@ViCtOr-dev13

Hello @ImadSaddik, did you find a way to extract information from Excel files? Is it possible to convert them into HTML or PDF in order to process them?

@ImadSaddik
Author

Hi @ViCtOr-dev13, so far docling does not support Excel files. If you want, you can use LangChain to load and parse the Excel docs, but I don't have a lot of experience with that.

@psychicDivine

@ViCtOr-dev13, there are multiple options available. I'm not sure about your specific use case, but you could consider using LangChain's document loaders or LlamaIndex's readers like DocxReader (https://docs.llamaindex.ai/en/stable/api_reference/readers/file/#llama_index.readers.file.DocxReader).

@PeterStaar-IBM
Contributor

We need to leverage the openpyxl library.
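To make the idea concrete, here is a minimal sketch of reading a workbook with openpyxl and rendering each sheet as a Markdown table. It is not docling's backend: the helper names `rows_to_markdown` and `iter_sheets` are made up for illustration, and the sketch assumes each sheet holds one simple table whose first row is the header (merged cells, multiple tables per sheet, and cells containing `|` are not handled).

```python
from typing import Iterable, Iterator, Sequence, Tuple


def rows_to_markdown(rows: Iterable[Sequence[object]]) -> str:
    """Render rows of cell values as a Markdown table; the first row is the header."""
    cleaned = [["" if cell is None else str(cell) for cell in row] for row in rows]
    if not cleaned:
        return ""
    width = max(len(row) for row in cleaned)
    # Pad ragged rows so every line has the same number of columns.
    padded = [row + [""] * (width - len(row)) for row in cleaned]
    header, *body = padded
    lines = ["| " + " | ".join(header) + " |", "|" + " --- |" * width]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def iter_sheets(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (sheet title, Markdown table) for each worksheet in an .xlsx file."""
    # Imported lazily so rows_to_markdown stays usable without openpyxl installed.
    from openpyxl import load_workbook

    # read_only streams rows; data_only returns cached formula results, not formulas.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        for sheet in workbook.worksheets:
            yield sheet.title, rows_to_markdown(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()
```

With openpyxl installed, something like `for title, table in iter_sheets("report.xlsx"): print(title, table, sep="\n")` would print one table per sheet; the file name is a placeholder.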

@ImadSaddik
Author

Indeed, it will be challenging to cover all cases, but if we can have something that improves over time, that is going to be good 😊

@PeterStaar-IBM
Contributor

@ImadSaddik Feel free to start with the implementation. I could also start with a simple backend, and then we can collaborate.

@ImadSaddik
Author

Sounds good, let's do it 👍🏻

@PeterStaar-IBM
Contributor

@ImadSaddik I started something in this PR: #334

@ImadSaddik
Author

Thank you @PeterStaar-IBM for letting me know. I have been busy with work lately, but I will look into it once I get the time.

@PeterStaar-IBM
Contributor

@ImadSaddik Just waiting for a review now on PR #334; it should be in sometime next week!

FYI: @dolfim-ibm @cau-git

@ImadSaddik
Author

@PeterStaar-IBM, I will test what you did and provide feedback.


4 participants