JSON Convertor : Pipeline Component #7784
Comments
@julian-risch Thanks, |
@arminnajafi I have been contributing for a month or so, and they do not normally assign external people to issues, or at least not that I have seen. If you want to do this one, feel free to open a PR; they will review it when you are ready. |
I'm proposing a
Example JSON schema (prize.json):
{
  "prizes": [
    {
      "year": "string",
      "category": "string",
      "laureates": [
        {
          "id": "string",
          "firstname": "string",
          "surname": "string",
          "motivation": "string",
          "share": "string"
        }
      ]
    }
  ]
}
Proposed implementation:
from haystack.components.converters import JSONToDocument
converter = JSONToDocument(
    jq_schema=".prizes[].laureates[]?",
    content_key="motivation",
    additional_meta_fields=["firstname", "surname", "share"],
)
docs = converter.run(sources=["./prize.json"])
print(docs["documents"][0])
Expected output:
|
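To make the proposal above more concrete, here is a minimal sketch of how the `content_key` and `additional_meta_fields` parameters might map selected JSON records to documents. It uses only the standard library. `SimpleDocument` and `records_to_documents` are hypothetical names that stand in for Haystack's own classes, the sample values are made up, and the nested loop is a hard-coded substitute for the jq expression `.prizes[].laureates[]?`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text content plus metadata."""
    content: str
    meta: dict = field(default_factory=dict)


def records_to_documents(records, content_key, additional_meta_fields=()):
    """Build one document per record that has `content_key` (assumed behaviour)."""
    documents = []
    for record in records:
        if content_key not in record:
            continue  # assumption: records without content are skipped
        # Copy only the requested metadata fields that are actually present.
        meta = {k: record[k] for k in additional_meta_fields if k in record}
        documents.append(SimpleDocument(content=str(record[content_key]), meta=meta))
    return documents


# Made-up sample data following the schema from the proposal.
raw = json.loads("""
{"prizes": [
  {"year": "2000", "category": "example", "laureates": [
    {"id": "1", "firstname": "Ada", "surname": "Example",
     "motivation": "for an illustrative result", "share": "2"},
    {"id": "2", "firstname": "Bob", "surname": "Sample",
     "motivation": "for another illustrative result", "share": "2"}
  ]},
  {"year": "2001", "category": "example"}
]}
""")

# Hard-coded equivalent of `.prizes[].laureates[]?`: prizes with no
# "laureates" key simply contribute no records.
records = [l for prize in raw["prizes"] for l in prize.get("laureates", [])]

docs = records_to_documents(
    records,
    content_key="motivation",
    additional_meta_fields=["firstname", "surname", "share"],
)
print(docs[0])
```

Keeping record selection separate from the record-to-document mapping means the jq logic could be added later without touching the mapping step.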
Based on @kanenorman's suggestion, I put together a basic implementation of a JSONToDocument component in #8079. In this first implementation, I have not yet included the … Let me know how this component can be improved to include this logic. |
@tradicio - Thank you. I'm working on incorporating the jq logic. Are you planning on leaving your PR up as final or converting to draft? |
I'm not sure how much I'll be able to work on the PR in the next few weeks. If you think you can incorporate the jq logic, I'm more than happy to convert the PR to a draft. |
This component would be a game changer. I have been trying hard to find a solution for old-but-gold data tables :) like CSV etc. @tradicio, could you point me in the right direction to find more information about this topic? |
For more information on how the logic behind jq works, I recommend you start with the official documentation. Regarding the structure of the component, I was inspired by the JSONLoader component in LangChain, as suggested by @kanenorman. With respect to your second question, I think this JSONToDocument component can also be a first step toward working with tabular data, but I would still keep any future components separate (such as the one I suggested in #8036). CSV and XLSX files are often used to collect data with more specific structures than JSON files. It seems to me that they are used in different contexts and for different purposes, so they need different processor components. |
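As a rough illustration of the jq logic mentioned above, the sketch below approximates a very small subset of jq paths in plain Python: `.key` lookups, `[]` iteration, and `[]?` iteration that silently skips values that cannot be iterated. The `select` function is a hypothetical helper, not part of jq or Haystack, and real jq supports far more syntax and streams its outputs lazily.

```python
import re

# Matches either `.key` or an iteration step `[]` / `[]?`.
_TOKEN = re.compile(r"\.([A-Za-z_]\w*)|(\[\]\??)")


def select(path, data):
    """Evaluate a tiny jq-like path against parsed JSON (approximation only)."""
    # Split the path into ("key", name) and ("each", optional) steps.
    steps, pos = [], 0
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        if match.group(1) is not None:
            steps.append(("key", match.group(1)))
        else:
            steps.append(("each", match.group(2).endswith("?")))
        pos = match.end()

    stream = [data]  # every step maps a list of values to a new list of values
    for kind, arg in steps:
        next_stream = []
        for value in stream:
            if kind == "key":
                if value is None:
                    next_stream.append(None)  # looking up a key on null gives null
                elif isinstance(value, dict):
                    next_stream.append(value.get(arg))  # missing key gives null
                else:
                    raise TypeError(f"cannot index {type(value).__name__} with {arg!r}")
            elif isinstance(value, list):
                next_stream.extend(value)
            elif isinstance(value, dict):
                next_stream.extend(value.values())
            elif not arg:
                # Plain `[]` fails on non-iterables; `[]?` skips them instead.
                raise TypeError(f"cannot iterate over {type(value).__name__}")
        stream = next_stream
    return stream


# Made-up sample: the second prize has no "laureates" key.
sample = {
    "prizes": [
        {"year": "2000", "laureates": [{"surname": "Example"}, {"surname": "Sample"}]},
        {"year": "2001"},
    ]
}

print(select(".prizes[].laureates[]?", sample))
print(select(".prizes[].year", sample))
```

In this sketch, the trailing `?` is what keeps the prize without a `laureates` list from raising an error: that entry just contributes no records.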
Is your feature request related to a problem? Please describe.
Currently we have a .txt to Document converter, among others, plus the Unstructured converter. But I see that most of the data we deal with is in the form of JSON.
Describe the solution you'd like
So a .json to Document converter would be a big win when consuming API data in pipelines.
Describe alternatives you've considered
The Unstructured file converter is available, but a dedicated JSON converter driven by a schema makes more sense and adds more value.