
JsonSchemaValidator: unintended primitive value conversion in _recursive_json_to_object method #7457

Closed
vblagoje opened this issue Apr 3, 2024 · 0 comments · Fixed by #7556
Labels
2.x Related to Haystack v2.0 P1 High priority, add to the next sprint


vblagoje (Member) commented Apr 3, 2024

Describe the bug

The _recursive_json_to_object method of the JsonSchemaValidator class unintentionally converts string values into other primitive types while it processes JSON content. For example, the numeric string "12345" becomes the integer 12345, and the string "true" becomes the boolean True. The conversion happens at the json.loads(value) step. The method passes string values to json.loads and keeps whatever comes back, without checking whether the result is a JSON object or array or only a primitive. As a result, strings that look like numbers or booleans are silently retyped, and the content can then fail validation against a JSON schema that declares those fields as strings.
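A minimal reproduction of the pattern described above, as a sketch only: it assumes the method walks dicts and lists and tries json.loads on every string it finds. The function name naive_recursive_json_to_object is hypothetical and this is not Haystack's actual code; only the standard library json module is used.

```python
import json
from typing import Any


def naive_recursive_json_to_object(data: Any) -> Any:
    """Hypothetical illustration of the problematic pattern (not Haystack's code)."""
    if isinstance(data, dict):
        return {key: naive_recursive_json_to_object(value) for key, value in data.items()}
    if isinstance(data, list):
        return [naive_recursive_json_to_object(item) for item in data]
    if isinstance(data, str):
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            # Not valid JSON (e.g. "Berlin"), so the string survives unchanged.
            return data
        if isinstance(parsed, (dict, list)):
            # An embedded JSON object or array: parse it and keep walking.
            return naive_recursive_json_to_object(parsed)
        # BUG: a primitive result replaces the original string,
        # so "12345" becomes 12345 and "true" becomes True.
        return parsed
    return data


payload = {"zip_code": "12345", "active": "true", "address": '{"city": "Berlin"}'}
print(naive_recursive_json_to_object(payload))
```

With this payload the sketch returns zip_code as the int 12345 and active as the bool True, even though both were strings in the input; only the address string, which really encodes a JSON object, should have been parsed.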

Expected behavior

The _recursive_json_to_object method should preserve the original types of primitive values in the input JSON content. Strings that merely look like numbers or booleans (e.g. "12345" or "true") must remain strings. Only strings that actually encode a JSON object or array should be parsed and converted into the corresponding complex type. This keeps the JSON content intact and ensures that schema validation checks the types the caller actually supplied.

Additional context

The _recursive_json_to_object method needs parsing logic that reliably distinguishes primitive values from non-primitive ones. This matters for any application that validates data strictly against a JSON schema, because validation can only succeed if each value keeps its original type.

One possible solution is to inspect the result of json.loads(value) and use it only if it is a non-primitive type (a dict or a list). If the parsed result is a primitive (an int, float, bool, or None), the original string should be kept instead. This would remove the unintended type conversion and make JsonSchemaValidator more reliable when handling JSON content with strict type requirements.
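The proposed check can be sketched as follows. This is a minimal illustration under the same assumptions as the reproduction (a recursive walk over dicts and lists that tries json.loads on strings); the function name recursive_json_to_object is hypothetical, and the actual change in the fixing PR may be structured differently.

```python
import json
from typing import Any


def recursive_json_to_object(data: Any) -> Any:
    """Hypothetical sketch: parse only strings that encode a JSON object or array."""
    if isinstance(data, dict):
        return {key: recursive_json_to_object(value) for key, value in data.items()}
    if isinstance(data, list):
        return [recursive_json_to_object(item) for item in data]
    if isinstance(data, str):
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            # Not valid JSON at all: keep the string as-is.
            return data
        if isinstance(parsed, (dict, list)):
            # Non-primitive result: replace the string and keep walking.
            return recursive_json_to_object(parsed)
        # Primitive result (int, float, bool, None, or a JSON string literal):
        # retain the original string instead of the converted value.
        return data
    return data


payload = {"zip_code": "12345", "active": "true", "address": '{"city": "Berlin", "zip": "10115"}'}
print(recursive_json_to_object(payload))
```

Here zip_code and active stay strings, while address becomes a dict whose nested "10115" is also kept as a string. Checking the type of the parsed result, rather than trying to predict from the raw text whether a string "looks like" JSON, lets json.loads do all of the syntax handling.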

vblagoje added the labels 2.x (Related to Haystack v2.0) and P2 (Medium priority, add to the next sprint if no P1 available) on Apr 3, 2024
shadeMe added the label P1 (High priority, add to the next sprint) and removed P2 (Medium priority, add to the next sprint if no P1 available) on Jun 10, 2024