[suggestion] Use jsoncomment instead of json in decode #182
Comments
Hi @bohea, thanks for the suggestion! I am open to adding another encoder that uses jsoncomment. Two questions:
1. Is jsoncomment still maintained? https://pypi.org/project/jsoncomment/#data — the project page leads to a 404. Let me know if you're able to find a project page or any other well-maintained library that offers this functionality. If not, a jsoncomment-style implementation could be added directly to kor using lark. (Let me know if you're interested in working on this.)
2. An alternative approach would be to introduce another decoder that first strips away comments using a regexp and then delegates to json (see the sketch after this list).
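A rough sketch of that second approach, purely illustrative (the function name is made up and this is not kor's implementation):

```python
import json
import re


def lenient_loads(text: str):
    """Illustrative lenient decoder: strip // comments and trailing commas,
    then delegate to the standard json module."""
    # Drop single-line // comments (naive: breaks if // appears inside a string).
    text = re.sub(r"//[^\n]*", "", text)
    # Drop trailing commas that appear right before a closing } or ].
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```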
It's not about comments in JSON, but about extra commas in the LLM's response JSON, which make json decoding fail.
I haven't found a project page yet; it seems the author only uploaded the package to PyPI.
By the way, I have found that removing the json tag in the schema encoding makes json decoding work better, because the LLM's raw response often misses the closing json tag ("</json>") at the end (not due to token restrictions).
The json decoder unwraps the json tag first using a regexp, but if there is no </json> at the end, the regexp matches nothing, so decoding fails.
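For illustration, a simplified version of that failure mode (the pattern below is my assumption, not kor's exact regexp):

```python
import json
import re

# Simplified illustration of the tag-unwrapping step.
TAG_RE = re.compile(r"<json>(.*?)</json>", re.DOTALL)


def unwrap_and_decode(raw: str):
    match = TAG_RE.search(raw)
    if match is None:
        # A response like '<json>{"name": "Alice"}' (closing tag missing)
        # matches nothing, so decoding fails before json.loads is even reached.
        raise ValueError("no <json>...</json> block found in the response")
    return json.loads(match.group(1))
```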
@bohea apologies for the delayed responses -- I'm on vacation until the end of July, so I only have limited computer access. A few questions:
My personal experience:
I unfortunately don't have any benchmark datasets, so all of my conclusions should be treated as anecdotal, but based on my experience I don't want to change the default behavior of including the tag without quantitative evidence that it improves results. We should definitely make the presence of the tag controllable by a flag, though -- it will allow the user to determine how the data should be encoded.
This sounds like it could improve extraction in some cases and make it worse in other cases (extracting incorrect information). Is this not the case?
@eyurtsev thanks for your response. "Do you have benchmarking results?" -- not yet. I did a simple count: ChatGPT has about a 50% chance of not adding the json tag, so I simply set use_tag = False when doing json encoding and use_tag = True when doing csv encoding. You were right to ask the LLM to add the csv/json tag; it's ChatGPT's problem that it doesn't follow the instruction (maybe my text is too long).
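For reference, a hypothetical sketch of what I mean by the use_tag toggle (the names are illustrative, not kor's actual API):

```python
# Hypothetical helper: wrap the encoded payload in <json>/<csv> tags
# only when requested.
def wrap_output(payload: str, fmt: str = "json", use_tag: bool = True) -> str:
    return f"<{fmt}>{payload}</{fmt}>" if use_tag else payload


# e.g. use_tag=False for json encoding, use_tag=True for csv encoding,
# mirroring the workaround described above.
```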
The LLM sometimes produces malformed JSON; the most common problem is the addition of extra commas.
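For example, a minimal made-up response that shows the failure:

```python
import json

# A response with an extra (trailing) comma, as an LLM sometimes produces.
raw = '{"name": "Alice", "age": 30,}'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print(f"standard json fails: {exc}")
```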
If we use the jsoncomment package instead, the same response decodes successfully.
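A minimal sketch, assuming jsoncomment's documented JsonComment API, which tolerates trailing commas and comments:

```python
from jsoncomment import JsonComment

# Same malformed response as above.
raw = '{"name": "Alice", "age": 30,}'

parser = JsonComment()
data = parser.loads(raw)  # the trailing comma is tolerated
print(data)  # {'name': 'Alice', 'age': 30}
```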