unable to handle new lines with proposed fix #235
Comments
Hi @progressEdd, thanks for the issue report. Here's the API reference: https://eyurtsev.github.io/kor/generated/kor.html#kor.create_extraction_chain
You can do something like this:
# For CSV encoding
chain = create_extraction_chain(llm, node, encoder_or_encoder_class="csv", input_formatter="triple_quotes")
# For JSON encoding
chain = create_extraction_chain(llm, node, encoder_or_encoder_class="json", input_formatter="triple_quotes")
Alternatively, pass in a callable to apply whatever formatting you want. Another trick is to collapse long runs of contiguous whitespace; it'll improve the results and reduce the token count.
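The whitespace trick mentioned above can be sketched with the standard library alone. This is a minimal illustration, not part of kor: the helper name `collapse_whitespace` and the sample text are hypothetical, and the exact rules (one newline per run of blank lines, one space per run of spaces or tabs) are an assumption about what "collapsing" should mean for your data.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and blank lines (hypothetical helper)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Squeeze each run of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop the single space left on either side of a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    # Collapse consecutive newlines into one.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


# Made-up sample input with many line breaks and stray indentation.
sample = "John Smith\n\n\n\n  123   Main Street\r\n\r\n\t\tSpringfield  "
cleaned = collapse_whitespace(sample)
print(cleaned)
```

The cleaned string can then be passed to the chain as the `text` argument in place of the raw input.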
Gotcha, I must have missed it. I was looking at the example code pages and didn't look in depth at the module page. I'll have to look into
which was referenced by
Does the CSV encoder offer advantages (in token usage or LLM performance) over passing the data as plain text and trimming whitespace?
The encoders are used to encode the desired output. They tell the LLM how to structure its output so that it can be parsed into a structured representation. JSON is more flexible and supports more complex structured data representations. It also uses more tokens. To get a sense of what's going on, you can follow the tutorial, try out both encoders, and print out the prompt sent to the LLM.
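To get a rough feel for why JSON tends to cost more tokens than CSV, here is a small standard-library sketch that serializes the same records both ways and compares their lengths. It does not use kor's encoders, the records are made up, and character count is only a crude proxy for token count.

```python
import csv
import io
import json

# Made-up records standing in for extracted data.
records = [
    {"name": "Alice", "city": "Boston"},
    {"name": "Bob", "city": "Denver"},
]


def to_json(rows):
    """Serialize rows as a JSON array of objects (keys repeat in every row)."""
    return json.dumps(rows)


def to_csv(rows):
    """Serialize rows as CSV (keys appear once, in the header line)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print("JSON characters:", len(json_text))
print("CSV characters:", len(csv_text))
```

The gap comes from JSON repeating every key and its punctuation per record, while CSV states the keys once. The trade-off is that CSV cannot represent nested structures.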
Gotcha, that makes sense. I thought the CSV encoder had the capability of truncating/processing input text. I'll rely on the code I shared earlier.
When a string with multiple line breaks is passed into a chain, kor is unable to parse the raw text into the schema.
Solution: adding delimiters such as ``` around the text improves extraction. You can see it with the example at the end.
For example
will return
Only the name is returned in the dictionary, but it will show up in the raw
For further debugging, here's the output of
chain.prompt.format_prompt(text=test_lots_of_new_lines).to_string()
(to work with GitHub formatting, I removed the code block from the prompt).
When I add delimiters, this is what I am able to get mainstreet
chain.run(text="```"+test_lots_of_new_lines+"```")
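The manual concatenation above could also be wrapped in a small function, which is the kind of callable the maintainer's comment says can be passed in for formatting. This is only a sketch: `fence_text` is a hypothetical name, and passing it as `input_formatter=fence_text` is an assumption based on that comment, so check it against the API reference.

```python
# Three backticks, built programmatically so this snippet can sit inside a code block.
FENCE = "`" * 3


def fence_text(text: str) -> str:
    """Wrap text in triple-backtick delimiters, as in the workaround above (hypothetical helper)."""
    return f"{FENCE}{text}{FENCE}"


# Made-up input with several consecutive line breaks.
test_text = "John Smith\n\n\n123 Main Street"
wrapped = fence_text(test_text)
print(wrapped)

# Assumed usage, not executed here:
# chain.run(text=fence_text(test_lots_of_new_lines))
```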