Hi, I want to use kor with the open-source OpenChat model (https://huggingface.co/openchat/openchat-3.5-0106). I know that this model expects a specific prompt format with special suffix tokens. Here is an example:
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
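For reference, the same string can be reproduced from the model's own chat template (just a quick sketch on my side, assuming the tokenizer ships the OpenChat template):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat-3.5-0106")
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]
# Render the conversation with the built-in chat template and append the
# generation prompt ("GPT4 Correct Assistant:") at the end
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))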
My current code:
import torch
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig

# Load the model in 8-bit so it fits into GPU memory
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "openchat/openchat-3.5-0106"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",  # task
    model=model_id,
    tokenizer=tokenizer,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
    model_kwargs={"quantization_config": quantization_config},
    max_length=1000,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

hf = HuggingFacePipeline(pipeline=pipe, model_kwargs={"temperature": 0})
from langchain.prompts import PromptTemplate

INSTRUCTION_TEMPLATE = PromptTemplate(
    input_variables=["type_description", "format_instructions"],
    template='''GPT4 Correct System:Your goal is to extract structured information from the user's input that
matches the form described below. When extracting information please make
sure it matches the type information exactly. Do not add any attributes that
do not appear in the schema shown below.<|end_of_turn|>\n\n
GPT4 Correct User:
{type_description}\n\n
{format_instructions}<|end_of_turn|>\n\n
GPT4 Correct Assistant:''',
)
from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text, Number
schema = Object(
    id="person",
    description="Personal information",
    examples=[
        ("Alice and Bob are friends", [{"first_name": "Alice"}, {"first_name": "Bob"}]),
    ],
    attributes=[
        Text(
            id="first_name",
            description="The first name of a person.",
        ),
    ],
    many=True,
)
chain = create_extraction_chain(hf, schema, instruction_template=INSTRUCTION_TEMPLATE)
chain.run("My name is Bobby. My brother's name Joe.")
When I insert these suffixes into the prompt template and run the chain, generation completes, but the output contains a parse error:
{'data': {},
'raw': "GPT4 Correct System:Your goal is to extract structured information from the user's input that\nmatches the form described below. When extracting information please make\nsure it matches the type information exactly. Do not add any attributes that\ndo not appear in the schema shown below.<|end_of_turn|>\n\n\nGPT4 Correct User:\n```TypeScript\n\nperson: Array<{ // Personal information\n first_name: string // The first name of a person.\n}>\n```\n\n\n\nPlease output the extracted information in CSV format in Excel dialect. Please use a | as the delimiter. \n Do NOT add any clarifying information. Output MUST follow the schema above. Do NOT add any additional columns that do not appear in the schema.<|end_of_turn|>\n\n\nGPT4 Correct Assistant:\n\nInput: Alice and Bob are friends\nOutput: first_name\nAlice\nBob\n\nInput: My name is Bobby. My brother's name Joe.\nOutput: first_name\nBobby\nJoe",
'errors': [kor.exceptions.ParseError(pandas.errors.ParserError('Error tokenizing data. C error: Expected 1 fields in line 4, saw 3\n'))],
'validated_data': {}}
I realize this is because of the suffixes: the raw output echoes the whole prompt (system text, schema, and the few-shot example) before the actual answer, so the CSV parser sees lines that don't match the single first_name column. But how can I avoid it? What do I need to rewrite? Would keeping the echoed prompt out of the output help, e.g. something like the sketch below?
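This is only a sketch of what I have in mind, reusing model_id, tokenizer, quantization_config, schema and INSTRUCTION_TEMPLATE from the code above; return_full_text is the standard text-generation pipeline option, but I'm not sure it is the right fix for kor:

# Rebuild the pipeline so it returns only the newly generated text,
# not the prompt that was fed in.
pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
    model_kwargs={"quantization_config": quantization_config},
    max_new_tokens=256,       # budget for the answer only, instead of max_length
    eos_token_id=tokenizer.eos_token_id,
    return_full_text=False,   # do not echo the prompt back in generated_text
)
hf = HuggingFacePipeline(pipeline=pipe)
chain = create_extraction_chain(hf, schema, instruction_template=INSTRUCTION_TEMPLATE)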
Also, if I don't specify the instruction_template parameter in create_extraction_chain, the chain takes 15-20 minutes to run and the result is complete nonsense.
Any help would be appreciated.