attribute description added to extracted info in Chinese #152
Comments
@ihorizons2022 Thanks for filing the issue. I'm going to take a look at how to handle it!
Fix merged into main.
@ihorizons2022 I changed the default behavior of the JSON encoder to avoid encoding in ASCII, so the original characters should be preserved as is for the LLM to see. Could you let me know if this helps and/or if you see any other issues with extraction when working with Chinese text? Bumping the library version to 0.9.2 should be sufficient.
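For readers unfamiliar with the ASCII-encoding behavior mentioned above, here is a minimal sketch using Python's standard `json` module. It only illustrates the general effect of `ensure_ascii`; the `record` dictionary is made up for illustration, and this is not the library's actual encoder code.

```python
import json

# Hypothetical extraction result containing Chinese text (illustration only).
record = {"category": ["防晒霜"]}

# Default behavior: non-ASCII characters are written as \uXXXX escapes,
# so the prompt no longer contains the original characters.
escaped = json.dumps(record)

# With ensure_ascii=False the original characters are kept as is.
preserved = json.dumps(record, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode back to the same data; the difference is only in what a reader (or an LLM) sees in the serialized text.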
Thank you for the quick response.
code snippet:
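# Assumption: Object and Text are the schema node classes from the kor
# library; the import was not part of the original snippet.
from kor.nodes import Object, Text

schema = Object(
    id="post",
    # "Script posted by a social media blogger on social media"
    description=(
        '''
社交媒体博主在社交媒体上发布的脚本
'''
    ),
    attributes=[
        Text(
            id="ingredient",
            description="化妆品的原料和成分",  # raw materials and ingredients of the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="function",
            description="产品能够起到的作用",  # what the product can do
            examples=[],
            many=True,
        ),
        Text(
            id="brand",
            description="文案中的化妆品品牌",  # cosmetics brand mentioned in the copy
            examples=[],
            many=True,
        ),
        Text(
            id="product",
            description="宣传的化妆品产品",  # cosmetic product being promoted
            examples=[],
            many=True,
        ),
        Text(
            id="skin",
            description="皮肤的类型和状态",  # skin type and condition
            examples=[],
            many=True,
        ),
        Text(
            id="target",
            description="品牌或者产品适用的用户人群",  # user group the brand or product suits
            examples=[],
            many=True,
        ),
        Text(
            id="feeling",
            description="使用化妆品后的个人感受",  # personal feelings after using the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="scene",
            # places, climates, holidays, seasons, occasions, etc. suited to using the cosmetic
            description="适合使用化妆品的地点,气候,节日,季节,场合等",
            examples=[],
            many=True,
        ),
        Text(
            id="promotion",
            description="产品促销信息",  # product promotion information
            examples=[],
            many=True,
        ),
        Text(
            id="special",
            description="产品的优势和特点",  # product advantages and features
            examples=[],
            many=True,
        ),
        Text(
            id="category",
            description="化妆品所属的品类",  # category the cosmetic belongs to
            # example: "Second, have a good sunscreen" -> "sunscreen"
            examples=[("第二 有一支好的防晒霜", '防晒霜')],
            many=True,
        ),
    ],
    many=False,
)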
But the output looks like this:
{'post': {'brand': ['ZOTO'],
'product': ['防晒霜'],
'function': ['防晒'],
'skin': ['皮肤类型和状态'],
'target': ['用户人群'],
'feeling': ['使用化妆品后的个人感受'],
'scene': ['适合使用化妆品的地点,气候,节日,季节,场合等'],
'promotion': ['产品促销信息'],
'category': ['防晒霜']}}
Actually, '皮肤类型和状态' ("skin type and condition") is the attribute description, not extracted information.
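Until the root cause is fixed, one way to spot this symptom is to compare each extracted value against the description of its attribute. The sketch below is only a heuristic under stated assumptions: the helper names `looks_like_description` and `flag_echoed_descriptions` and the 0.8 threshold are hypothetical, nothing here is part of the library, and a genuine extraction that happens to be a substring of its description would be flagged too.

```python
from difflib import SequenceMatcher


def looks_like_description(value, description, threshold=0.8):
    """Return True if value is contained in, or very similar to, description."""
    value, description = value.strip(), description.strip()
    if not value:
        return False
    if value in description:
        return True
    return SequenceMatcher(None, value, description).ratio() >= threshold


def flag_echoed_descriptions(descriptions, extracted):
    """Map attribute id -> extracted values that look like echoed descriptions."""
    flagged = {}
    for key, values in extracted.items():
        desc = descriptions.get(key, "")
        hits = [v for v in values if looks_like_description(v, desc)]
        if hits:
            flagged[key] = hits
    return flagged


# A few descriptions and output values taken from the report above.
descriptions = {
    "brand": "文案中的化妆品品牌",
    "skin": "皮肤的类型和状态",
    "target": "品牌或者产品适用的用户人群",
}
extracted = {
    "brand": ["ZOTO"],
    "skin": ["皮肤类型和状态"],
    "target": ["用户人群"],
}

flagged = flag_echoed_descriptions(descriptions, extracted)
print(flagged)
```

Here 'skin' is caught by the similarity check (the value differs from the description by a single character), while 'target' is caught because the value is a substring of the description.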