Handling an overly long text prompt when training on the Objects365 dataset #61

Open
LuciferZap opened this issue Mar 9, 2024 · 2 comments

Comments

@LuciferZap

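Hello, if I want to train on the Objects365 dataset, how should I handle a text prompt that is too long?
If I convert the Objects365 dataset directly into a single jsonl file, the labelmap may be too long and cause unexpected bugs. I read the issues in the original author's GitHub repository, where he mentions that the dataset can be split. Below is the data format I produced; it corresponds to the dataset.json file used by the training script. Is this the correct way to split the data?
(I split the Objects365 dataset by category into 5 subsets of 73 classes each. Each subset has its own jsonl file recording the corresponding images and box information. Every subset has a different labelmap, the labelmaps do not overlap, and the indices in each labelmap file start from 0.)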
{
"train": [
{
"root": "path/object365/",
"anno": "path/obj365_train_split1.jsonl",
"label_map": "obj365_split1_labelmap.json",
"dataset_mode": "odvg"
},
{
"root": "path/object365/",
"anno": "path/obj365_train_split2.jsonl",
"label_map": "obj365_split2_labelmap.json",
"dataset_mode": "odvg"
},
{
"root": "path/object365/",
"anno": "path/obj365_train_split3.jsonl",
"label_map": "obj365_split3_labelmap.json",
"dataset_mode": "odvg"
},
{
"root": "path/object365/",
"anno": "path/obj365_train_split4.jsonl",
"label_map": "obj365_split4_labelmap.json",
"dataset_mode": "odvg"
},
{
"root": "path/object365/",
"anno": "path/obj365_train_split5.jsonl",
"label_map": "obj365_split5_labelmap.json",
"dataset_mode": "odvg"
}
],
"val": [
{
"root": "path/object365/",
"anno": "path/obj365_val_split1.jsonl",
"label_map": null,
"dataset_mode": "coco"
}
]
}
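The category-wise split described above can be sketched as follows. This is a minimal illustration, not the repository's own tooling: the helper names `split_label_map` and `remap_record` are hypothetical, the labelmap is assumed to be a JSON object mapping string indices to category names, and each jsonl record is assumed to carry its boxes under `detection.instances` with `label` and `category` fields.

```python
def split_label_map(label_map, num_splits):
    """Split {index: name} into disjoint sub-maps, each reindexed from 0.

    Returns a list of (sub_label_map, old_to_new) pairs. Produces at most
    num_splits chunks; the last chunk may be smaller than the others.
    """
    items = sorted(label_map.items(), key=lambda kv: int(kv[0]))
    size = -(-len(items) // num_splits)  # ceiling division
    splits = []
    for start in range(0, len(items), size):
        chunk = items[start:start + size]
        # New indices restart at 0 inside every subset.
        sub_map = {str(new): name for new, (_, name) in enumerate(chunk)}
        old_to_new = {int(old): new for new, (old, _) in enumerate(chunk)}
        splits.append((sub_map, old_to_new))
    return splits


def remap_record(record, old_to_new, sub_map):
    """Keep only the instances that belong to one subset and relabel them.

    Returns None when the image has no instance in this subset, so the
    caller can skip writing it to that subset's jsonl file.
    """
    kept = []
    for inst in record["detection"]["instances"]:  # assumed record schema
        if inst["label"] in old_to_new:
            new_label = old_to_new[inst["label"]]
            kept.append({**inst, "label": new_label,
                         "category": sub_map[str(new_label)]})
    if not kept:
        return None
    return {**record, "detection": {"instances": kept}}
```

With 365 categories and `num_splits=5`, each subset holds 73 classes indexed 0 to 72. Note that an image containing objects from several subsets ends up in each of those subsets, carrying only that subset's boxes.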

@funny000

@LuciferZap How did training turn out after you processed the data this way?

@LuciferZap
Author

The O365 categories do not exceed the token limit, so forcing a split brings no benefit.
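One way to check this kind of claim for your own category list is to build the prompt and count its tokens before deciding whether to split. The sketch below is only a rough aid under stated assumptions: the function names are hypothetical, the prompt is assumed to be the category names joined with " . ", and `rough_tokenize` is a crude stand-in that counts words and punctuation marks. For a real check, pass the tokenizer and the limit that your training configuration actually uses.

```python
import re


def build_prompt(categories):
    """Join category names into one caption (assumed ' . ' separator)."""
    return " . ".join(categories) + " ."


def rough_tokenize(text):
    """Stand-in tokenizer: words and punctuation marks as separate tokens.

    A subword tokenizer usually yields more tokens than this, so treat
    the count as a lower bound, not an exact figure.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def fits_token_limit(categories, max_tokens, tokenize=rough_tokenize):
    """Return (token_count, fits) for the prompt built from categories."""
    count = len(tokenize(build_prompt(categories)))
    return count, count <= max_tokens
```

For example, `fits_token_limit(names, max_tokens=limit, tokenize=my_tokenizer)` reports the count for the full labelmap, where `names`, `limit`, and `my_tokenizer` come from your own setup.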
