
[BUG] Why does vid_field in add_vertices have no effect? src_field and dst_field in add_edges have no effect either. #3704

Open
yimijiu123 opened this issue Apr 11, 2024 · 1 comment

yimijiu123 commented Apr 11, 2024

Describe the bug
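My understanding is that `vid_field` specifies which column of the imported data is the vertex ID column. That ID is then used to match the edges loaded afterwards against their vertices.
In practice, even when `vid_field` names a column, the first column of the vertex dataset is still used as the ID column. For example: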
```python
import numpy as np
import pandas as pd
from graphscope import get_default_session

sess = get_default_session()
# Graph with string vertex IDs
graph = sess.g(oid_type="string")
id = np.array(['1', '2', '3', '4'])
idd = np.array(["a", "b", "c", "d"])
avg_score = np.array([11, 22, 23, 9])
v_data = np.transpose(np.vstack([idd, id, avg_score]))
df_student = pd.DataFrame(v_data, columns=["idd", "id", "avg_score"])
src_id = np.array(['1', '2', '3', '1'])
dst_id = np.array(['2', '4', '2', '4'])
group_size = np.array([4, 1, 2, 3])
e_data = np.transpose(np.vstack([src_id, dst_id, group_size]))
df_group = pd.DataFrame(e_data, columns=["src_id", "dst_id", "group_size"]).astype({"group_size": int})
# vid_field="id" should make column "id" the vertex ID,
# but the first column ("idd") is used instead.
graph = graph.add_vertices(df_student, label="student", vid_field="id")
graph = graph.add_edges(df_group, label="guide", src_label="student", dst_label="student")
pg = graph.project(vertices={"student": ["id"]}, edges={"guide": ["group_size"]})
```

The output was as follows (screenshot not preserved):

Questions:

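  1. As you can see, `vid_field="id"` has no effect. I also tried `vid_field="1"`, `vid_field=id`, and `vid_field=1`, and none of them worked. Similarly, setting `src_field=1, dst_field=0` has no effect. The only way to swap the source and destination vertices is to swap the first and second columns of the edge dataset.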
  2. Is this caused by the following constants?

     ```cpp
     static constexpr int id_column = 0;
     static constexpr int src_column = 0;
     static constexpr int dst_column = 1;
     static constexpr int edge_id_column = 2;
     ```

  3. Do `types_pb2.SRC_VID`, `types_pb2.SRC_LABEL`, `types_pb2.V_LABEL_ID`, `types_pb2.VID`, and `types_pb2.LABEL` correspond to identifiers on the C++ side? For example, what does `GetGid(fid_t fid, label_id_t label_id, oid_t oid, vid_t& gid)` correspond to? And why does the code call `chunk.attr[types_pb2.VID].CopyFrom(utils.s_to_attr(str(self.vid_field)))`?
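
Question 2 can be made concrete with a pure-pandas sketch that involves no GraphScope code. The functions `load_vertex_ids` and `load_edge_endpoints` are hypothetical stand-ins for a loader that honors only the fixed positions in the constants above and ignores the field parameters. Under that assumption they reproduce the behavior reported here: the vertex ID comes from `idd`, not `id`, and `src_field`/`dst_field` do not swap anything.

```python
import pandas as pd

# Fixed positions mirroring the C++ constants quoted in question 2.
ID_COLUMN = 0
SRC_COLUMN = 0
DST_COLUMN = 1


def load_vertex_ids(df: pd.DataFrame, vid_field=None) -> list:
    # Hypothetical loader: vid_field is accepted but never consulted,
    # so the ID always comes from column 0.
    return df.iloc[:, ID_COLUMN].tolist()


def load_edge_endpoints(df: pd.DataFrame, src_field=None, dst_field=None) -> list:
    # Hypothetical loader: src_field/dst_field are ignored in favor of columns 0 and 1.
    return list(zip(df.iloc[:, SRC_COLUMN], df.iloc[:, DST_COLUMN]))


df_student = pd.DataFrame(
    {"idd": ["a", "b", "c", "d"], "id": ["1", "2", "3", "4"], "avg_score": [11, 22, 23, 9]}
)
df_group = pd.DataFrame(
    {"src_id": ["1", "2", "3", "1"], "dst_id": ["2", "4", "2", "4"], "group_size": [4, 1, 2, 3]}
)

print(load_vertex_ids(df_student, vid_field="id"))  # ['a', 'b', 'c', 'd']
print(load_edge_endpoints(df_group, src_field=1, dst_field=0))
# [('1', '2'), ('2', '4'), ('3', '2'), ('1', '4')]
```
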
@siyuan0322
Collaborator

Thanks for reporting. It is indeed a bug. There used to be logic that rearranged the DataFrame columns according to these fields, but that code may have been lost during the large refactor of the loader 😢
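
Until that rearrangement is restored, users can get the same effect by reordering the DataFrame columns before loading. This is the general form of swapping the edge columns by hand, as described in question 1. The helper `move_to_front` is hypothetical and is not a GraphScope API. It is a minimal pandas sketch that assumes the loader reads the vertex ID from column 0 and the edge endpoints from columns 0 and 1.

```python
import pandas as pd


def move_to_front(df: pd.DataFrame, *fields) -> pd.DataFrame:
    """Return df with `fields` as its leading columns, in the given order.

    Each field may be a column name or an integer position, mirroring the
    forms tried for vid_field / src_field / dst_field in the report.
    """
    names = [df.columns[f] if isinstance(f, int) else f for f in fields]
    rest = [c for c in df.columns if c not in names]
    return df[names + rest]


df_student = pd.DataFrame(
    {"idd": ["a", "b", "c", "d"], "id": ["1", "2", "3", "4"], "avg_score": [11, 22, 23, 9]}
)
df_group = pd.DataFrame(
    {"src_id": ["1", "2", "3", "1"], "dst_id": ["2", "4", "2", "4"], "group_size": [4, 1, 2, 3]}
)

# Intended vid_field="id": make "id" the first column.
vertices = move_to_front(df_student, "id")
# Intended src_field=1, dst_field=0: swap the two endpoint columns.
edges = move_to_front(df_group, 1, 0)

print(list(vertices.columns))  # ['id', 'idd', 'avg_score']
print(list(edges.columns))     # ['dst_id', 'src_id', 'group_size']
```

You would then pass the reordered frames without the ignored parameters: `graph.add_vertices(vertices, label="student")` and `graph.add_edges(edges, label="guide", src_label="student", dst_label="student")`.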

The `types_pb2.*` constants carry that metadata from Python to C++, which is what the `CopyFrom` statement is used for.
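
The sketch below gives a rough picture of that metadata path. It makes two hypothetical simplifications: a plain dict stands in for `chunk.attr`, and string keys stand in for the `types_pb2` enum values.

- The Python side records each field as a string, as the quoted `CopyFrom(utils.s_to_attr(str(self.vid_field)))` call does.
- The receiving side then has to resolve that string back to a column position. A loader that relies only on the fixed `id_column = 0` or `src_column = 0` / `dst_column = 1` would skip this step.

Because `str()` makes `1` and `"1"` indistinguishable, the sketch tries the string as a column name first and as a position second.

```python
# Hypothetical stand-ins for the types_pb2 keys mentioned in the thread;
# the real protobuf enum values are not reproduced here.
VID = "VID"
SRC_VID = "SRC_VID"
DST_VID = "DST_VID"


def encode_attrs(vid_field=None, src_field=None, dst_field=None) -> dict:
    """Python side: store each given field as a string, like s_to_attr(str(...))."""
    attrs = {}
    for key, field in ((VID, vid_field), (SRC_VID, src_field), (DST_VID, dst_field)):
        if field is not None:
            attrs[key] = str(field)
    return attrs


def resolve_column(columns: list, attrs: dict, key: str, default: int) -> int:
    """Receiving side: map a recorded field back to a column position.

    Falls back to the fixed default (e.g. id_column = 0) only when no field
    was recorded. Tries the string as a column name first, then as a position;
    raises ValueError if it is neither.
    """
    if key not in attrs:
        return default
    value = attrs[key]
    if value in columns:
        return columns.index(value)
    return int(value)


vertex_cols = ["idd", "id", "avg_score"]
edge_cols = ["src_id", "dst_id", "group_size"]

print(resolve_column(vertex_cols, encode_attrs(vid_field="id"), VID, 0))  # 1
edge_attrs = encode_attrs(src_field=1, dst_field=0)
print(resolve_column(edge_cols, edge_attrs, SRC_VID, 0))  # 1
print(resolve_column(edge_cols, edge_attrs, DST_VID, 1))  # 0
```
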

`GetGid` is a method. Its `label_id`, `oid`, and `gid` parameters are filled in with the actual values for a given vertex, and their meaning is unrelated to the `types_pb2` notions above.

siyuan0322 self-assigned this on Apr 11, 2024