-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Add dgl.nn.CuGraphSAGEConv
model
#5137
Conversation
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
dgl.nn.CuGraphSAGEConv
modeldgl.nn.CuGraphSAGEConv
model
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
|
||
def reset_parameters(self): | ||
r"""Reinitialize learnable parameters.""" | ||
self.linear.reset_parameters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously SageConv
considers Xavier uniform while nn.Linear.reset_parameters
considers Kaiming uniform. I'm not sure about the effects of this difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kaiming is more suitable here as ReLU is often the choice for the nonlinearity in GNN; Xavier was designed for sigmoid function.
r"""Reinitialize learnable parameters.""" | ||
self.linear.reset_parameters() | ||
|
||
def forward(self, g, feat, max_in_degree=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another difference, lack of support for edge_weight
@@ -0,0 +1,200 @@ | |||
import argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run this script? If so, what performance number did you obtain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in terms of pure training time (not including dataloading), SAGEConv
takes 2.5s per epoch, while CuGraphSAGEConv
takes 2.0s, despite the overhead of coo-to-csc conversion. Test accuracy is also the same.
Edit: add timings for both mode in the example
mode | mixed (uva) | pure gpu |
---|---|---|
CuGraphSAGEConv | 2.0 s | 1.2 s |
SAGEConv | 2.5 s | 1.7 s |
def forward(self, blocks, x): | ||
h = x | ||
for l, (layer, block) in enumerate(zip(self.layers, blocks)): | ||
h = layer(block, h, max_in_degree=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit ugly. Perhaps it's better to pass the argument to SAGE.__init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what needs to be done here? Are you suggesting to unpack to loop like this?
h = F.relu(self.conv1(g[0], x))
h = F.relu(self.conv2(g[1], h))
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the specification of max_in_degree
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see and I do agree that it is not an ideal interface. We did not make max_in_degree
an attribute of CuGraphSAGEConv
since it is a property of the graph (i.e., block
), rather than the model. I have removed it from the example as this flag is optional.
In the meantime, we are improving our aggregation primitives to be more flexible to eventually ditch this option.
default="mixed", | ||
choices=["cpu", "mixed", "puregpu"], | ||
help="Training mode. 'cpu' for CPU training, 'mixed' for CPU-GPU mixed training, " | ||
"'puregpu' for pure-GPU training.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is automatically formatted by lintrunner. I removed the cpu mode, as it is not supported by the model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes pushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done a pass
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Thank you @mufeili for the review. Here is a list of disparities between
Some preliminary performance numbers using the included example:
(copied over from the review comment above for better visibility) |
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
* add CuGraphSAGEConv model * fix lint issues * update model to reflect changes in make_mfg_csr(), move max_in_degree to forward() * lintrunner * allow reset_parameters() * remove norm option, simplify test * allow full graph fallback option, add example * address comments * address reviews --------- Co-authored-by: Mufei Li <[email protected]>
* add CuGraphSAGEConv model * fix lint issues * update model to reflect changes in make_mfg_csr(), move max_in_degree to forward() * lintrunner * allow reset_parameters() * remove norm option, simplify test * allow full graph fallback option, add example * address comments * address reviews --------- Co-authored-by: Mufei Li <[email protected]>
Description
This PR adds a GraphSAGE model Add
dgl.nn.CuGraphSAGEConv
that uses the accelerated sparse aggregation primitives in cugraph-ops. It requirespylibcugraphops >= 23.02
.Checklist
Please feel free to remove inapplicable items for your PR.
Changes
dgl.nn.CuGraphSAGEConv
SAGEConv
Notes
Fixes rapidsai/cugraph-ops#177.