
added collective communication tutorial #48

Merged 7 commits into main from add-collective-comm-nb on Oct 31, 2021

Conversation

Priyansi
Collaborator

Fixes #16

@Priyansi
Collaborator Author

Hey @sdesrozis, thank you for the comment! I have updated the notebook.

@sdesrozis
Contributor

Does it make sense to add basic examples to illustrate? (Below is a copy/paste from the PyTorch doc.)

>>> # All tensors below are of torch.int64 type.
>>> # We have 2 process groups, 2 ranks.
>>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank
>>> tensor
tensor([1, 2]) # Rank 0
tensor([3, 4]) # Rank 1
>>> idist.all_reduce(tensor, op="SUM")
>>> tensor
tensor([4, 6]) # Rank 0
tensor([4, 6]) # Rank 1

As far as I know, there is no example in our doc for all_reduce, barrier and all_gather, although an example is provided for broadcast.

What do you think? Maybe a good answer is no, and we just add a few examples in the docstrings.
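
For reference, a sketch of what an analogous all_gather example could look like, in the same doctest style as above (not copied from the PyTorch doc; the values assume the same 2-rank setup):

>>> # All tensors below are of torch.int64 type.
>>> # We have 2 ranks.
>>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank
>>> tensor
tensor([1, 2]) # Rank 0
tensor([3, 4]) # Rank 1
>>> idist.all_gather(tensor)
tensor([1, 2, 3, 4]) # Rank 0
tensor([1, 2, 3, 4]) # Rank 1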

@Priyansi
Collaborator Author

@sdesrozis I have no issue adding these to the tutorial. My initial discussion with @vfdev-5 on this was that we would collect all the examples for these functions currently provided in our docs and provide motivation for using them. I can add such basic examples in both the docstrings and here, if needed. What do you think?

@sdesrozis
Contributor

IMO it would be great to have some basic examples in both the docstrings and here.

Anyway, the notebook is great!

@vfdev-5
Member

vfdev-5 commented Oct 26, 2021

@Priyansi the cells are not executable, right? As in:

Let's assume we have 3 GPUs with ranks 0, 1 and 2 and have a tensor on all of them.

Maybe we can use idist.spawn to create 4 CPU processes and thus execute the code?
Otherwise, the notebook looks very nice!
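
For illustration, a minimal sketch of what such an executable cell might look like, assuming the gloo backend on 4 CPU processes (the function name here is hypothetical):

import torch
import ignite.distributed as idist

def all_reduce_example(local_rank):
    # Each process builds a tensor that depends on its rank
    rank = idist.get_rank()
    tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank
    idist.all_reduce(tensor, op="SUM")
    print(f"Rank {rank}: {tensor}")

if __name__ == "__main__":
    # Spawn 4 CPU processes; spawn calls all_reduce_example(local_rank)
    idist.spawn("gloo", all_reduce_example, args=(), nproc_per_node=4)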

@Priyansi
Collaborator Author

Hey @vfdev-5, I made the basic examples executable. Could you please re-review them?

@vfdev-5
Member

vfdev-5 commented Oct 26, 2021

@Priyansi I think using https://pytorch.org/ignite/v0.4.7/distributed.html#ignite.distributed.utils.spawn is simpler than idist.Parallel in this case. What do you think?

@Priyansi
Collaborator Author

@Priyansi I think using https://pytorch.org/ignite/v0.4.7/distributed.html#ignite.distributed.utils.spawn is simpler than idist.Parallel in this case. What do you think?

Yeah this seems simpler, I'll make the change.

@Priyansi
Collaborator Author

Hey @vfdev-5, I replaced Parallel with spawn. Also, I noticed that the args parameter is required for spawn, and I wondered why, as I didn't need to pass any args for these examples.

@vfdev-5
Member

vfdev-5 commented Oct 31, 2021

Also, I noticed that the args parameter is required for spawn, and I wondered why, as I didn't need to pass any args for these examples.

Basically, args can be a global configuration that the user would like to pass into the spawned function. In your case, the functions are very trivial and do not represent real use-cases...
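
As a hedged sketch of that use-case (the config dict and training function below are hypothetical):

import ignite.distributed as idist

def training(local_rank, config):
    # spawn calls fn(local_rank, *args), so config arrives through the args tuple
    print(f"Rank {idist.get_rank()}: lr={config['lr']}")

if __name__ == "__main__":
    config = {"lr": 0.01, "num_epochs": 5}
    idist.spawn("gloo", training, args=(config,), nproc_per_node=4)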

@vfdev-5
Member

Thanks @Priyansi, LGTM

@Priyansi
Collaborator Author

Basically, args can be a global configuration that the user would like to pass into the spawned function. In your case, the functions are very trivial and do not represent real use-cases...

No, I was just wondering if it's necessary to make it required. Anyway, I got it!

@Priyansi merged commit cf5cbd1 into main on Oct 31, 2021
@Priyansi deleted the add-collective-comm-nb branch on Oct 31, 2021