Tags: bytedance/byteps

v0.2.5.post20

launcher: join workers as they exit (#429)

Check worker exit statuses in the order the workers exit. This way failed
workers are discovered early, and the entire job can be terminated as soon
as possible.

Signed-off-by: yulu.jia <[email protected]>
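
The idea in this commit message can be sketched as follows. This is a minimal illustration, not the BytePS launcher code: the function name `wait_in_exit_order` and the polling interval are assumptions made for the example. It polls every child process, handles each one as soon as it exits, and terminates the rest when a failure is seen.

```python
import subprocess
import sys
import time


def wait_in_exit_order(procs, poll_interval=0.05):
    """Reap child processes in the order they exit; stop at the first failure.

    Hypothetical helper for illustration. Returns the first non-zero exit
    code seen, or 0 if every process exited successfully.
    """
    pending = list(procs)
    while pending:
        for proc in list(pending):
            code = proc.poll()  # None while the process is still running
            if code is None:
                continue
            pending.remove(proc)
            if code != 0:
                # A worker failed: stop the remaining workers right away
                # instead of waiting for them to finish.
                for other in pending:
                    other.terminate()
                for other in pending:
                    other.wait()
                return code
        time.sleep(poll_interval)
    return 0
```

Joining the workers in launch order instead would block on the first worker even if a later one had already failed, which is the delay the commit describes avoiding.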

v0.2.5.post19

ps-lite: disable ucx error handling by default (#424)

disable ucx signal handlers so that some faulty user code can still run
even if some child process of the program encounters a segfault.

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post18

ps-lite: update ps-lite (#423)

update ps-lite to the latest commit

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post17

tensorflow: fix bug in broadcast_variables (#416)

When there's only one rank in total, broadcast_variables should still
return a tf operation.

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post16

server: improve thread safety (#412)

protect update_buf_ with a lock.
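
As a rough illustration of the pattern (the real change is in the C++ server and is not reproduced here), the sketch below guards a shared buffer with a lock so that concurrent updates do not interleave. The class `UpdateBuffer` and its methods are hypothetical names chosen for this example.

```python
import threading


class UpdateBuffer:
    """Hypothetical stand-in for a shared per-key update buffer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buf = {}

    def add(self, key, value):
        # The read-modify-write below must not interleave across threads,
        # so the whole update happens while holding the lock.
        with self._lock:
            self._buf[key] = self._buf.get(key, 0) + value

    def get(self, key):
        with self._lock:
            return self._buf.get(key, 0)
```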

v0.2.5.post15

update doc for core affinity envs (#407)

change semicolon-separated to colon-separated

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post14

fix bool env, disable avx512 (#399)

- fix bool env parsing in server.cc
- disable avx512 when compiling. Enabling avx512 may cause the tensorflow
  extension build to fail; avx512 support in Eigen is likely not stable
  yet.

Signed-off-by: yulu.jia <[email protected]>
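
The commit does not say what was wrong with the boolean parsing in server.cc, so the following is only a sketch of one common way to read a boolean environment variable strictly. The helper `env_flag`, the accepted spellings, and the variable name `EXAMPLE_FLAG` are all assumptions for illustration, not BytePS behavior.

```python
import os

# Accepted spellings are an assumption for this sketch.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_flag(name, default=False, environ=None):
    """Parse a boolean environment variable; reject unknown spellings."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default  # variable not set at all
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"cannot parse {name}={raw!r} as a boolean")
```

A typical pitfall this avoids is treating any non-empty string, including "0" or "false", as true.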

v0.2.5.post12

tf: skip bcast if there's only one worker (#385)

Skip broadcasting variables if there's only one worker

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post7

torch: fix hang after int tensor push_pull (#358)

mark task done after averaging an int tensor. This fixes a bug
introduced in 46944e8.

Signed-off-by: Yulu Jia <[email protected]>

v0.2.5.post6

build: skip installing disabled extensions (#354)

If an extension is explicitly disabled by:

  export BYTEPS_WITHOUT_MXNET=1
  export BYTEPS_WITHOUT_PYTORCH=1
  export BYTEPS_WITHOUT_TENSORFLOW=1

do not try to install it.

Signed-off-by: Yulu Jia <[email protected]>
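
A build script could apply this rule roughly as sketched below. The three environment variable names come from the commit message; the function `extensions_to_install`, the extension labels, and the choice to treat only the value `1` as "disabled" are assumptions for illustration and may differ from the actual setup.py.

```python
import os

# Maps an extension label (hypothetical) to the variable that disables it.
DISABLE_VARS = {
    "mxnet": "BYTEPS_WITHOUT_MXNET",
    "pytorch": "BYTEPS_WITHOUT_PYTORCH",
    "tensorflow": "BYTEPS_WITHOUT_TENSORFLOW",
}


def extensions_to_install(environ=None):
    """Return the extensions that were not explicitly disabled."""
    environ = os.environ if environ is None else environ
    return [
        name
        for name, var in DISABLE_VARS.items()
        if environ.get(var, "0") != "1"  # assumption: only "1" disables
    ]
```

For example, with `BYTEPS_WITHOUT_MXNET=1` set, the sketch returns only the pytorch and tensorflow entries, so no install is attempted for mxnet.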