Insights: NVIDIA/TransformerEngine
Overview
10 Pull requests merged by 7 people
-
Add cuDNN sliding window and set_deterministic_algorithm
#992 merged
Jul 10, 2024 -
Reduce CUDA driver calls when choosing transpose kernels
#1002 merged
Jul 10, 2024 -
[PyTorch] Prototype for operation-based API
#707 merged
Jul 9, 2024 -
[TE/JAX] Remove tuple wrapper of singleton in HLO lowering return
#1000 merged
Jul 9, 2024 -
Add test for building without support for any DL frameworks
#974 merged
Jul 9, 2024 -
Support individual framework builds for python<=3.7
#997 merged
Jul 8, 2024 -
Parallel build with limited resource
#987 merged
Jul 8, 2024 -
[PyTorch] Remove implicit padding and unpadding in GroupedLinear
#984 merged
Jul 8, 2024 -
[MoE][Pytorch]Fix size mismatch error in fp8 transpose.
#988 merged
Jul 5, 2024
7 Pull requests opened by 6 people
-
Use 2hd layout for context parallelism
#993 opened
Jul 7, 2024 -
Add efficient cross entropy by cuda kernel.
#995 opened
Jul 8, 2024 -
Optimize multi-tensor cast-transpose kernel
#998 opened
Jul 8, 2024 -
Simplify logic for launching CI
#1001 opened
Jul 9, 2024 -
[JAX] Sharding Utils
#1003 opened
Jul 9, 2024 -
DGRAD_RS UB overlap Bug fixes
#1004 opened
Jul 10, 2024 -
[JAX] Allow enabling partial custom calls through the environment variable
#1007 opened
Jul 10, 2024
5 Issues closed by 5 people
-
Attention mask type must be padding or padding_causal for qkv_format=thd!
#1005 closed
Jul 11, 2024 -
Training core dump in megatron-lm with tp-comm-overlap.
#985 closed
Jul 8, 2024 -
ERROR: Failed building wheel for transformer-engine
#700 closed
Jul 5, 2024 -
Hang when training with MPI with --tp-comm-overlap turned on
#989 closed
Jul 5, 2024
4 Issues opened by 4 people
-
Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.
#1008 opened
Jul 10, 2024 -
can fp8 be used with pipeline parallel?
#1006 opened
Jul 10, 2024 -
Why requires_grad attribute of weight from offloading will set to False ?
#996 opened
Jul 8, 2024 -
tp_overlap init failed when tp_size != world_size
#994 opened
Jul 8, 2024
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[TE/JAX] Prototype for New XLA Custom Calls with FFI
#946 commented on
Jul 9, 2024 • 8 new comments -
[MoE][Common/PyTorch] Add permutation
#936 commented on
Jul 10, 2024 • 6 new comments -
[PyTorch] Fixing hang in `initialize_ub()` for multi-node runs after PR901 removal of MPI-dependence
#986 commented on
Jul 11, 2024 • 2 new comments -
[C/PyTorch] Refactor and move userbuffers into TE/common
#760 commented on
Jul 11, 2024 • 1 new comment -
[Paddle] Add deterministic option in DotProductAttention
#956 commented on
Jul 11, 2024 • 1 new comment -
Calling backward(retain_graph=True) multiple times with TE Layer does not work
#990 commented on
Jul 5, 2024 • 0 new comments -
PyTorch 2.2.0 NVFuser deprecation is incompatible with TransformerEngine.
#666 commented on
Jul 11, 2024 • 0 new comments -
question for building wheel for transformer-engine
#516 commented on
Jul 11, 2024 • 0 new comments -
[PyTorch] How to restore fp8 amp training from checkpoint
#982 commented on
Jul 11, 2024 • 0 new comments -
[UB] Adding support for multinode nvlink
#815 commented on
Jul 5, 2024 • 0 new comments -
[Draft] Zero fwd and bwd results for THD+CP
#920 commented on
Jul 11, 2024 • 0 new comments -
[pre-commit.ci] pre-commit suggestions
#979 commented on
Jul 8, 2024 • 0 new comments