Insights: pytorch/pytorch
Overview
- 0 Merged pull requests
- 70 Open pull requests
- 19 Closed issues
- 69 New issues
70 Pull requests opened by 46 people
-
Add pybind support for is_hpu property of Tensor object
#132228 opened
Jul 31, 2024 -
[CI][expriment] try linux.arm64.m7g.metal
#132232 opened
Jul 31, 2024 -
Optimize sort kernel for contiguous tensors
#132236 opened
Jul 31, 2024 -
make functorch CSE respect mutations as barriers (like fsdp.set_)
#132243 opened
Jul 31, 2024 -
[TD] Fuzz testing?
#132282 opened
Jul 31, 2024 -
[c10d][experimental] Add _abort_process_group
#132291 opened
Jul 31, 2024 -
Fix inf value reduction in non persistent reduction for scans
#132293 opened
Jul 31, 2024 -
Run cudagraphs on AOTAutograd cache hit
#132294 opened
Jul 31, 2024 -
[PT2][Optimus] Add missing example value for introduced nodes
#132297 opened
Jul 31, 2024 -
[ONNX] Define the `TORCH_ONNX_USE_EXPORTED_PROGRAM` flag
#132299 opened
Jul 31, 2024 -
[torchbind] don't warning for certain skippable methods.
#132306 opened
Jul 31, 2024 -
Split of "[reland] [export] fix zero arg export in training_ir and constant tensor handling"
#132307 opened
Jul 31, 2024 -
[BE][typing] fix types in common pruning
#132309 opened
Jul 31, 2024 -
[DeviceMesh] Add supports for non-continuous slicing
#132310 opened
Jul 31, 2024 -
[DeviceMesh] Update slicing documentation to include nD and non-continuous slicing
#132311 opened
Jul 31, 2024 -
[CUDA] `is_bf16_supported()` should not crash if there are no GPUs
#132313 opened
Jul 31, 2024 -
[export][pre-dispatch] add test case for user-defined triton kernels
#132317 opened
Jul 31, 2024 -
[dynamo] Treat attr of unspecialized buiitin nn modules as static
#132318 opened
Jul 31, 2024 -
[AOTI][refactor] Move set_cpp_kernel to base class
#132319 opened
Jul 31, 2024 -
[AOTI][refactor] Consolidate how python_kernel_name is set
#132320 opened
Jul 31, 2024 -
fsdp.set_: convey to functionalization that it mutates storage
#132322 opened
Jul 31, 2024 -
[AOTI][tooling][1/n] Add intermediate value debug printer
#132323 opened
Jul 31, 2024 -
[pyre][ptt] Prepare to enable type checking by default caffe2/[a-s]* directories
#132324 opened
Jul 31, 2024 -
[inductor] fix UndefinedTensorImpl singleton can't export on Windows.
#132326 opened
Jul 31, 2024 -
NOCOMMIT test Pyre CI for Pytorch
#132327 opened
Jul 31, 2024 -
Ban decorator usage of dynamo_timed
#132328 opened
Jul 31, 2024 -
[dynamo] fix add_push_null callsites with CALL_FUNCTION_EX
#132329 opened
Jul 31, 2024 -
[dynamo] support inspect.signature.bind
#132330 opened
Jul 31, 2024 -
[dynamo] Add line number to _warn_capture_scalar_outputs()
#132333 opened
Jul 31, 2024 -
[dynamo] Track params/buffers and mark them as static
#132334 opened
Jul 31, 2024 -
Add None return type to init
#132335 opened
Jul 31, 2024 -
Add Adafactor foreach impl
#132336 opened
Jul 31, 2024 -
move torch._functionalize APIs to pybind. add one for marking storage mutations
#132337 opened
Jul 31, 2024 -
[test] dynamo: dont hold onto real tensors in the __dict__ of any graph inputs
#132338 opened
Jul 31, 2024 -
[DeviceMesh] Remove parent mesh concept from _MeshEnv and replace by root mesh
#132339 opened
Jul 31, 2024 -
Only make wait_tensor as a side_effect op
#132340 opened
Jul 31, 2024 -
Only make wait_tensor as a side_effect op
#132341 opened
Jul 31, 2024 -
[CP] Extend CP to support load-blancing shards
#132342 opened
Jul 31, 2024 -
Fix file lock issue in AotCodeCompiler
#132343 opened
Jul 31, 2024 -
Cast inputs to low precision kernels in emulate low precision mode
#132345 opened
Jul 31, 2024 -
[AOTI] Fix complex not defined
#132347 opened
Jul 31, 2024 -
Set weights_only=False in export custom object load
#132348 opened
Jul 31, 2024 -
Fix all RuntimeErrors during weights_only load from being erroneously reported with the weights_only message
#132349 opened
Jul 31, 2024 -
Add None return type to init -- functorch and torchgen
#132351 opened
Jul 31, 2024 -
Add None return type to init -- tests
#132352 opened
Jul 31, 2024 -
[export] Apply CIA override only if implemented
#132353 opened
Jul 31, 2024 -
[Inductor] Small updates to B2B-GEMM
#132354 opened
Jul 31, 2024 -
[MPS] Add regression test for memory leak in `nn.Linear`
#132355 opened
Aug 1, 2024 -
[NJT][flop counter] attention: if offsets are fake, use max seqlen
#132356 opened
Aug 1, 2024 -
[pytorch][counters] Pybind for WaitCounter
#132357 opened
Aug 1, 2024 -
[WIP][partitioners] handle nodes with >1 output
#132359 opened
Aug 1, 2024 -
Remove some alias for c10::optional
#132361 opened
Aug 1, 2024 -
[Dynamo] Support abc.MutableMapping.get
#132363 opened
Aug 1, 2024 -
Revert "[aoti] Fix float16 and bfloat16 for generated GPU code (#131437)"
#132365 opened
Aug 1, 2024 -
[AOTI][refactor] Update MKLDNN ops cpp wrapper support
#132367 opened
Aug 1, 2024 -
[pipelining] Make test_schedule quiet
#132369 opened
Aug 1, 2024 -
[WIP] refactor distributed code
#132371 opened
Aug 1, 2024 -
Change deprecate warning on dispatch_on_subclass to warn once
#132374 opened
Aug 1, 2024 -
add testcase for kineto on-demand profiling, check #131020
#132375 opened
Aug 1, 2024 -
Add None return type to init -- tests rest
#132376 opened
Aug 1, 2024 -
[Easy] Fix argument name collision in `HigherOrderOperator` dispatched functions
#132377 opened
Aug 1, 2024 -
Move slow tests to be in repo
#132379 opened
Aug 1, 2024 -
Reduce number of guards introduced by check_cudnn_tensor_shapes when cudnn version is higher enough
#132384 opened
Aug 1, 2024 -
[inductor] cpp codegen alignas for all OSs.
#132387 opened
Aug 1, 2024 -
Apply loop split optimization in codegen_node
#132389 opened
Aug 1, 2024 -
Update torch-xpu-ops pin (ATen XPU implementation)
#132390 opened
Aug 1, 2024 -
[3/N][dtensor] Strided Sharding offset calculation util
#132391 opened
Aug 1, 2024 -
add src map to data-dependent errors
#132393 opened
Aug 1, 2024 -
[inductor] make restrict_keyword cross OSs.
#132394 opened
Aug 1, 2024 -
[11/N] Use std::nullopt and std::optional
#132396 opened
Aug 1, 2024
19 Issues closed by 13 people
-
DISABLED test_inplace_grad_div_floor_rounding_cuda_float64 (__main__.TestBwdGradientsCUDA)
#100520 closed
Aug 1, 2024 -
triton kernels (and maybe custom ops) that mutate multiple inputs silently incorrect with torch.compile
#132196 closed
Aug 1, 2024 -
torch.Tensor.uniform_ causes invalid syntax in InternalTorchDynamoError
#131019 closed
Jul 31, 2024 -
Document newness of `.inference_mode()` ?
#132288 closed
Jul 31, 2024 -
[AOTI] error: ‘bfloat16’ was not declared in this scope
#122986 closed
Jul 31, 2024 -
Unexpected Behavior with torch.mean and torch.sum on Empty Dimensions
#131457 closed
Jul 31, 2024 -
Even when force_disable_caches we still attempt to upload to remote cache
#132241 closed
Jul 31, 2024 -
xpu: intel conda channel is not available
#131802 closed
Jul 31, 2024 -
tolist() on single item
#132142 closed
Jul 31, 2024 -
Nightly windows wheels build failure Jul 31. 2024
#132295 closed
Jul 31, 2024 -
[CUDA][Inductor][Pooling] `test_comprehensive_nn_functional_max_pool2d_cuda` appears broken on H100
#132199 closed
Jul 31, 2024 -
Mutating global variable during inlining a function from imported module breaks in dynamo
#132165 closed
Jul 31, 2024 -
DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]
#132298 closed
Jul 31, 2024 -
`torch.export` Fails with `nn.MSELoss` Due to No Tensor Operations Found During Tracing
#132141 closed
Jul 31, 2024 -
Numpy Compatibility Issue with PyTorch DataLoaders (RuntimeError: Numpy is not available)
#132179 closed
Jul 31, 2024 -
DISABLED test_fully_shard_post_optim_event_overlap (__main__.TestFullyShardOverlap)
#131081 closed
Jul 31, 2024 -
AssertionError: Empty array is not supported in C
#131335 closed
Jul 31, 2024
69 Issues opened by 30 people
-
tensor.to(device) not copying data correctly between two GPUs
#132397 opened
Aug 1, 2024 -
`torch.multinomial` generates incorrect distribution
#132395 opened
Aug 1, 2024 -
torch cpu float16 range is not aligned with scipy on polygamma.
#132386 opened
Aug 1, 2024 -
crnn model from pth to torchscript problem
#132385 opened
Aug 1, 2024 -
dynamic tensor shape of vmap dimention
#132381 opened
Aug 1, 2024 -
[inductor][cpu] LayoutLMForSequenceClassification amp single thread accuracy failure
#132380 opened
Aug 1, 2024 -
per sample gradient with torch.autograd.grad
#132378 opened
Aug 1, 2024 -
[BUG] C++ error in `torch.compile` when `dynamic=True`
#132373 opened
Aug 1, 2024 -
Bizarre segmentation fault on MacOS M3 when sklearn is imported
#132372 opened
Aug 1, 2024 -
Switch to new name `oneDNN` instead of `MKLDNN`
#132368 opened
Aug 1, 2024 -
PyTorch's Distributed Checkpoint Cannot Save a Parameter of Size 1
#132366 opened
Aug 1, 2024 -
Dynamo bug when handling KeyError
#132362 opened
Aug 1, 2024 -
[AOTI] AOTI doesn't work well with torch.select
#132360 opened
Aug 1, 2024 -
public API for checking local tensor type
#132358 opened
Aug 1, 2024 -
[MPS] Memory leak in `nn.Linear`
#132332 opened
Jul 31, 2024 -
DISABLED test_profile_memory (__main__.CppThreadTest)
#132331 opened
Jul 31, 2024 -
Support Enum for dictionaries in TorchScript
#132315 opened
Jul 31, 2024 -
is_bf_16_supported() should return False when running on CPU **NOT AN ERROR**
#132303 opened
Jul 31, 2024 -
flash attention triton kernel x pt2 silently incorrect
#132301 opened
Jul 31, 2024 -
CUDA Memory Explosion -- Conv3D
#132300 opened
Jul 31, 2024 -
Can we avoid RecordStream for P2P ops?
#132292 opened
Jul 31, 2024 -
25+ compile tests were disabled as flaky
#132290 opened
Jul 31, 2024 -
test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_float32 and others leaking memory in inductor
#132287 opened
Jul 31, 2024 -
On RoCM, some torch.compile fails with 0 mismatched elements
#132283 opened
Jul 31, 2024 -
[inductor][cpu]pyhpc_isoneutral_mixing performance regression in 2024-07-30 nightly release
#132281 opened
Jul 31, 2024 -
DISABLED test_compile_forward_deg2rad_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132280 opened
Jul 31, 2024 -
DISABLED test_compile_backward_tan_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132279 opened
Jul 31, 2024 -
DISABLED test_compile_backward___radd___cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132278 opened
Jul 31, 2024 -
DISABLED test_compile_backward_div_floor_rounding_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132277 opened
Jul 31, 2024 -
DISABLED test_compile_backward_nan_to_num_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132276 opened
Jul 31, 2024 -
DISABLED test_compile_backward_bfloat16_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132275 opened
Jul 31, 2024 -
DISABLED test_compile_backward_half_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132274 opened
Jul 31, 2024 -
DISABLED test_compile_forward_asin_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132273 opened
Jul 31, 2024 -
DISABLED test_compile_forward_sgn_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132272 opened
Jul 31, 2024 -
DISABLED test_compile_backward_nn_functional_relu_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132271 opened
Jul 31, 2024 -
DISABLED test_compile_forward_nn_functional_relu_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132270 opened
Jul 31, 2024 -
DISABLED test_compile_backward_sinh_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132268 opened
Jul 31, 2024 -
DISABLED test_compile_backward_special_xlog1py_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132267 opened
Jul 31, 2024 -
DISABLED test_compile_forward_tan_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132265 opened
Jul 31, 2024 -
DISABLED test_compile_forward_angle_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132266 opened
Jul 31, 2024 -
DISABLED test_compile_backward_mul_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132264 opened
Jul 31, 2024 -
DISABLED test_compile_forward_conj_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132263 opened
Jul 31, 2024 -
DISABLED test_compile_forward_log10_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132262 opened
Jul 31, 2024 -
DISABLED test_compile_backward_log10_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132261 opened
Jul 31, 2024 -
DISABLED test_compile_backward_float_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132260 opened
Jul 31, 2024 -
DISABLED test_compile_forward_max_binary_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132259 opened
Jul 31, 2024 -
DISABLED test_compile_forward_fmax_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132258 opened
Jul 31, 2024 -
DISABLED test_compile_forward_min_binary_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132257 opened
Jul 31, 2024 -
DISABLED test_compile_backward_sigmoid_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132256 opened
Jul 31, 2024 -
DISABLED test_compile_backward_floor_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132255 opened
Jul 31, 2024 -
DISABLED test_compile_backward_ceil_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132254 opened
Jul 31, 2024 -
DISABLED test_compile_forward_ge_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132253 opened
Jul 31, 2024 -
DISABLED test_compile_backward_special_log_ndtr_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132252 opened
Jul 31, 2024 -
DISABLED test_compile_backward_positive_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132251 opened
Jul 31, 2024 -
DISABLED test_compile_backward_lgamma_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132250 opened
Jul 31, 2024 -
DISABLED test_compile_backward_cos_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132249 opened
Jul 31, 2024 -
DISABLED test_compile_backward_abs_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132248 opened
Jul 31, 2024 -
DISABLED test_compile_forward_abs_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132247 opened
Jul 31, 2024 -
DISABLED test_compile_backward_conj_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132246 opened
Jul 31, 2024 -
DISABLED test_compile_forward_float_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132245 opened
Jul 31, 2024 -
DISABLED test_compile_forward_positive_cuda_float32 (__main__.TestNestedTensorOpInfoCUDA)
#132244 opened
Jul 31, 2024 -
Inductor autotuner time_taken_ns is not accurate
#132242 opened
Jul 31, 2024 -
torch.nn.functional.interpolate completely broken with torch.jit.script and torch.fx
#132240 opened
Jul 31, 2024 -
hasattr tracing for PythonModuleVariable is unsupported #129742
#132237 opened
Jul 31, 2024 -
tensors, created with factory methods inside Subclasses.__torch_dispatch___ are not fakified
#132235 opened
Jul 31, 2024 -
torch.nn.Bilinear weights initialization reason
#132231 opened
Jul 31, 2024
239 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[RFC][c10d] Add a new API for timeout extend for one local rank and the timeout will reset when the first collective finishes
#130905 commented on
Aug 1, 2024 • 22 new comments -
Inductor-CPU WoQ int8 GEMM micro-kernel with scale epilogue
#131887 commented on
Aug 1, 2024 • 22 new comments -
[NJT] Support Chunk backward for simple cases
#132193 commented on
Aug 1, 2024 • 12 new comments -
[inductor][triton] improved kernel-level benchmarking
#130926 commented on
Aug 1, 2024 • 10 new comments -
AutoHeuristic: mixed_mm heuristic for A100
#131613 commented on
Aug 1, 2024 • 7 new comments -
[12/N] Fix clang-tidy warnings in jit
#132209 commented on
Jul 31, 2024 • 7 new comments -
Feature test ability in c10
#131248 commented on
Aug 1, 2024 • 4 new comments -
Construct NJT without graph breaks
#130292 commented on
Jul 31, 2024 • 4 new comments -
Add instruction count benchmark to run on pull requests
#131475 commented on
Aug 1, 2024 • 3 new comments -
typing ir.py - part 2
#131846 commented on
Aug 1, 2024 • 3 new comments -
[c10d][Log] Use pg_id instead of pg_name for logging prefix
#132058 commented on
Jul 31, 2024 • 3 new comments -
[traced-graph][sparse] propagate sparsity in fx graph
#131920 commented on
Aug 1, 2024 • 3 new comments -
[nccl] Wrap nccl code update with version check
#130419 commented on
Aug 1, 2024 • 2 new comments -
[pipelining] Add schedule runtime for lowered schedule
#130488 commented on
Aug 1, 2024 • 2 new comments -
SparseCsrCUDA: cuDSS backend for linalg.solve
#129856 commented on
Jul 31, 2024 • 2 new comments -
[PP] Forward only schedule
#132177 commented on
Aug 1, 2024 • 2 new comments -
Use FakeTensor cache for subclass inner tensors
#131803 commented on
Aug 1, 2024 • 2 new comments -
[Intel GPU] Allow XPU device in copy, cdist, index_put_impl
#130088 commented on
Jul 31, 2024 • 2 new comments -
[AOTI] Fix number type for AOTI
#132180 commented on
Aug 1, 2024 • 1 new comment -
[AOTI] Fix bfloat16 in CPU
#132150 commented on
Aug 1, 2024 • 1 new comment -
[ROCm] Add AMDSMI support for UUID input
#129741 commented on
Aug 1, 2024 • 1 new comment -
Add registration mechanism for aoti model runner
#131638 commented on
Jul 31, 2024 • 1 new comment -
Generalization of distributed UT content to enable non cuda device execution
#131758 commented on
Jul 31, 2024 • 1 new comment -
[test/torch_np] Fix usages of deprecated NumPy 2.0 APIs in numpy_tests
#131909 commented on
Jul 31, 2024 • 1 new comment -
[inductor] Replace torch.allclose with torch.testing.assert_close in test_fx_fusion
#130618 commented on
Aug 1, 2024 • 1 new comment -
[C10D] Clarify warning for concurrent PG usage
#131895 commented on
Jul 31, 2024 • 1 new comment -
[CUDA][SDPA] Fix expect export on sm90+
#132194 commented on
Aug 1, 2024 • 1 new comment -
[2/N] Fix clang-tidy warnings in aten/src/ATen/native/*.{cpp,h}
#131834 commented on
Aug 1, 2024 • 1 new comment -
adds kl for multinomial
#130899 commented on
Jul 31, 2024 • 1 new comment -
fix a typo in the householder_product docs
#124279 commented on
Jul 31, 2024 • 1 new comment -
Update SavedTensorHooks TLS stack to use SafePyObject
#131700 commented on
Jul 31, 2024 • 1 new comment -
WIP batch B,W into BW
#131762 commented on
Aug 1, 2024 • 0 new comments -
[Fix]: prim::If with multiple outputs and input return directly
#131779 commented on
Jul 31, 2024 • 0 new comments -
build head~1 and run pr-time benchmarks on it
#131751 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: Heuristic that ranks choices for mm
#131714 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: Enable explicit support for ranking
#131710 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: mixed_mm h100 heuristic
#131790 commented on
Aug 1, 2024 • 0 new comments -
Add get_optin_feature() to allow opt-in to amz2023
#131792 commented on
Jul 31, 2024 • 0 new comments -
fit(...) method of nn.Module class
#131806 commented on
Aug 1, 2024 • 0 new comments -
[inductor] benchmarking collective
#131819 commented on
Jul 31, 2024 • 0 new comments -
Skip frame if torch dispatch mode enabled
#131828 commented on
Aug 1, 2024 • 0 new comments -
Fix flaky test_non_contiguous_input_mm_plus_mm and test_precompilations
#131835 commented on
Aug 1, 2024 • 0 new comments -
Call _safe_softmax from sdpa math path
#131844 commented on
Aug 1, 2024 • 0 new comments -
Update fused flash cpu kernel to set masked out rows to 0
#131863 commented on
Aug 1, 2024 • 0 new comments -
Update fused mem-eff kernel to set masked out rows to 0
#131867 commented on
Aug 1, 2024 • 0 new comments -
Add a private _safe_softmax
#131060 commented on
Aug 1, 2024 • 0 new comments -
Implement masked_select op for NestedTensors
#131069 commented on
Jul 31, 2024 • 0 new comments -
[aoti] Remove mutable tensor assertion
#131074 commented on
Jul 31, 2024 • 0 new comments -
Add cmake option USE_SYSTEM_FBGEMM
#131282 commented on
Aug 1, 2024 • 0 new comments -
[inductor] support vec for atomic add
#131314 commented on
Aug 1, 2024 • 0 new comments -
[CP] Rewrite ring attention backward algorithm and enablement APIs
#131351 commented on
Aug 1, 2024 • 0 new comments -
[DONT REVIEW] Prototype DeviceMesh.create_view_dim()
#131352 commented on
Aug 1, 2024 • 0 new comments -
Fix constant propagation in builtins and UserClasses
#131354 commented on
Jul 31, 2024 • 0 new comments -
[FSDP][dtensor] add FSDP2+TP distributed state dict test
#131408 commented on
Aug 1, 2024 • 0 new comments -
[ts-migration][1/N]: Add prim::Loop for constant number of iterations and condition
#131418 commented on
Aug 1, 2024 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#131420 commented on
Aug 1, 2024 • 0 new comments -
track number of cpp->python exceptions thrown in torch.compile benchmark suite
#131481 commented on
Jul 31, 2024 • 0 new comments -
[CUDA][CUTLASS] Fixes for CUTLASS upgrade
#131493 commented on
Aug 1, 2024 • 0 new comments -
[ts-migration]: Add support for aten::append
#131548 commented on
Jul 31, 2024 • 0 new comments -
Add explicit GQA support.
#131559 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: tuned_mm
#131615 commented on
Aug 1, 2024 • 0 new comments -
inductor mm autotuning: add back previously pruned configs
#131616 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: script to generate data for mm
#131617 commented on
Aug 1, 2024 • 0 new comments -
Correct sample creation of torch.histogram in UT op_db to align PyTorch defined operator semantics
#131630 commented on
Jul 31, 2024 • 0 new comments -
Add Sleef Implementation for maximum Kernel for ARM
#131642 commented on
Aug 1, 2024 • 0 new comments -
[pt2e][quant] Ensure BN node is erased after convert
#131651 commented on
Aug 1, 2024 • 0 new comments -
[do not land] Test warm start compile with changes
#131660 commented on
Jul 31, 2024 • 0 new comments -
AutoHeuristic: Support ranking/pruning choices
#131705 commented on
Aug 1, 2024 • 0 new comments -
Add device to create_cpu_state_dict api to make it backend agnostic
#131868 commented on
Jul 31, 2024 • 0 new comments -
[torch.special] Adding betainc with backward operation
#132135 commented on
Jul 31, 2024 • 0 new comments -
simplify return string
#132144 commented on
Jul 31, 2024 • 0 new comments -
Speedup int8mm_kernel with RVV
#132146 commented on
Jul 31, 2024 • 0 new comments -
Fix autotuning for flex_decoding
#132157 commented on
Aug 1, 2024 • 0 new comments -
[pytorch][counters] Pybind for WaitCounter
#132167 commented on
Jul 31, 2024 • 0 new comments -
disable the new cpp_builder args UT
#132168 commented on
Jul 31, 2024 • 0 new comments -
Update the FQN for auto_functionalized HOO.
#132171 commented on
Jul 31, 2024 • 0 new comments -
[export] change deepcopy to copy in _replace_set_grad_with_hop pass..
#132181 commented on
Aug 1, 2024 • 0 new comments -
Reland "[1/2] PT2 Inductor ComboKernels - Foreach cases (#124969)"
#132182 commented on
Jul 31, 2024 • 0 new comments -
[PT2] Port remove_noop to PT2 pre_grad passes
#132183 commented on
Aug 1, 2024 • 0 new comments -
[dtensor][debug] adding js script to pytorch github so that i can host the browser visualizer on pytorch
#132185 commented on
Jul 31, 2024 • 0 new comments -
[TS2E] Remove reference to torch.onnx internals
#132186 commented on
Aug 1, 2024 • 0 new comments -
C++ network flow implementation in c10
#132188 commented on
Aug 1, 2024 • 0 new comments -
Add optional exact_ordering parameter to the DataLoader
#132189 commented on
Aug 1, 2024 • 0 new comments -
[AOTI] Fix a typo in ExternKernel.codegen_const_args
#132191 commented on
Jul 31, 2024 • 0 new comments -
[fx] python_code(verbose=True): show size/strides for all tensors
#132192 commented on
Aug 1, 2024 • 0 new comments -
[TESTING] mark dynamo output_graph code log as "warning"
#132195 commented on
Aug 1, 2024 • 0 new comments -
[DTensor] Added naive replicate strategy for more diagonal ops
#132201 commented on
Jul 31, 2024 • 0 new comments -
Enable CUDA 12.4.1
#132202 commented on
Aug 1, 2024 • 0 new comments -
[WIP] Introduce a device-agnostic runtime API design
#132204 commented on
Aug 1, 2024 • 0 new comments -
use spawn as default start method to create dataloader subprocess
#132210 commented on
Jul 31, 2024 • 0 new comments -
[pytorch/mtia] MTIA equivalent of torch.<device>.get_device_properties
#132211 commented on
Jul 31, 2024 • 0 new comments -
introduce regression to be detected
#132215 commented on
Aug 1, 2024 • 0 new comments -
Populate submodules of `torch._C` to `sys.modules` recursively
#132216 commented on
Aug 1, 2024 • 0 new comments -
[Intel GPU] Remove special dispatch logic for xpu in adaptive_avg_pooling
#132217 commented on
Aug 1, 2024 • 0 new comments -
Workaround to SDPA bug in MacOS15
#132220 commented on
Jul 31, 2024 • 0 new comments -
[CI] Update CPU inductor smoke test model list and target
#132221 commented on
Aug 1, 2024 • 0 new comments -
[WIP] Enable Windows Arm64
#132225 commented on
Aug 1, 2024 • 0 new comments -
Upgrade submodule oneDNN to v3.5.1
#131877 commented on
Aug 1, 2024 • 0 new comments -
Add oneDNN BRGEMM support
#131878 commented on
Aug 1, 2024 • 0 new comments -
Use brgemm for Half flash attention kernel
#131879 commented on
Aug 1, 2024 • 0 new comments -
Add `padding_side` to `pad_sequence` with `"left"` and `"right"` options (`"right"` as default)
#131884 commented on
Aug 1, 2024 • 0 new comments -
[Inductor][CPP] Add vectorization support for double
#131886 commented on
Jul 31, 2024 • 0 new comments -
support Conv BN folding for inline inbuilt modules
#131888 commented on
Jul 31, 2024 • 0 new comments -
Enable FlashAttention on Windows
#131906 commented on
Jul 31, 2024 • 0 new comments -
[export] Support "custom" metadata field.
#131912 commented on
Jul 31, 2024 • 0 new comments -
Fix symbolic nested int printing
#131916 commented on
Jul 31, 2024 • 0 new comments -
Optimize test transformers
#131919 commented on
Jul 31, 2024 • 0 new comments -
Run performance test non-alternately
#131935 commented on
Jul 31, 2024 • 0 new comments -
Add compiler bisector
#131936 commented on
Aug 1, 2024 • 0 new comments -
Fix sum() forward for NJT
#131945 commented on
Jul 31, 2024 • 0 new comments -
retry distributed tests on new AMI
#131946 commented on
Jul 31, 2024 • 0 new comments -
wip
#131984 commented on
Aug 1, 2024 • 0 new comments -
Various fix39
#131989 commented on
Jul 31, 2024 • 0 new comments -
[BE] Make maybe_aliasing_or_mutating proper tag
#131990 commented on
Aug 1, 2024 • 0 new comments -
[mtia][sdpa] MTIA SDPA dispatch via _fused_sdp_choice_stub
#132008 commented on
Aug 1, 2024 • 0 new comments -
[Inductor][FlexAttention] Add kwarg to top level for users to specify kernel params
#132015 commented on
Aug 1, 2024 • 0 new comments -
Preserve source_fn_stack in the training IR decomp
#132033 commented on
Aug 1, 2024 • 0 new comments -
Add clone_* variants for wrapper functions for better cProfile
#132073 commented on
Jul 31, 2024 • 0 new comments -
fastpath FunctionalTensor sizes()
#132084 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: utils to make things easily reproducible
#132088 commented on
Aug 1, 2024 • 0 new comments -
[dynamo][dynamic] Treat buffers as static shape objects
#132098 commented on
Jul 31, 2024 • 0 new comments -
add script to compare results and fail at regression
#132099 commented on
Aug 1, 2024 • 0 new comments -
Add OpInfo for _convert_weight_to_int4pack
#132112 commented on
Jul 31, 2024 • 0 new comments -
[export] Fix serialization of OpOverload w/ SymInt outputs
#132126 commented on
Jul 31, 2024 • 0 new comments -
Add try except for _maybe_evaluate_static call in IndexPropagation
#132128 commented on
Aug 1, 2024 • 0 new comments -
torch.compile mode="max-autotune" precision appears to be lower
#96693 commented on
Jul 31, 2024 • 0 new comments -
Missing float8 storage
#131196 commented on
Jul 31, 2024 • 0 new comments -
torch.ops.fsdp.set_ on input doesn't actually modify the input (under torch.compile)
#132197 commented on
Jul 31, 2024 • 0 new comments -
backward of adaptive max pool (adaptive_max_pool2d_backward_cuda) doesn't have a deterministic implementation
#131972 commented on
Jul 31, 2024 • 0 new comments -
`torch.nn.functional.max_unpool2d`'s behavior is different on cpu and gpu on torch 2.5.0.dev20240708+cu121
#132041 commented on
Jul 31, 2024 • 0 new comments -
[RFC] PyTorch next wheel build platform: manylinux-2.28
#123649 commented on
Jul 31, 2024 • 0 new comments -
Static shape guards are put on tensors even if they never actually get used
#131893 commented on
Jul 31, 2024 • 0 new comments -
`all_reduce` hangs in minimal multi-node script
#131781 commented on
Jul 31, 2024 • 0 new comments -
[typing] Add missing `__all__` to modules.
#131765 commented on
Jul 31, 2024 • 0 new comments -
Can not torch.export.export with torch.arange
#131889 commented on
Jul 31, 2024 • 0 new comments -
Cpp-wrapper mode issue tracker
#117363 commented on
Jul 31, 2024 • 0 new comments -
[Inductor] Test failure in test_comprehensive_nn_functional_max_pool2d_cuda
#131072 commented on
Jul 31, 2024 • 0 new comments -
inspect.signature.bind is not supported
#93760 commented on
Aug 1, 2024 • 0 new comments -
Fails to compile with nvidia-cuda-toolkit-12.4.0
#122169 commented on
Aug 1, 2024 • 0 new comments -
[AudioLM] Graph break: 'skip function Random in file python3.10/random.py'
#121349 commented on
Aug 1, 2024 • 0 new comments -
[AudioLM] Graph break: 'inline in skipfiles: Random.randrange'
#121350 commented on
Aug 1, 2024 • 0 new comments -
torch.triu() may returns wrong values using MPS
#100005 commented on
Aug 1, 2024 • 0 new comments -
Certain .pyi files are not encoded as UTF-8 in Windows
#124897 commented on
Aug 1, 2024 • 0 new comments -
torch.nn.Transformer.generate_square_subsequent_mask gives nans instead of zeros when running on mps
#116170 commented on
Aug 1, 2024 • 0 new comments -
LibTorch cannot load PyTorch exported model
#47917 commented on
Aug 1, 2024 • 0 new comments -
[DEBUG] Strange behavior observed with PyTorch 2.4.0 + Windows + CPU inference
#131958 commented on
Aug 1, 2024 • 0 new comments -
Failed to export module with dynamic shapes
#131897 commented on
Aug 1, 2024 • 0 new comments -
[RFC] Use CUDA graphs by default on torch.compile
#121968 commented on
Aug 1, 2024 • 0 new comments -
aten::bucketize.Scalar doesn't work in compile mode
#132222 commented on
Aug 1, 2024 • 0 new comments -
torch.compiler.disable doesn't disable nested functions (also doesn't work as a context manager)
#123771 commented on
Aug 1, 2024 • 0 new comments -
[tracker] Open issue with inline_inbuilt_nn_modules
#131696 commented on
Aug 1, 2024 • 0 new comments -
LibTorch, Error in 'xxx': free(): invalid pointer
#30507 commented on
Aug 1, 2024 • 0 new comments -
RECORD_FUNCTION compilation error
#131339 commented on
Aug 1, 2024 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Aug 1, 2024 • 0 new comments -
performance drop because batching rule for aten::_scaled_dot_product_attention_math is not yet implemented
#110525 commented on
Jul 31, 2024 • 0 new comments -
"Adaptive pool MPS: input sizes must be divisible by output sizes", I keep getting this error even when I try to adjust for size
#97109 commented on
Jul 31, 2024 • 0 new comments -
torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict)
#120219 commented on
Jul 31, 2024 • 0 new comments -
[DOC] fix `mkl-static` install channel issue.
#132103 commented on
Jul 31, 2024 • 0 new comments -
`torch.nn.transformer.forward` returns incorrect value inside `torch.no_grad()` blocks.
#132136 commented on
Jul 31, 2024 • 0 new comments -
Broken link in doc.
#132178 commented on
Jul 31, 2024 • 0 new comments -
Adding betainc with backward operation
#132133 commented on
Jul 31, 2024 • 0 new comments -
NotImplementedError in some APIs with Nested Jagged Tensors
#132212 commented on
Jul 31, 2024 • 0 new comments -
[profiler] CUDA runtime op has wrong device time because of lazy init
#132218 commented on
Jul 31, 2024 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Jul 31, 2024 • 0 new comments -
AssertionError: call_args and arg_types do not match
#131337 commented on
Jul 31, 2024 • 0 new comments -
Drop Connect Layer
#131688 commented on
Jul 31, 2024 • 0 new comments -
torch.onnx.export 2GiB limit
#132205 commented on
Jul 31, 2024 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Jul 31, 2024 • 0 new comments -
Expand Examples for torch.autograd.functional.jacobian
#132140 commented on
Jul 31, 2024 • 0 new comments -
Inconsistency between `torch.get_device` and `torch.Tensor.get_device` with `__torch_function__`
#131944 commented on
Jul 31, 2024 • 0 new comments -
DISABLED test_comprehensive__unsafe_masked_index_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#131118 commented on
Jul 31, 2024 • 0 new comments -
Batching rule for `aten::_scaled_dot_product_efficient_attention`
#102457 commented on
Jul 31, 2024 • 0 new comments -
TorchDynamo ONNX Export does not work as expected with masking (ScatterElements)
#126856 commented on
Jul 31, 2024 • 0 new comments -
EventList still using old API breaks on Python 3.12
#132227 commented on
Jul 31, 2024 • 0 new comments -
torch.compile for torch.cumsum returns nans sometimes with all inf inputs
#132107 commented on
Jul 31, 2024 • 0 new comments -
Tests stopped working after merging #126905 on Arm64
#132132 commented on
Jul 31, 2024 • 0 new comments -
INTERNAL ASSERT FAILED at "../torch/csrc/autograd/python_torch_functions_manual.cpp":661 when returning a constant tensor in the forward method
#132134 commented on
Jul 31, 2024 • 0 new comments -
compile: `torch._subclasses.fake_tensor.FakeTensor` does not inherit from user-defined `torch.Tensor` subclass
#132148 commented on
Jul 31, 2024 • 0 new comments -
[dynamo] Context manager
#132154 commented on
Jul 31, 2024 • 0 new comments -
torch.ops.fsdp.set_ with torch.compile silently incorrect
#132200 commented on
Jul 31, 2024 • 0 new comments -
why not enable expandable_segments by default?
#130330 commented on
Jul 31, 2024 • 0 new comments -
TORCHELASTIC_RESTART_COUNT doesn't seem to be broadcasted to all worker
#108158 commented on
Jul 31, 2024 • 0 new comments -
FusedKernelCPU failed to delete generated dll files on Windows
#50260 commented on
Aug 1, 2024 • 0 new comments -
[ROCm] Tunableop record untuned
#128813 commented on
Aug 1, 2024 • 0 new comments -
Scale XBLOCK in triton reduction configs to avoid hitting max grid
#128826 commented on
Aug 1, 2024 • 0 new comments -
[reland][ROCm] TunableOp for gemm_and_bias
#128919 commented on
Aug 1, 2024 • 0 new comments -
Always use high precision for SDPA math backend
#128922 commented on
Aug 1, 2024 • 0 new comments -
Fix recent build error on ppc64le
#129736 commented on
Aug 1, 2024 • 0 new comments -
[BE][Easy][13/19] enforce style for empty lines in import segments in `test/j*/`
#129764 commented on
Jul 31, 2024 • 0 new comments -
[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]*/` and `torch/[e-n]*/`
#129769 commented on
Jul 31, 2024 • 0 new comments -
[BE][Easy][19/19] enforce style for empty lines in import segments in `torch/[o-z]*/`
#129771 commented on
Jul 31, 2024 • 0 new comments -
[Pipelining] Add schedule unshard/reshard pass
#129810 commented on
Aug 1, 2024 • 0 new comments -
Support built-in id function for TensorVariable on parameters
#130100 commented on
Aug 1, 2024 • 0 new comments -
Support XPU ABI=0 build
#130110 commented on
Aug 1, 2024 • 0 new comments -
Enable AOTI Eager to support int list
#130231 commented on
Jul 31, 2024 • 0 new comments -
[2/N][dtensor] Strided Sharding shard_to_replicate
#130239 commented on
Aug 1, 2024 • 0 new comments -
More appropriate socket errors and debug messages
#130347 commented on
Jul 31, 2024 • 0 new comments -
[pipelining] Add schedule send/recv pass
#130378 commented on
Aug 1, 2024 • 0 new comments -
AutoHeuristic: Introduce script to collect data for flex_attention
#130398 commented on
Aug 1, 2024 • 0 new comments -
flash_attn: limit compilation parallelism due to high memory requirements
#130443 commented on
Jul 31, 2024 • 0 new comments -
[ROCm][CK][Inductor] Enable addmm for CK backend to gemm max autotune
#130576 commented on
Jul 31, 2024 • 0 new comments -
[ROCm] ROCm triton pin update
#130625 commented on
Jul 31, 2024 • 0 new comments -
Gqa benchmark
#130634 commented on
Jul 31, 2024 • 0 new comments -
[MPS] Add support for MPS int4 (groupwise) and int8 (per channel) kernels starting with macOS 15.0
#130715 commented on
Aug 1, 2024 • 0 new comments -
[Inductor] support masked vectorization for the tail_loop of the 2d tiles kernel
#130724 commented on
Jul 31, 2024 • 0 new comments -
[FSDP][dtensor] use _StridedShard to represent nested sharding for correct full_tensor() result
#130760 commented on
Aug 1, 2024 • 0 new comments -
WIP implement batching of send/recv
#130860 commented on
Aug 1, 2024 • 0 new comments -
Support IPC for Expandable Segments
#130890 commented on
Aug 1, 2024 • 0 new comments -
Add decomposition for squeeze_copy
#130941 commented on
Jul 31, 2024 • 0 new comments -
Refactor process_inputs outside of create_aot_dispatcher_function
#130962 commented on
Jul 31, 2024 • 0 new comments -
[inductor] support vectorization for torch.any(bool) -> bool
#131017 commented on
Aug 1, 2024 • 0 new comments -
BuildExtension breaks build because of absolute path
#132130 commented on
Aug 1, 2024 • 0 new comments -
Various problems when using PyTorch MPS acceleration with yolov5 on Mac devices
#132226 commented on
Aug 1, 2024 • 0 new comments -
[RFC] Add new CPP builder for inductor on pytorch Windows
#124245 commented on
Aug 1, 2024 • 0 new comments -
Including AdaBound in the list of Optimizers.
#46809 commented on
Aug 1, 2024 • 0 new comments -
DISABLED test_comprehensive_nn_functional_nll_loss_cuda_float32 (__main__.TestDecompCUDA)
#117732 commented on
Aug 1, 2024 • 0 new comments -
OSError: [WinError 126] The specified module could not be found.
#131662 commented on
Aug 1, 2024 • 0 new comments -
xpu: efficientnet inference underperforms ipex
#132176 commented on
Aug 1, 2024 • 0 new comments -
M1 mps issue
#89708 commented on
Aug 1, 2024 • 0 new comments -
NCCL watchdog thread terminated with exception
#113128 commented on
Aug 1, 2024 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Jul 31, 2024 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Aug 1, 2024 • 0 new comments -
[draft] numpy 2.0.0rc1 test
#121979 commented on
Aug 1, 2024 • 0 new comments -
Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack
#125262 commented on
Jul 31, 2024 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
Aug 1, 2024 • 0 new comments -
refine fp32 precision api
#125888 commented on
Aug 1, 2024 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
Aug 1, 2024 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
Aug 1, 2024 • 0 new comments -
Introduce test skip markers for Sandcastle
#126273 commented on
Aug 1, 2024 • 0 new comments -
[inductor] Use full bit widths for bf16/fp16 vectorization
#126502 commented on
Jul 31, 2024 • 0 new comments -
[Inductor] support masked vectorization for the tail_loop
#126526 commented on
Jul 31, 2024 • 0 new comments -
[1/N][dtensor] introduce StridedShard placement type and _split_tensor() logic
#126697 commented on
Aug 1, 2024 • 0 new comments -
Default meta device to use swap_tensors in nn.Module._apply (.to_empty and .to('meta'))
#126819 commented on
Jul 31, 2024 • 0 new comments -
[2/N] Dynamic Shape: Enable dynamic shape support for aoti_eager
#126883 commented on
Jul 31, 2024 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
Aug 1, 2024 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
Aug 1, 2024 • 0 new comments -
Updating Module Tracker
#127624 commented on
Jul 31, 2024 • 0 new comments -
[Testing only] Flip default on weights_only
#127627 commented on
Aug 1, 2024 • 0 new comments -
[MPS] Add native strided API for MPSNDArray starting with macOS 15
#128393 commented on
Aug 1, 2024 • 0 new comments