
dynamo evals the subfunctions of a skipped frame with the callback set, causing bad performance and more errors #128928

Closed
Ma-Jian1 opened this issue Jun 18, 2024 · 6 comments


@Ma-Jian1
Contributor

Ma-Jian1 commented Jun 18, 2024

🐛 Describe the bug

torch.compile tries to split the graph at the first unsupported piece of code, compiles everything before the break point, stitches the compiled and non-compiled code together, and then runs it all with the frame-evaluation callback still set. The callback then tries to recompile the unsupported function (there may be other supported code nested under it), but it does not skip the code it already knows is unsupported. This hurts performance and raises additional errors.

Error logs

No response

Minified repro

 import torch

 m = torch.nn.SiLU()

 class A:
     def myfunc():
         pass

 def break_graph3(t):
     funcs = [A.myfunc, m]
     gen = ("SiLU" in f.__class__.__name__ for f in funcs)  # "in" is not fully supported
     a = all(gen)  # generators are not fully supported

 def toy_example(t):
     t = t + 1
     break_graph3(t)
     t = t + 3
     return t

 fn = torch.compile(toy_example)
 print(fn(torch.randn(1)))
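For reference, outside of dynamo the generator in break_graph3 is ordinary eager Python. A torch-free sketch of the same pattern (DummySiLU and plain_func are hypothetical stand-ins for m and A.myfunc) shows what all(...) actually computes:

```python
# Hypothetical stand-ins for the repro's objects; no torch required.
class DummySiLU:
    pass

def plain_func():
    pass

funcs = [plain_func, DummySiLU()]

# The same generator expression dynamo breaks on: a membership
# test over each object's class name.
gen = ("SiLU" in f.__class__.__name__ for f in funcs)

# plain_func's class is named 'function', so the first element is
# False and all() short-circuits.
result = all(gen)
print(result)  # → False
```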

Versions

Collecting environment information...
PyTorch version: 2.2.2a0+gitb11808e
Is debug build: True
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.5 (ssh:https://gerrit.habana-labs.com:29418/tpc_llvm10 40a69d3611a3941b828718e8d803ea1cfb724976)
CMake version: version 3.28.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
Stepping: 0
BogoMIPS: 5187.81
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xsaves arat pku ospke md_clear flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 384 KiB (12 instances)
L1i cache: 384 KiB (12 instances)
L2 cache: 12 MiB (12 instances)
L3 cache: 38.5 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] habana-torch-dataloader==1.17.0+git84e273963
[pip3] habana-torch-plugin==1.17.0+git84e273963
[pip3] intel-extension-for-pytorch==2.2.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==2.2.0.post0
[pip3] torch==2.2.2a0+gitb11808e
[pip3] torch-debug==2.2.0a0+git10d65c0
[pip3] torch_tb_profiler==0.4.0
[pip3] torchaudio==2.2.0+08901ad
[pip3] torchdata==0.7.1+5e6f7b7
[pip3] torchmetrics==1.3.2
[pip3] torchtext==0.17.0+400da5c
[pip3] torchvision==0.17.0+b2383d4
[pip3] triton==2.2.0
[conda] Could not collect

cc @ezyang @anijain2305 @chauhang

@masnesral
Contributor

@Ma-Jian1, sorry, what is the problem exactly? Is it just that dynamo does not currently support this generator? If so, I believe this is a known shortcoming. I found this issue with more details: #93737

@Ma-Jian1
Contributor Author

Not quite; I have corrected the example.
In general, when dynamo finds that it does not support this generator, it splits the graph and compiles everything before the split point.
All of that is fine.
But it then runs all of the code with the callback set, and the callback tries to recompile the unsupported code, e.g. the generator here.
Maybe it should run the generator without the callback?

@Ma-Jian1
Contributor Author

Ma-Jian1 commented Jun 20, 2024

I'm not sure whether this generator comes from cg.make_function_with_closure or whether it is the original generator.

@masnesral
Contributor

@Ma-Jian1, sorry, there's still something wrong with the example: name 'A' is not defined

@Ma-Jian1
Contributor Author

@masnesral sorry for that, updated.

@Ma-Jian1
Contributor Author

I think I now understand the whole story, and it is still related to the generator.
In the first pass, dynamo just creates the generator and then hits the "in" operator.
In the later pass, it actually calls the generator and throws unimplemented("generator").
Maybe because this has little impact on performance, dynamo does not wrap the generator call specifically.
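That two-step behaviour matches plain Python generator semantics: building a generator executes none of its body; only consuming it does. A torch-free sketch:

```python
log = []

def make_gen():
    # Creating the generator ("first pass") runs nothing inside it.
    # log.append(...) returns None, so `or name` makes each element yield name.
    return (log.append(name) or name for name in ["SiLU", "ReLU"])

gen = make_gen()
assert log == []           # nothing has executed yet

first = next(gen)          # only consumption ("the later pass") runs the body
assert first == "SiLU"
assert log == ["SiLU"]
```

So dynamo only discovers the unsupported generator body when the stitched-together code actually iterates it, not when the generator object is created.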
