
[inductor][cpu]transformers models static/dynamic quant performance/accuracy crash in 2024-06-17 nightly release #128933

Open
zxd1997066 opened this issue Jun 18, 2024 · 8 comments
Labels
module: dynamic shapes oncall: cpu inductor CPU Inductor issues for Intel team to triage oncall: pt2

Comments

@zxd1997066
Contributor

zxd1997066 commented Jun 18, 2024

🐛 Describe the bug

======================= export model ===============================
W0617 17:40:01.166444 140156037509568 torch/_export/__init__.py:95] +============================+
W0617 17:40:01.166611 140156037509568 torch/_export/__init__.py:96] |     !!!   WARNING   !!!    |
W0617 17:40:01.166671 140156037509568 torch/_export/__init__.py:97] +============================+
W0617 17:40:01.166718 140156037509568 torch/_export/__init__.py:98] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W0617 17:40:01.166769 140156037509568 torch/_export/__init__.py:99] Please switch to use torch.export instead.
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0] Error while creating guard:
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0] Name: ''
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Source: shape_env
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Create Function: SHAPE_ENV
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Guard Types: None
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Code List: None
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Object Weakref: None
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     Guarded Class Weakref: None
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0] Traceback (most recent call last):
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]   File "/workspace/pytorch/torch/_guards.py", line 259, in create
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     return self.create_fn(builder, self)
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]   File "/workspace/pytorch/torch/_dynamo/guards.py", line 1728, in SHAPE_ENV
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     guards = output_graph.shape_env.produce_guards(
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]   File "/workspace/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4167, in produce_guards
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]     raise ConstraintViolationError(
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0] torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (dim0)! For more information, run with TORCH_LOGS="+dynamic".
E0617 17:40:03.274544 140156037509568 torch/_guards.py:261] [0/0]   - Not all values of dim0 = L['input_ids'].size()[0] in the specified range satisfy the generated guard Ne(L['input_ids'].size()[0], 9223372036854775807).
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0] Created at:
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]   File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 564, in transform
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]     tracer = InstructionTranslator(
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]   File "/workspace/pytorch/torch/_dynamo/symbolic_convert.py", line 2371, in __init__
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]     output=OutputGraph(
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]   File "/workspace/pytorch/torch/_dynamo/output_graph.py", line 313, in __init__
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]     self.init_ambient_guards()
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]   File "/workspace/pytorch/torch/_dynamo/output_graph.py", line 452, in init_ambient_guards
E0617 17:40:03.275653 140156037509568 torch/_guards.py:263] [0/0]     self.guards.add(ShapeEnvSource().make_guard(GuardBuilder.SHAPE_ENV))
Traceback (most recent call last):
  File "./transformers/examples/pytorch/text-classification/run_glue.py", line 652, in <module>
    main()
  File "./transformers/examples/pytorch/text-classification/run_glue.py", line 590, in main
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 3109, in evaluate
    start_time, output = eval_loop(
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 3232, in evaluation_loop
    else self.accelerator.prepare_model(model, evaluation_mode=True)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 1449, in prepare_model
    exported_model = capture_pre_autograd_graph(
  File "/workspace/pytorch/torch/_export/__init__.py", line 170, in capture_pre_autograd_graph
    m = torch._dynamo.export(
  File "/workspace/pytorch/torch/_dynamo/eval_frame.py", line 1425, in inner
    raise constraint_violation_error
  File "/workspace/pytorch/torch/_dynamo/eval_frame.py", line 1379, in inner
    result_traced = opt_f(*args, **kwargs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/pytorch/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 1116, in __call__
    return self._torchdynamo_orig_callable(
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 472, in __call__
    return _compile(
  File "/workspace/pytorch/torch/_utils_internal.py", line 84, in wrapper_function
    return StrobelightCompileTimeProfiler.profile_compile_time(
  File "/workspace/pytorch/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 817, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/workspace/pytorch/torch/_dynamo/utils.py", line 231, in time_wrapper
    r = func(*args, **kwargs)
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 726, in compile_inner
    check_fn = CheckFunctionManager(
  File "/workspace/pytorch/torch/_dynamo/guards.py", line 2141, in __init__
    guard.create(builder)
  File "/workspace/pytorch/torch/_guards.py", line 259, in create
    return self.create_fn(builder, self)
  File "/workspace/pytorch/torch/_dynamo/guards.py", line 1728, in SHAPE_ENV
    guards = output_graph.shape_env.produce_guards(
  File "/workspace/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4167, in produce_guards
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (dim0)! For more information, run with TORCH_LOGS="+dynamic".
  - Not all values of dim0 = L['input_ids'].size()[0] in the specified range satisfy the generated guard Ne(L['input_ids'].size()[0], 9223372036854775807).

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/workspace/pytorch/numa_launcher.py", line 805, in <module>
    main()
  File "/workspace/pytorch/numa_launcher.py", line 800, in main
    launcher.launch(args)
  File "/workspace/pytorch/numa_launcher.py", line 481, in launch
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd_s)
subprocess.CalledProcessError: Command 'numactl -C 0-31 -m 0 /opt/conda/bin/python -u ./transformers/examples/pytorch/text-classification/run_glue.py --model_name_or_path albert-base-v1 --task_name MRPC --do_eval --max_seq_length 16 --learning_rate 2e-5 --overwrite_output_dir --output_dir /tmp/tmp_huggingface/ --torch_compile --torch_compile_quant ptq_dynamic --report_to=none --per_device_eval_batch_size 64' returned non-zero exit status 1.

Versions

SW info

SW Branch Target commit Refer commit
Pytorch nightly 8410bf5 963d450
Torchbench chuanqiw/inductor_quant ee35d764 ee35d764
torchaudio nightly b829e93 1980f8a
torchtext nightly b0ebddc b0ebddc
torchvision nightly d23a6e1 d23a6e1
torchdata nightly 11bb5b8 11bb5b8
dynamo_benchmarks nightly fea73cb fea73cb

Repro:

git clone -b test https://github.com/chuanqi129/transformers && cd transformers && \
    python setup.py bdist_wheel && pip install --force-reinstall dist/*.whl && cd ..
git clone -b test https://github.com/zxd1997066/accelerate.git && cd accelerate && \
    python setup.py bdist_wheel && pip install --no-deps --force-reinstall dist/*.whl && cd ..
pip install -r transformers/examples/pytorch/text-classification/requirements.txt
wget https://github.com/chuanqi129/inductor-tools/raw/xiangdong/accuracy/scripts/modelbench/quant/numa_launcher.py
wget https://github.com/chuanqi129/inductor-tools/raw/xiangdong/accuracy/scripts/modelbench/quant/hf_quant_test.sh
#change model in https://github.com/chuanqi129/inductor-tools/blob/xiangdong/accuracy/scripts/modelbench/quant/hf_quant_test.sh#L88
#static quantization
bash hf_quant_test.sh key torch_compile_quant_static
#dynamic quantization
bash hf_quant_test.sh key torch_compile_quant

Suspected guilty commit: 2229884
text-classification+albert-base-v1-static-quant-accuracy-crash_guilty_commit.log

cc @ezyang @anijain2305 @chauhang @penguinwu @WeizhuoZhang-intel @chuanqi129

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Jun 18, 2024

Hi @ezyang, could you kindly take a look? I have prepared a script to reproduce this issue: https://gist.github.com/leslie-fang-intel/696041fa7e7352ecb985b04a5e1188de. It starts to fail as of 2229884.

In case it's needed, here is the transformers version I used: pip install "git+https://github.com/huggingface/transformers@243e186efbf7fb93328dd6b34927a4e8c8f24395"

@leslie-fang-intel leslie-fang-intel added the oncall: cpu inductor CPU Inductor issues for Intel team to triage label Jun 18, 2024
@zxd1997066
Contributor Author

vision_maskrcnn and detectron2_fcos_r_50_fpn (AMP/float32, single/multiple thread, static/dynamic shape, default/cpp wrapper) hit TypeError: Invalid NaN comparison (https://gist.github.com/zxd1997066/5f1fc727ced62f4ae82df88ea232f863), and they share the same guilty commit, 2229884.
bisect log:
torchbench-vision_maskrcnn-inference-float32-static-default-multiple-accuracy-crash_guilty_commit.log

Repro:
inductor_single_run.sh

bash inductor_single_run.sh single/multiple inference accuracy/performance torchbench vision_maskrcnn/detectron2_fcos_r_50_fpn amp/float32 first dynamic/static default/cpp

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Jun 23, 2024

Running this test with TORCH_LOGS="+dynamic", we can see the guard difference before and after this commit:

  • Previously, we could statically determine that s0 != 9223372036854775807.
  • After this commit, we have to add a guard for it, and that guard causes the failure.
[screenshot comparing the guards emitted before and after the commit]

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Jun 23, 2024

Looking further into why we can't statically determine s0 != 9223372036854775807 after this commit:

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Jun 23, 2024

I am not sure what the correct fix is. If b.lower is larger than sys.maxsize - 1 and a.upper is int_oo, can we say the two values are not equal in SymPyValueRangeAnalysis? cc @ezyang @lezcano
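To make the question concrete, here is a minimal sketch (the names and logic are illustrative, not PyTorch's actual SymPyValueRangeAnalysis) of when a range analysis can statically prove two values unequal:

```python
import sys

INT_OO = float("inf")  # illustrative stand-in for PyTorch's int_oo sentinel

def statically_not_equal(a_lower, a_upper, b_lower, b_upper):
    # Hypothetical helper: two symbolic values can only be proven unequal
    # without a runtime guard when their value ranges are disjoint.
    return a_upper < b_lower or b_upper < a_lower

# s0 in [2, int_oo] vs. the constant sys.maxsize: the ranges overlap at
# sys.maxsize, so s0 != sys.maxsize cannot be proven and a guard is emitted.
print(statically_not_equal(2, INT_OO, sys.maxsize, sys.maxsize))           # False

# If s0's upper bound were sys.maxsize - 1, the ranges would be disjoint
# and the inequality would hold statically, with no guard needed.
print(statically_not_equal(2, sys.maxsize - 1, sys.maxsize, sys.maxsize))  # True
```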

@lezcano
Collaborator

lezcano commented Jun 24, 2024

But that guard sounds reasonable to me, no? It's asking that s0 should be representable in int64.
I'm not sure how the points above are related to the failure, and it's difficult to know without more context.

Looking at the error in #128933 (comment), it might suggest that our safe_mul is not as safe as it should be. In particular, it might be doing something like 0 * sympy.oo and returning NaN. In that case, we should probably treat 0 * sympy.oo (and likewise 0 * -sympy.oo) as 0, since this is equivalent to the limit lim_{x->inf} 0 * x = 0.

@ezyang this shows a larger issue that's lurking with the inf treatment: Our bounds are inclusive... unless one of the ends is oo, in which case they are not...
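The 0 * oo behavior described above can be sketched with plain floats (this is an illustrative toy, not PyTorch's actual safe_mul):

```python
import math

def safe_mul(a, b):
    # Toy sketch of the proposed fix: treat 0 * inf and 0 * -inf as 0,
    # following the limit convention lim_{x->inf} 0 * x = 0, instead of
    # producing NaN as plain float multiplication does.
    if a == 0 or b == 0:
        return 0
    return a * b

print(0 * math.inf)           # nan under plain multiplication
print(safe_mul(0, math.inf))  # 0 under the limit convention
```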

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Jun 24, 2024

But that guard sounds reasonable to me, no? It's asking that s0 should be representable in int64.
I'm not sure how the points above are related to the failure, and it's difficult to know without more context.

Yeah, any suggestions for how to further debug why the guard failed? I am just listing the difference before and after this commit; maybe there is another potential issue that fails the guard :(

---------- Update on why the newly added guard fails ------------

@ezyang
Contributor

ezyang commented Jun 24, 2024

This is sort of expected, but what we can probably do is make the constraint violation error more tolerant of this case.

The big question I had to answer in #127693 was what to do if there legitimately was different behavior when s0 == sys.maxsize. Previously, I simply assumed this couldn't happen, because who makes sys.maxsize-sized tensors? But with int_oo modeling, "just assuming" it doesn't happen is not so convenient. It's also not a big deal: you just get a guard testing that the int is not maxsize.

Except for the constraint stuff. The constraint violation logic says "if there is ANY guard, error out". But we can probably make it softer; e.g., a guard that the value is not maxsize shouldn't trigger it.
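A minimal sketch of the softening proposed above (a hypothetical helper, not the actual PyTorch implementation): before raising ConstraintViolationError, filter out guards that merely exclude sys.maxsize:

```python
import sys

def is_benign_guard(excluded_value):
    # Hypothetical filter: a guard of the form s0 != sys.maxsize only
    # excludes a size no real tensor can have, so it should not count
    # as a constraint violation.
    return excluded_value == sys.maxsize

# Each pair is (symbol, value the guard asserts the symbol is not equal to).
guards = [("s0", sys.maxsize), ("s0", 128)]
violations = [g for g in guards if not is_benign_guard(g[1])]
print(violations)  # only the real constraint (s0 != 128) remains
```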
