Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] [Relax] Segfault error when parse the Relax IR #17239

Open
Cookiee235 opened this issue Aug 3, 2024 · 4 comments
Open

[Bug] [Relax] Segfault error when parse the Relax IR #17239

Cookiee235 opened this issue Aug 3, 2024 · 4 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug

Comments

@Cookiee235
Copy link
Contributor

Actual behavior

[16:49:04] /software/tvm-lunder/src/runtime/logging.cc:390: TVM_LOG_DEBUG enables VLOG statements in 'ir/transform.cc' up to level 1
[16:49:04] /software/tvm-lunder/src/runtime/logging.cc:390: TVM_LOG_DEBUG enables VLOG statements in 'relay/ir/transform.cc' up to level 1
Segmentation fault (core dumped)

Steps to reproduce

from tvm.script import ir as I
from tvm.script import tir as T
from tvm.script import relax as R

@I.ir_module
class Module:
    @T.prim_func(private=True)
    def multiply_by_two(A: T.Buffer((16,), "float32")):
        for i in range(16):
            A[i] = A[i] * T.float32(2)

    @R.function
    def main(A: R.Tensor((16,), dtype="float32")) -> R.Tensor((16,), dtype="float32"):
        cls = Module
        args: R.Tuple(R.Tensor((16,), dtype="float32")) = (A,)
        gv1: R.Tensor((16,), dtype="float32") = R.call_tir_inplace(cls.multiply_by_two, args, out_sinfo=R.Tensor((16,), dtype="float32"), inplace_indices=[0])
        return gv1
m = Module

cc @Lunderberg @junrushao

@Cookiee235 Cookiee235 added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug labels Aug 3, 2024
@Lunderberg
Copy link
Contributor

Looks like this is a combination of a couple of factors.

  1. Like R.call_tir, the arguments in R.call_tir_inplace must be an in-line relax::Tuple (See the discussion in [Unity][Transform] Handle relax.Var as call_tir args when lowering #15916 for discussion on this requirement.)
  2. The argument provided to R.call_tir_inplace is wrapped into an in-line tuple if it is not already an in-line tuple. While the normalization of R.call_tir_inplace would handle this case if the expression is generated through C++, this wrapping generates a tuple of var-to-tuple (R.tuple(args)), circumventing the normalization.
  3. The error-checking by R.call_tir_inplace can be triggered by multiple conditions (argument is not a tensor, argument doesn't have known shape, argument's known shape does not match output shape), but the error message attempts to access the argument's known shape, triggering a segfault if it doesn't actually exist.

@Lunderberg
Copy link
Contributor

I've submitted separate PR #17242 which should provide a better error message (instead of a segfault) when this occurs.

For (2), we may be able to improve it by checking isinstance(args.struct_info, TupleStructInfo) rather than isinstance(args. relax.Tuple). This way, a tuple that was defined earlier in the function wouldn't be modified, and would produce an error message at an earlier point. The normalization (which would produce the required in-line Tuple) is suppressed during TVMScript parsing, since TVMScript is frequently used for writing test cases that violate Relax assumptions. Maybe we should tie the normalization to the existing check_well_formed flag, so those tests usually disable the well-formed checks as well.

Lunderberg added a commit to Lunderberg/tvm that referenced this issue Aug 5, 2024
Prior to this commit, the different `R.call_tir*` variations would
wrap the arguments into an in-line `relax.Tuple`, if it is not
already a `relax.Tuple`.  While this allows a tensor to be passed into
these functions as a single argument (`R.call_tir(func, arg, ...)`
instead of `R.call_tir(func, [arg], ...)`), the wrapped Relax variable
may already refer to a tuple.

This use of a variable to refer to an argument tuple rather than an
in-line argument tuple is not allowed by Relax.  (See discussion on
apache#15916 for details.)  However, by
wrapping a variable `args: R.Tuple(R.Tensor, R.Tensor, ...)` into a
tuple-of-tuples, the error occurs after the expression has already
been generated, and refers to an expression `R.Tuple(R.Tuple(R.Tensor,
R.Tensor, ...))` that doesn't appear anywhere in the user's input.
This can make debugging difficult (see
apache#17239 for an example).

This commit updates the argument-handling in `R.call_tir` to only
generate an in-line `relax.Tuple` if the arguments do not already have
`relax.TupleStructInfo`.  If the argument was provided as a Relax
variable bound to a tuple of arguments, it will still produce an
error.  However, that error will occur much earlier, and will
explicitly state that the argument must be a `relax.Tuple` instead of
a `relax.Var`.
@Lunderberg
Copy link
Contributor

And #17243 should address (2) by improving the error message.

@Cookiee235
Copy link
Contributor Author

@Lunderberg Thanks for your fixing. Segfault is a dangerous behavior, fixing it with an error message is a good strategy. Also thanks for your efforts in improving the well-formed checking for relay IR which is also very meaningful.

tqchen pushed a commit that referenced this issue Aug 26, 2024
…17243)

* [Relax] Avoid wrapping TupleStructInfo into a Tuple for R.call_tir

Prior to this commit, the different `R.call_tir*` variations would
wrap the arguments into an in-line `relax.Tuple`, if it is not
already a `relax.Tuple`.  While this allows a tensor to be passed into
these functions as a single argument (`R.call_tir(func, arg, ...)`
instead of `R.call_tir(func, [arg], ...)`), the wrapped Relax variable
may already refer to a tuple.

This use of a variable to refer to an argument tuple rather than an
in-line argument tuple is not allowed by Relax.  (See discussion on
#15916 for details.)  However, by
wrapping a variable `args: R.Tuple(R.Tensor, R.Tensor, ...)` into a
tuple-of-tuples, the error occurs after the expression has already
been generated, and refers to an expression `R.Tuple(R.Tuple(R.Tensor,
R.Tensor, ...))` that doesn't appear anywhere in the user's input.
This can make debugging difficult (see
#17239 for an example).

This commit updates the argument-handling in `R.call_tir` to only
generate an in-line `relax.Tuple` if the arguments do not already have
`relax.TupleStructInfo`.  If the argument was provided as a Relax
variable bound to a tuple of arguments, it will still produce an
error.  However, that error will occur much earlier, and will
explicitly state that the argument must be a `relax.Tuple` instead of
a `relax.Var`.

* lint fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug
Projects
None yet
Development

No branches or pull requests

2 participants