
questions about tiramisu capabilities #347

Open
rohany opened this issue Jun 21, 2021 · 3 comments

Comments


rohany commented Jun 21, 2021

Hi! I'm looking into using tiramisu to generate some leaf kernels for a project. I have two questions:

  • Can tiramisu generate code that has array access bounds parametrized by function arguments, similar to the LDA/LDB/LDC parameters in BLAS libraries?
  • How good is tiramisu's autoscheduler? Looking at the benchmark code here with published performance numbers (like gemm), the schedule is hand-written and the kernel itself is written in a lower-level language. Can the autoscheduler generate schedules with similar performance?
@rbaghdadi (Collaborator) commented:

Hi,

Yes, Tiramisu can generate code whose array access bounds are parametrized by function arguments. Loop bounds in Tiramisu can be any expression, as long as the bound is invariant to the loop body.

The autoscheduler cannot yet reproduce the hand-written gemm schedule; finding that schedule would require a much larger search space.


rohany commented Jun 22, 2021

> Yes Tiramisu can generate code that has array access bounds parametrized by function arguments. The loop bounds in Tiramisu can be any expression (as long as the loop bound is invariant to the loop body).

Can you point me to some example code that does this?

> Generating the same schedule of gemm written by hand is still not supported in the autoscheduler. That schedule needs a much larger search space.

Makes sense. I'm planning to use it on non-BLAS kernels, though. Is the expected performance good in those cases?

@rbaghdadi (Collaborator) commented:

Yes, there is an example in the GEMM benchmark. The sizes are passed in SIZES and then used to initialize N, M, and K: https://github.com/Tiramisu-Compiler/tiramisu/blob/master/benchmarks/linear_algebra/blas/level3/sgemm/cpu/sgemm_generator.cpp

For the autoscheduler, I think the only way to tell is to actually try it. We would be happy to learn about the results! In general, the closer your code is to the randomly generated codes we used for training, and the closer your hardware is to ours, the more accurate the cost model is (it was not trained on multiple hardware machines). We did our training on this cluster: https://groups.csail.mit.edu/commit/lanka/
