
questions about tiramisu capabilities #347

Open
rohany opened this issue Jun 21, 2021 · 3 comments

Comments


rohany commented Jun 21, 2021

Hi! I'm looking into using tiramisu to generate some leaf kernels for a project. I have two questions:

  • Can tiramisu generate code that has array access bounds parametrized by function arguments, similar to the LDA/LDB/LDC parameters in BLAS libraries?
  • How good is tiramisu's autoscheduler? Looking at the benchmark code here with published performance numbers (like gemm), the schedule is hand-written and the kernel itself is written in a lower-level language. Can the autoscheduler generate schedules with similar performance?
@rbaghdadi (Collaborator) commented:

Hi,

Yes, Tiramisu can generate code whose array access bounds are parametrized by function arguments. Loop bounds in Tiramisu can be any expression, as long as the bound is invariant to the loop body.

The autoscheduler cannot yet reproduce the hand-written gemm schedule; finding that schedule would require a much larger search space.


rohany commented Jun 22, 2021

> Yes Tiramisu can generate code that has array access bounds parametrized by function arguments. The loop bounds in Tiramisu can be any expression (as long as the loop bound is invariant to the loop body).

Can you point me to some example code that does this?

> Generating the same schedule of gemm written by hand is still not supported in the autoscheduler. That schedule needs a much larger search space.

Makes sense. I'm planning to use it on non-BLAS kernels, though. Is the expected performance good in those cases?

@rbaghdadi (Collaborator) commented:

Yes, there is an example in the GEMM benchmark. The sizes are passed in SIZES and then used to initialize N, M, and K: https://github.com/Tiramisu-Compiler/tiramisu/blob/master/benchmarks/linear_algebra/blas/level3/sgemm/cpu/sgemm_generator.cpp

For the autoscheduler, I think the only way to tell is to actually try it. We would be happy to learn about the results! In general, the closer your code is to the randomly generated codes we used for training, and the closer your hardware is to ours, the more accurate the cost model is (it was not trained on multiple hardware machines). We did our training on this cluster: https://groups.csail.mit.edu/commit/lanka/
