Pulse · intel/intel-xpu-backend-for-triton · GitHub

November 23, 2024 – November 30, 2024

Overview

55 Active pull requests

39 Active issues

45 Pull requests merged by 11 people

Update .gitignore to ignore scripts_cache
#2883 merged Nov 29, 2024
Use tag pr-xyz for triton-benchmarks runs on PRs.
#2880 merged Nov 29, 2024
Merge OpenAI Triton commit cc89dac
#2876 merged Nov 29, 2024
Support for tt.dot_scaled operator
#2804 merged Nov 29, 2024
Generalize Intel coalescing pass to handle users of scf.for with coalesced load
#2856 merged Nov 29, 2024
Remove duplication of usage of XPU specific libraries in bin/CMakeLists.txt
#2874 merged Nov 29, 2024
[github-bot] Update spirv-llvm-translator.conf
#2872 merged Nov 29, 2024
[NFC] Remove CMAKE_VERBOSE_MAKEFILE var
#2871 merged Nov 29, 2024
[NFC] Remove get_event_pool from XPUUtils
#2870 merged Nov 28, 2024
Remove packaging from dependencies
#2866 merged Nov 28, 2024
[GEMM] Remove TRITON_INTEL_ENABLE_ADDRESS_PAYLOAD_OPT
#2861 merged Nov 28, 2024
Reapply "[BUILD] Some CMake cleanup/modernisation (#5271)"
#2864 merged Nov 28, 2024
Add small 2D load block size option for B.T matrix.
#2863 merged Nov 28, 2024
Change the order of bitcast op during the DPAS lowering to improve the instruction scheduling.
#2860 merged Nov 28, 2024
Improve performance of shape 1024x1024x1024 out of box
#2839 merged Nov 28, 2024
[github-bot] Update spirv-llvm-translator.conf
#2857 merged Nov 28, 2024
Merge OpenAI Triton commit 6d3ed0b
#2858 merged Nov 28, 2024
Revert some changes for Windows build since they seem unnecessary
#2847 merged Nov 27, 2024
Merge OpenAI Triton commit 9e508a4
#2848 merged Nov 27, 2024
Replace no-basekit.yml with pip-test.yml
#2851 merged Nov 27, 2024
Leave only one ruff-pre-commit hook in .pre-commit-config.yaml
#2852 merged Nov 27, 2024
Enable 08-grouped-gemm on A770
#2849 merged Nov 27, 2024
Replace "use_system_python" with "use_pyenv_python"
#2842 merged Nov 27, 2024
[XPU][TritonIntelGPUToLLVM] Add support for more transpose kinds
#2786 merged Nov 27, 2024
Make Windows build C++17 compliant
#2833 merged Nov 27, 2024
[XPU][OptRed] Revamp -tritonintelgpu-optimize-reduction-locality
#2800 merged Nov 27, 2024
Merge OpenAI Triton commit 3f1d70f
#2838 merged Nov 26, 2024
Fix test_chained_reductions
#2821 merged Nov 26, 2024
Generalize test_libdevice.py; don't import libdevice from extra.intel
#2832 merged Nov 26, 2024
Use [[maybe_unused]] to get rid of warnings while building SPIRVRunner
#2829 merged Nov 26, 2024
Fix test_scan_layouts[True-1-src_layout10-64-32] on LTS driver; ignoring loadBinary error with large registers
#2808 merged Nov 26, 2024
Return elapsed_time patch
#2828 merged Nov 26, 2024
[XPU][TritonIntelGPUToLLVM] Add support for more shuffle kinds
#2799 merged Nov 26, 2024
[NIT] Revert Allocation analysis changes
#2826 merged Nov 26, 2024
[XPU][TritonGPUToLLVM] Use llvm.func attributes to express kernels ND-ranges
#2770 merged Nov 26, 2024
Sync from upstream
#2820 merged Nov 26, 2024
Merge OpenAI Triton commit 22e212b
#2819 merged Nov 25, 2024
Update PyTorch commit id
#2818 merged Nov 25, 2024
Use find_package(Threads) instead of hardcoding
#2813 merged Nov 25, 2024
[XPU][TritonGEN] Replace split barrier ops usages with SPIR-V ops
#2814 merged Nov 25, 2024
Revert "bump triton verion to 3.1.0 (#2716)"
#2812 merged Nov 25, 2024
Workflow to test Triton with pip dependencies
#2806 merged Nov 24, 2024
Merge OpenAI Triton commit e3ab295
#2810 merged Nov 24, 2024
[github-bot] Update spirv-llvm-translator.conf
#2809 merged Nov 23, 2024
Merge OpenAI Triton commit 16ce143
#2807 merged Nov 23, 2024

10 Pull requests opened by 7 people

Fix headers path for conda run
#2816 opened Nov 25, 2024
Use order from A matrix when determining DPAS layout
#2834 opened Nov 26, 2024
[XPU][TritonGPUToLLVM] Use `reqd_work_group_size`
#2845 opened Nov 27, 2024
[XPU] Enable reduction optimization by default
#2846 opened Nov 27, 2024
[UT] Fix test_print UTs
#2867 opened Nov 28, 2024
Windows debug
#2875 opened Nov 29, 2024
[XPU] Drop `-tritonintelgpu-optimize-elementwise-locality` pass
#2877 opened Nov 29, 2024
Add `setuptools` to runtime deps
#2881 opened Nov 29, 2024
[CI][GEMM][FA] Remove default path with env vars configuration
#2882 opened Nov 29, 2024
[github-bot] Update spirv-llvm-translator.conf
#2884 opened Nov 29, 2024

26 Issues closed by 8 people

[CI] Specify `TAG` when triggering benchmarks runs on PRs
#2862 closed Nov 29, 2024
Merge OpenAI Triton till Nov 29th
#2682 closed Nov 29, 2024
Implement support for the `tt.dot_scaled` operation on XPU
#2633 closed Nov 29, 2024
Generalize Intel coalescing pass to handle users of scf.for with coalesced load
#2762 closed Nov 29, 2024
[GEMM] 2048x2048x2048 out of box performance degradation
#2733 closed Nov 29, 2024
[GEMM] 16384x8192x4096 out of box performance degradation
#2734 closed Nov 29, 2024
Consider remove `packaging` from dependencies
#2865 closed Nov 28, 2024
Reland upstream commit `2003685`
#2859 closed Nov 28, 2024
[GEMM] Improve performance of shape 1024x1024x1024 out of box
#2822 closed Nov 28, 2024
Try to enable `bandit` pre-commit check in Triton
#2844 closed Nov 27, 2024
Leave only one `ruff-pre-commit` hook in `.pre-commit-config.yaml`
#2854 closed Nov 27, 2024
Tutorial 08-grouped-gemm hangs on A770
#2737 closed Nov 27, 2024
Allow sub-group transpose and shuffles with more than one contiguous row per thread
#2749 closed Nov 27, 2024
Make Windows build C++17 compliant
#2837 closed Nov 27, 2024
Try to port `test_inductor_cummax_bool` to Triton
#2841 closed Nov 27, 2024
Revamp `-tritonintelgpu-optimize-reduction-locality`
#2752 closed Nov 27, 2024
Try to port `test_bessel` to Triton
#2835 closed Nov 27, 2024
Generalize code in `test_gpuhello.py`
#2840 closed Nov 27, 2024
AttributeError: module 'triton' has no attribute 'jit'
#2740 closed Nov 27, 2024
[UT] Fix `test_chained_reductions[in_shape0-perm0-red_dims0]`
#2703 closed Nov 26, 2024
Fix `test_scan_layouts[True-1-src_layout10-64-32]` on LTS driver
#2789 closed Nov 26, 2024
Upstream changes for allocation
#2825 closed Nov 26, 2024
[GEMM] Improve performance of shape 8192x16384x4096 out of box
#2796 closed Nov 26, 2024
Investigate the new tensor descriptor API
#2586 closed Nov 25, 2024
Replace SplitBarrierSignalOp TritonGEN operations with SPIR-V dialect ones
#2815 closed Nov 25, 2024
Workflow to test Triton with runtime dependencies installed with pip
#2795 closed Nov 24, 2024

13 Issues opened by 7 people

Merge OpenAI Triton till Dec 13th
#2879 opened Nov 29, 2024
[UT] Fix `test_gather`
#2878 opened Nov 29, 2024
[GEMM] 2048X2048X2048 has big variance.
#2873 opened Nov 29, 2024
Attach `opencl.kernels` metadata
#2869 opened Nov 28, 2024
[Pytorch Upstream] Triton build failed with docker image pytorch/manylinux2_28-builder:cpu
#2868 opened Nov 28, 2024
`'/debug:fastlink'` and `'/INCREMENTAL'` aren't recognized in Windows build
#2855 opened Nov 27, 2024
Drop `-tritonintelgpu-optimize-elementwise-locality` pass
#2850 opened Nov 27, 2024
SPIRVRunner: Investigate and handle multi-kernel execution (benchmark- gemm_streamk_benchmark.py)
#2831 opened Nov 26, 2024
Failed core tests on A50
#2830 opened Nov 26, 2024
Reduce changes in common files for windows support
#2824 opened Nov 26, 2024
Remove `getElemsPerThreadForOperands` from `MmaEncodingTrait`
#2823 opened Nov 26, 2024
Flaky Triton build
#2817 opened Nov 25, 2024
Reland upstream commit `340cbc6`
#2811 opened Nov 23, 2024

10 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[DOCUMENTS] Update the DPAS encoding documents.
#2746 commented on Nov 28, 2024 • 1 new comment
Insert freeze between masked loads and sdiv/srem instructions
#2775 commented on Nov 28, 2024 • 1 new comment
[UT] regression in test_subprocess.py with the PTDB 0.5.3
#800 commented on Nov 26, 2024 • 0 new comments
Classify difference between Intel port and OpenAI Triton
#2030 commented on Nov 26, 2024 • 0 new comments
Enable all tests from `test_debug.py` on XPU
#2755 commented on Nov 28, 2024 • 0 new comments
Cleanup unnecessary env variables
#1972 commented on Nov 28, 2024 • 0 new comments
Implement support for `TritonGPU::UpcastMXFPOp` for Intel XPU BE
#2678 commented on Nov 29, 2024 • 0 new comments
[GEMM] Improve shapes with performance <95% of XeTLA
#2024 commented on Nov 29, 2024 • 0 new comments
POC: Enable Proton for XPU
#2635 commented on Nov 28, 2024 • 0 new comments
Support for `tritongpu.upcast_mxfp` operation
#2700 commented on Nov 26, 2024 • 0 new comments