Deploy the flash_attention operator CI on H100
Summary:
We are deploying a few continuous benchmarking workloads on ServiceLab H100 hosts.
The flash_attention operator is the first such workload.

Reviewed By: jialiangqu

Differential Revision: D58607891

fbshipit-source-id: db9d48a62a3ea3847de44ad869c12950371dab77
xuzhao9 authored and facebook-github-bot committed Jun 15, 2024
1 parent 339ccfd commit 5831be0
Showing 1 changed file with 2 additions and 2 deletions.
torchbenchmark/operators/flash_attention/operator.py (2 additions, 2 deletions)

@@ -108,7 +108,7 @@ def __init__(self, mode: str, device: str, extra_args: Optional[List[str]]=None)
         self.sm_scale = 1.3
         self.xformers_splitk = args.xformers_splitk
 
-    @register_benchmark(baseline=True)
+    @register_benchmark()
     def aten(
         self,
         q: torch.Tensor,
@@ -127,7 +127,7 @@ def _inner():
 
         return _inner
 
-    @register_benchmark()
+    @register_benchmark(baseline=True)
     def sdpa(
         self,
         q: torch.Tensor,
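The net effect of the diff is to move the baseline designation from the aten implementation to the sdpa implementation, so the other flash_attention backends registered on this operator are reported relative to sdpa when the benchmark runs. Below is a minimal, self-contained sketch of how a register_benchmark-style decorator with a baseline flag can work; it is a hypothetical illustration, not TorchBench's actual implementation, and the sleep calls are stand-ins for real attention kernels.

# Hypothetical sketch of a register_benchmark-style decorator with a baseline
# flag. NOT TorchBench's real code; it only illustrates how marking one
# implementation as the baseline drives relative reporting.
import time
from typing import Callable, Dict, Tuple

_REGISTRY: Dict[str, Tuple[Callable[[], None], bool]] = {}

def register_benchmark(baseline: bool = False):
    def wrap(fn: Callable[[], None]):
        _REGISTRY[fn.__name__] = (fn, baseline)
        return fn
    return wrap

@register_benchmark()  # after this commit, aten is just another implementation
def aten() -> None:
    time.sleep(0.002)  # stand-in for the eager aten attention path

@register_benchmark(baseline=True)  # sdpa is now the baseline
def sdpa() -> None:
    time.sleep(0.001)  # stand-in for scaled_dot_product_attention

def run() -> None:
    baseline_name = next(name for name, (_, is_base) in _REGISTRY.items() if is_base)
    timings = {}
    for name, (fn, _) in _REGISTRY.items():
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    for name, elapsed in timings.items():
        speedup = timings[baseline_name] / elapsed
        print(f"{name}: {elapsed * 1e3:.2f} ms ({speedup:.2f}x vs {baseline_name})")

if __name__ == "__main__":
    run()

Keeping exactly one implementation flagged as the baseline lets the harness report every other backend as a speedup relative to it, which is why the flag moves from aten to sdpa rather than being set on both.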
