Deploy the flash_attention operator CI on H100
Summary:
We are deploying a few continuous benchmarking workloads on ServiceLab H100 hosts.
The flash_attention operator is the first such workload.

Reviewed By: jialiangqu

Differential Revision: D58607891

fbshipit-source-id: db9d48a62a3ea3847de44ad869c12950371dab77
xuzhao9 authored and facebook-github-bot committed Jun 15, 2024
1 parent 339ccfd commit 5831be0
Showing 1 changed file with 2 additions and 2 deletions.
torchbenchmark/operators/flash_attention/operator.py (2 additions, 2 deletions)

@@ -108,7 +108,7 @@ def __init__(self, mode: str, device: str, extra_args: Optional[List[str]]=None)
         self.sm_scale = 1.3
         self.xformers_splitk = args.xformers_splitk
 
-    @register_benchmark(baseline=True)
+    @register_benchmark()
     def aten(
         self,
         q: torch.Tensor,
@@ -127,7 +127,7 @@ def _inner():
 
         return _inner
 
-    @register_benchmark()
+    @register_benchmark(baseline=True)
     def sdpa(
         self,
         q: torch.Tensor,
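The net effect of the diff is to move the baseline designation from the aten implementation to the sdpa implementation, so the other flash_attention backends registered on this operator are reported relative to sdpa when the benchmark runs. Below is a minimal, self-contained sketch of how a register_benchmark-style decorator with a baseline flag can work; it is a hypothetical illustration, not TorchBench's actual implementation, and the sleep calls are stand-ins for real attention kernels.

# Hypothetical sketch of a register_benchmark-style decorator with a baseline
# flag. NOT TorchBench's real code; it only illustrates how marking one
# implementation as the baseline drives relative reporting.
import time
from typing import Callable, Dict, Tuple

_REGISTRY: Dict[str, Tuple[Callable[[], None], bool]] = {}

def register_benchmark(baseline: bool = False):
    def wrap(fn: Callable[[], None]):
        _REGISTRY[fn.__name__] = (fn, baseline)
        return fn
    return wrap

@register_benchmark()  # after this commit, aten is just another implementation
def aten() -> None:
    time.sleep(0.002)  # stand-in for the eager aten attention path

@register_benchmark(baseline=True)  # sdpa is now the baseline
def sdpa() -> None:
    time.sleep(0.001)  # stand-in for scaled_dot_product_attention

def run() -> None:
    baseline_name = next(name for name, (_, is_base) in _REGISTRY.items() if is_base)
    timings = {}
    for name, (fn, _) in _REGISTRY.items():
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    for name, elapsed in timings.items():
        speedup = timings[baseline_name] / elapsed
        print(f"{name}: {elapsed * 1e3:.2f} ms ({speedup:.2f}x vs {baseline_name})")

if __name__ == "__main__":
    run()

Keeping exactly one implementation flagged as the baseline lets the harness report every other backend as a speedup relative to it, which is why the flag moves from aten to sdpa rather than being set on both.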
