
[userbenchmark] Add newly_run and no_longer_run metrics to output yaml #1509

Closed · wanted to merge 2 commits

Conversation

@janeyx99 (Contributor) commented on Mar 29, 2023:

This PR changes the contract for what needs to be implemented. Previously, users had to handle the case where the two metrics JSONs did not have the same set of keys. Now we record the mismatches under no_longer_run_in_treatment and newly_run_in_treatment and guarantee that the key sets match by the time the JSONs reach the user-defined run function.

The output YAML would look like:

```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  ...
no_longer_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize: 0.010405212640762329
  BERT_pytorch, Adadelta, cuda, foreach, maximize: 0.009411881134534875
  BERT_pytorch, Adagrad, cuda, (pt2) foreach, maximize: 0.03404413016202549
  ...
newly_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable: 0.0033336214274944116
  BERT_pytorch, Adadelta, cuda, differentiable: 0.017110475042136385
  BERT_pytorch, Adagrad, cuda, (pt2) differentiable: 0.003775304475500477
  BERT_pytorch, Adagrad, cuda, differentiable: 0.007527894619852304
  BERT_pytorch, Adam, cuda, (pt2) amsgrad, maximize: 0.00928849776127291
  ...
```
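The bookkeeping described here amounts to a key-set split between the two metrics dicts. A minimal sketch of that split (the metric names and values below are illustrative, not taken from the PR's actual run, and the variable names are my own):

```python
# Hypothetical control/treatment metrics; keys are benchmark configs.
control_metrics = {
    "BERT_pytorch, Adadelta, cuda, default": 0.00874,
    "BERT_pytorch, Adadelta, cuda, foreach, maximize": 0.00941,
}
treatment_metrics = {
    "BERT_pytorch, Adadelta, cuda, default": 0.00875,
    "BERT_pytorch, Adadelta, cuda, differentiable": 0.01711,
}

# Keys present only in control: no longer run in treatment.
no_longer_run = {k: v for k, v in control_metrics.items()
                 if k not in treatment_metrics}
# Keys present only in treatment: newly run.
newly_run = {k: v for k, v in treatment_metrics.items()
             if k not in control_metrics}
# What the user-defined run function sees: guaranteed-matching key sets.
shared_control = {k: v for k, v in control_metrics.items()
                  if k in treatment_metrics}
shared_treatment = {k: v for k, v in treatment_metrics.items()
                    if k in control_metrics}
```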

A potential downside is that users may want to handle the mismatches themselves, and this change removes that option. An alternative would be to fill in NaN for the missing values and let users process the results from the YAML later. That would establish a different contract: NaN is used whenever the actual measurement is missing. I'm not sure which is better. The YAML would then look like:

```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize:
    control: 0.010405212640762329
    treatment: NaN
    delta: NaN
  ...
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable:
    control: NaN
    treatment: 0.0033336214274944116
    delta: NaN
  ...
```
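The NaN-based alternative could be sketched like this, assuming Python's `math.nan` as the missing-value marker (hypothetical helper code, not the PR's implementation):

```python
import math

# Illustrative metrics; a key present on only one side gets NaN on the other.
control = {"only_in_control": 0.0104, "shared": 0.0095}
treatment = {"only_in_treatment": 0.0033, "shared": 0.0096}

details = {}
for key in sorted(control.keys() | treatment.keys()):
    c = control.get(key, math.nan)
    t = treatment.get(key, math.nan)
    # delta is NaN whenever either measurement is missing, since NaN propagates
    details[key] = {"control": c, "treatment": t, "delta": t - c}
```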

```python
if not args.output:
    args.output = get_default_output_path(bm_name)
# dump result to yaml file
result_dict = asdict(result)
result_dict["no_longer_run_in_treatment"] = no_longer_run_metrics
```
janeyx99 (Contributor, author) commented:

happy to call this something else

@facebook-github-bot

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

regression_detector.py (review thread resolved)

```python
# Process control and treatment to include only shared keys
filtered_control_metrics = {}
no_longer_run_metrics = {}
```
A contributor commented:

It might be better to call it control_only_metrics and treatment_only_metrics, respectively.
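The shared-key filtering this thread discusses could be sketched as follows. The variable names follow the snippet above; the loop body and sample data are my reading of the PR, not its verbatim code:

```python
# Hypothetical inputs: one key shared, one present only in control.
control_metrics = {"shared_key": 1.0, "control_only_key": 2.0}
treatment_metrics = {"shared_key": 1.1}

filtered_control_metrics = {}
no_longer_run_metrics = {}
for key, value in control_metrics.items():
    if key in treatment_metrics:
        # Key exists on both sides: keep it for the user-defined run function.
        filtered_control_metrics[key] = value
    else:
        # Key exists only in control: record it as no longer run in treatment.
        no_longer_run_metrics[key] = value
```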

@janeyx99 marked this pull request as ready for review on March 29, 2023, 20:56.
@facebook-github-bot

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xuzhao9 (Contributor) left a comment:

LGTM!

@facebook-github-bot

@janeyx99 merged this pull request in 443c701.

@janeyx99 deleted the process-control-treatment-mismatch branch on April 6, 2023, 16:43.
gairgeio added a commit to gairgeio/benchmark that referenced this pull request Aug 2, 2024
Pull Request resolved: pytorch/benchmark#1509

Reviewed By: xuzhao9

Differential Revision: D44518353

Pulled By: janeyx99

fbshipit-source-id: d701cf886a7126f0776644cc3ba6d7150441cc66
bestappsandcodereviews7 added a commit to bestappsandcodereviews7/benchmark that referenced this pull request Aug 16, 2024