
[userbenchmark] Add newly_run and no_longer_run metrics to output yaml #1509

Closed · wanted to merge 2 commits

Conversation

@janeyx99 (Contributor) commented on Mar 29, 2023:

This PR changes the contract for what needs to be implemented. Previously, users had to handle the case where the two metrics JSONs did not have the same set of keys. Now we record the mismatches under no_longer_run_in_treatment and newly_run_in_treatment and guarantee that the key sets match by the time the JSONs reach the user-defined run function.

The output YAML would look like:

```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  ...
no_longer_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize: 0.010405212640762329
  BERT_pytorch, Adadelta, cuda, foreach, maximize: 0.009411881134534875
  BERT_pytorch, Adagrad, cuda, (pt2) foreach, maximize: 0.03404413016202549
  ...
newly_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable: 0.0033336214274944116
  BERT_pytorch, Adadelta, cuda, differentiable: 0.017110475042136385
  BERT_pytorch, Adagrad, cuda, (pt2) differentiable: 0.003775304475500477
  BERT_pytorch, Adagrad, cuda, differentiable: 0.007527894619852304
  BERT_pytorch, Adam, cuda, (pt2) amsgrad, maximize: 0.00928849776127291
  ...
```
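The bookkeeping described here amounts to a key-set split between the two metrics dicts. A minimal sketch of that split (the metric names and values below are illustrative, not taken from the PR's actual run, and the variable names are my own):

```python
# Hypothetical control/treatment metrics; keys are benchmark configs.
control_metrics = {
    "BERT_pytorch, Adadelta, cuda, default": 0.00874,
    "BERT_pytorch, Adadelta, cuda, foreach, maximize": 0.00941,
}
treatment_metrics = {
    "BERT_pytorch, Adadelta, cuda, default": 0.00875,
    "BERT_pytorch, Adadelta, cuda, differentiable": 0.01711,
}

# Keys present only in control: no longer run in treatment.
no_longer_run = {k: v for k, v in control_metrics.items()
                 if k not in treatment_metrics}
# Keys present only in treatment: newly run.
newly_run = {k: v for k, v in treatment_metrics.items()
             if k not in control_metrics}
# What the user-defined run function sees: guaranteed-matching key sets.
shared_control = {k: v for k, v in control_metrics.items()
                  if k in treatment_metrics}
shared_treatment = {k: v for k, v in treatment_metrics.items()
                    if k in control_metrics}
```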

A potential downside is that users may want to handle the mismatches themselves, and this change removes that option. An alternative would be to fill in NaN for the missing values and let users process the results from the YAML later. That would establish a different contract: NaN is used whenever the actual measurement is missing. I'm not sure which is better. The YAML would then look like:

```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize:
    control: 0.010405212640762329
    treatment: NaN
    delta: NaN
  ...
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable:
    control: NaN
    treatment: 0.0033336214274944116
    delta: NaN
  ...
```
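The NaN-based alternative could be sketched like this, assuming Python's `math.nan` as the missing-value marker (hypothetical helper code, not the PR's implementation):

```python
import math

# Illustrative metrics; a key present on only one side gets NaN on the other.
control = {"only_in_control": 0.0104, "shared": 0.0095}
treatment = {"only_in_treatment": 0.0033, "shared": 0.0096}

details = {}
for key in sorted(control.keys() | treatment.keys()):
    c = control.get(key, math.nan)
    t = treatment.get(key, math.nan)
    # delta is NaN whenever either measurement is missing, since NaN propagates
    details[key] = {"control": c, "treatment": t, "delta": t - c}
```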

```python
if not args.output:
    args.output = get_default_output_path(bm_name)
# dump result to yaml file
result_dict = asdict(result)
result_dict["no_longer_run_in_treatment"] = no_longer_run_metrics
```
janeyx99 (Contributor, author) commented:

happy to call this something else

@facebook-github-bot

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

regression_detector.py (review thread resolved)

```python
# Process control and treatment to include only shared keys
filtered_control_metrics = {}
no_longer_run_metrics = {}
```
A contributor commented:

It might be better to call it control_only_metrics and treatment_only_metrics, respectively.
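The shared-key filtering this thread discusses could be sketched as follows. The variable names follow the snippet above; the loop body and sample data are my reading of the PR, not its verbatim code:

```python
# Hypothetical inputs: one key shared, one present only in control.
control_metrics = {"shared_key": 1.0, "control_only_key": 2.0}
treatment_metrics = {"shared_key": 1.1}

filtered_control_metrics = {}
no_longer_run_metrics = {}
for key, value in control_metrics.items():
    if key in treatment_metrics:
        # Key exists on both sides: keep it for the user-defined run function.
        filtered_control_metrics[key] = value
    else:
        # Key exists only in control: record it as no longer run in treatment.
        no_longer_run_metrics[key] = value
```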

@janeyx99 marked this pull request as ready for review on March 29, 2023, 20:56.
@facebook-github-bot

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xuzhao9 (Contributor) left a comment:

LGTM!

@facebook-github-bot

@janeyx99 merged this pull request in 443c701.

@janeyx99 deleted the process-control-treatment-mismatch branch on April 6, 2023, 16:43.
gairgeio added a commit to gairgeio/benchmark that referenced this pull request Aug 2, 2024
Pull Request resolved: pytorch/benchmark#1509

Reviewed By: xuzhao9

Differential Revision: D44518353

Pulled By: janeyx99

fbshipit-source-id: d701cf886a7126f0776644cc3ba6d7150441cc66
bestappsandcodereviews7 added a commit to bestappsandcodereviews7/benchmark that referenced this pull request Aug 16, 2024