
Fix Microbenchmark Profiling Memory Issues #597

Merged: 1 commit merged into main from mor--inference on Apr 19, 2024

Conversation

@morgandu (Collaborator) commented on Apr 16, 2024:

Below are the changes in this PR:

  • allow microbenchmark with configurable prefill lengths
  • allow microbenchmark with configurable loop iters
  • allow benchmarking the prefill / generate stage only
  • delete prefill_results, which limited the batch size (a hedged sketch of this cleanup follows the list)
  • added / moved pytree util funcs
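
A minimal sketch of the cleanup idea, assuming the prefill result is a pytree of on-device JAX arrays; the helper name delete_pytree and the variable prefill_result are hypothetical and only illustrate releasing the buffers so they no longer cap the generate-stage batch size:

import jax

def delete_pytree(p):
  """Hypothetical helper: release every on-device buffer in a pytree of jax.Arrays."""
  def delete_leaf(leaf):
    if isinstance(leaf, jax.Array):
      leaf.delete()  # frees the underlying device memory
  jax.tree_util.tree_map(delete_leaf, p)

# Once the prefill timing for a given length is recorded, the prefill output is
# no longer needed, so its HBM can be reclaimed before the generate benchmark:
# delete_pytree(prefill_result)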

@morgandu force-pushed the mor--inference branch 5 times, most recently from 8253dbb to 3f28ee3, on April 18, 2024 at 21:15.
@patemotter (Collaborator) left a comment:

Overall LGTM, but I don't think we need the inference_utils.py file.



 def main(config):
   engine = maxengine.MaxEngine(config)
   params = engine.load_params()
-  prefill_lengths = [64, 128, 256, 512, 1024]
-  benchmark_loop_iters = 10
+  prefill_lengths = [int(l) for l in config.inference_microbenchmark_prefill_lengths.split(",")]
A collaborator commented:

Does this work if you pass in a command line param like --inference-microbenchmark-prefill-lengths="512,1024" or something similar?

@morgandu (Author) replied:

Yeah, example commands:
to run a single prefill length:

  inference_microbenchmark_prefill_lengths=1024

to run a single stage:

  inference_microbenchmark_stages=generate
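
A combined invocation might look like the following. The first two keys appear in the examples above; inference_microbenchmark_loop_iters is an assumed name for the loop-iters option mentioned in the PR description:

  inference_microbenchmark_prefill_lengths=256,512,1024 inference_microbenchmark_stages=prefill,generate inference_microbenchmark_loop_iters=10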

A collaborator commented:

I don't think we need to make a file solely for inference_utils at this time, especially since these functions are not unique to inference. I would add them to max_utils.py since they cover fairly generic use cases.

@morgandu (Author) replied on Apr 18, 2024:

inference_utils.py is an existing file, but I can move these functions to max_utils.py

@morgandu changed the title from "Enable Inference mode and Fix Microbenchmark Profiling / Memory Issues" to "Fix Microbenchmark Profiling Memory Issues" on Apr 18, 2024.
@morgandu (Author), replying to "Overall LGTM, but I don't think we need the inference_utils.py file.":

I am writing batch inference code, which needs some common utility functions.
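
As a rough illustration of the kind of generic helper being discussed, here is a minimal sketch, assuming JAX array leaves; the name total_pytree_bytes is hypothetical, but a helper like this could back the cache_size / model_size values passed to the benchmark call below:

import jax

def total_pytree_bytes(p):
  """Sum the byte size of every array leaf in a pytree (e.g. params or a KV cache)."""
  return sum(leaf.size * leaf.dtype.itemsize for leaf in jax.tree_util.tree_leaves(p))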

Comment on lines +188 to +205
benchmark_results["AutoRegressive"], decode_state = ar_benchmark(
config, engine, params, decode_state, iters=benchmark_loop_iters, cache_size=cache_size, model_size=model_size)
A collaborator commented:

For running just the generate benchmark, do you still need to populate the KV cache to produce proper perf numbers?

@morgandu (Author) replied:

You will still need to initialize a decode_state for the generate-step calculation.
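
For illustration, a minimal sketch of that flow, assuming a MaxEngine-style interface like the one in the earlier def main(config) snippet; init_decode_state and the generate return signature are assumptions about the engine API, not taken from this PR:

engine = maxengine.MaxEngine(config)
params = engine.load_params()
# Even when only the generate stage is benchmarked, a decode_state (which holds
# the KV cache) must exist before generate steps can be timed.
decode_state = engine.init_decode_state()
for _ in range(benchmark_loop_iters):
  decode_state, _ = engine.generate(params, decode_state)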

@@ -261,3 +261,8 @@ vertex_tensorboard_project: ""
# Region to create Vertex AI Tensorboard in for GCE, blank if running via XPK
# Vertex AI supported regions: https://cloud.google.com/vertex-ai/docs/general/locations#available-regions
vertex_tensorboard_region: ""

A collaborator commented:

Not in this PR, but I see a need for separate inference-specific config files in the future -- both a base.yml and model-specific configs.
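
As a rough sketch of what such inference-specific entries in base.yml might look like (the prefill-lengths and stages keys appear in this thread; the loop-iters key name is an assumption, and the defaults mirror the previously hardcoded values):

inference_microbenchmark_prefill_lengths: "64,128,256,512,1024"
inference_microbenchmark_stages: "prefill,generate"
inference_microbenchmark_loop_iters: 10  # assumed key name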

@morgandu force-pushed the mor--inference branch 2 times, most recently from 210d7b2 to 38831be, on April 19, 2024 at 00:01.
@rwitten (Collaborator) left a comment:

looks good

@rwitten (Collaborator) left a comment:

Please remember to squash your commits.

The squashed commit:

  - allow running specified stages
  - allow running specific prefill length(s)
  - delete prefill result
  - print out prefill result

added funcs in max_utils
copybara-service bot merged commit 0e1c078 into main on Apr 19, 2024 (9 checks passed).
copybara-service bot deleted the mor--inference branch on April 19, 2024 at 17:48.