
[MXNET-547] Tutorial explaining how to use the profiler #11274

Merged
merged 12 commits into from
Jun 19, 2018

Conversation

Contributor

indhub commented Jun 14, 2018

Description

Tutorial explaining how to use the profiler.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

- Add images from web-data
- Add <!--notebook-skip-line-->
- Fix image URLs
- Fix formatting of output
- Add download button.
- Hide profile_stats.png in notebook.
indhub requested a review from szha as a code owner June 14, 2018 06:03
Contributor Author

indhub commented Jun 14, 2018

@ThomasDelteil @thomelane @Ishitori @safrooze
Please take a look when you get time.


It is often helpful to understand how much time each operation takes when running a model. That information helps you optimize the model to run faster. In this tutorial, we will learn how to profile MXNet models to measure their running time and memory consumption using the MXNet profiler.

## The incorrect way to profile
Contributor

This is not incorrect. You can still use `wait_to_read` to time the dot operation.

Contributor Author

Agree. But I don't want to suggest `wait_to_read` as the recommended way to measure the time taken by operations. While it might work for toy problems like this,

  • it is harder to use for measuring the execution time of multiple operations (it requires `wait_to_read` both before and after the measured operation, in multiple places),
  • it is hard to use for measuring the running time of a block inside a sequence (which is common), and
  • it won't work for hybrid networks.

The goal of this tutorial is to point people to a recommended way of profiling that works for almost all cases.

However, I can add a note along the lines of: "While it is possible to use `wait_to_read()` before and after an operation to get the running time of that operation, it is not a scalable method for measuring the running time of multiple operations, especially in a Sequential or Hybrid network."
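For reference, a minimal sketch of that kind of `wait_to_read()`-based timing (the operation and array sizes here are only illustrative, not taken from the tutorial):

```python
import time
import mxnet as mx

x = mx.nd.random.uniform(shape=(2000, 2000))
x.wait_to_read()   # make sure earlier asynchronous work is done before timing

start = time.time()
y = mx.nd.dot(x, x)
y.wait_to_read()   # block until the dot operation has actually finished
print('dot took %.3f seconds' % (time.time() - start))
```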


Check [this](https://mxnet.incubator.apache.org/install/index.html?device=Linux&language=Python&processor=CPU) page for more information on building from source for various environments.

After building with `USE_PROFILER=True` and installing, you can import the profiler and configure it from Python code.
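For example, a minimal configuration sketch might look like the following (the filename and the exact set of flags are illustrative assumptions, not prescribed by the tutorial):

```python
from mxnet import profiler

# Profile all operators, keep aggregate statistics, and choose the output file
# that dump() will later write the trace to.
profiler.set_config(profile_all=True,
                    aggregate_stats=True,
                    filename='profile_output.json')

profiler.set_state('run')    # start collecting profiling data
# ... run the code to be profiled ...
profiler.set_state('stop')   # stop collecting
```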
Contributor

USE_PROFILER=True or USE_PROFILER=1 or it doesn't matter?

To use the profiler, you need to build MXNet with `USE_PROFILER` enabled. For example, this command will build the CPU version of MXNet on Linux,

```
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_PROFILER=1
```
Contributor

It would be useful to provide the GPU version as well, as I assume many people would want to use it.

Let's define a method that will run one training iteration given data and label.

```python
# Use GPU is available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
```
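For context, a rough sketch of what such a training-iteration method could look like; the network (`net`), loss (`loss_fn`), trainer, and the name `run_training_iteration` are placeholders I am assuming, not the tutorial's actual code:

```python
import mxnet as mx
from mxnet import autograd, gluon

ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

# Placeholder model, loss, and optimizer purely for illustration.
net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

def run_training_iteration(data, label):
    """Run one forward/backward pass and a parameter update."""
    with autograd.record():
        output = net(data)
        loss = loss_fn(output, label)
    loss.backward()
    trainer.step(data.shape[0])
```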
Contributor

Typo? "Use GPU if available"

Contributor

Also, I am not sure it is safe to include this line. Earlier you build the CPU version of MXNet, but here you may end up using gpu(), and it will fail.

You can also dump the information collected by the profiler into a `json` file using the `profiler.dump()` function and view it in a browser.

```python
profiler.dump()
```
Contributor

So, the difference between getting a plain text version vs. json is in calling "dumps()" vs "dump()"? Is it possible to change this signature?

Contributor Author

I guess the `s` in `dumps` indicates that the method returns a string, like `pickle.dumps()` or `json.dumps()`.
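In other words (a minimal sketch, assuming the profiler was configured with `aggregate_stats=True`, which `dumps()` needs):

```python
from mxnet import profiler

# dumps() returns the aggregated statistics as a string, so you print it;
# dump() writes the trace to the file configured earlier, for a trace viewer.
print(profiler.dumps())
profiler.dump()
```

The dumped `json` trace can then be opened in the Chrome trace viewer, as mentioned later in this thread.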

Link to installation page is sufficient.

```python
# Use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
```
Contributor

This API has a bug: it doesn't detect GPUs on Windows (because it relies on the nvidia-smi command, which may not exist on Windows). I have a GitHub issue for this. Pasting it here as an FYI.
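A possible workaround, sketched here under the assumption that a failed GPU allocation raises `MXNetError`, is to probe the device directly instead of relying on nvidia-smi:

```python
import mxnet as mx

# Try a tiny allocation on the GPU and fall back to the CPU if it fails.
try:
    mx.nd.zeros((1,), ctx=mx.gpu())
    ctx = mx.gpu()
except mx.base.MXNetError:
    ctx = mx.cpu()
```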

Contributor

@sandeep-krishnamurthy left a comment

Thanks, Indu, for your contributions. Overall, LGTM. I have a few follow-up questions before we merge the changes:

  1. Do you want to talk about environment variables? In particular, the MXNET_EXEC_BULK_EXEC_TRAIN environment variable was very useful to me for profiling individual operators; without it, the profiler output covers fused operators (see the sketch after this list).
  2. What is the plan for the existing docs on the profiler? Do you want to link this tutorial from https://mxnet.incubator.apache.org/faq/perf.html?
  3. Would it help to have the profiler output image for this example in the tutorial, to make it a fully self-contained tutorial from objective to end result?
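For item 1, a sketch of how that environment variable could be set; setting it before importing mxnet is my conservative assumption, and it can also be exported in the shell:

```python
import os

# Disable bulk execution during training so the profiler reports individual
# operators instead of fused groups.
os.environ['MXNET_EXEC_BULK_EXEC_TRAIN'] = '0'

import mxnet as mx
```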

Contributor Author
indhub commented Jun 18, 2018

@sandeep-krishnamurthy, thanks for the valuable input.

  1. I've added a section on the environment variables.
  2. I've linked to the perf FAQ.
  3. The tutorial already has an image to show what the profiler output looks like. If we wanted to generate, inside a Jupyter notebook, the visualization that Chrome produces, we would need a library that can do that, and I'm not aware of any. Note also that Chrome has ways to navigate the trace output (like zooming into a specific timeframe). Unless there is a way to do all of that from a Jupyter notebook, I would prefer to recommend the Chrome trace viewer as the way to view tracing information.

Contributor

@sandeep-krishnamurthy left a comment

LGTM! Thanks.
The build failed because of some other flaky test. Can you please restart it? We cannot merge without green builds.

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 00681c3 into apache:master Jun 19, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* Add first draft of profiler tutorial

* Minor changes

- Add images from web-data
- Add <!--notebook-skip-line-->

* Language corrections

* Minor changes

- Fix image URLs
- Fix formatting of output

* Minor changes

- Add download button.
- Hide profile_stats.png in notebook.

* Add tutorial to index.

* Add tutorial to tests.

* Add a note about nd.waitall()

* Remove the example build command.

Link to installation page is sufficient.

* Fix typo

* Include info about env variables related to profiling

* Add a further reading section
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018