Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Large Tensor] Fixed col2im op #17622

Merged
merged 2 commits into from
Feb 24, 2020

Conversation

connorgoggins
Copy link
Contributor

@connorgoggins connorgoggins commented Feb 18, 2020

Description

The col2im op was previously breaking on large tensor (dimension >= 2^32) data. With the following input:

run_performance_test(nd.col2im, run_backward=True, inputs=[{'data': (1,2**30,4), 'output_size': (2,2,1), 'kernel': (1,1,1)}], warmup=1, runs=1)

the following error was thrown:

Segmentation fault: 11

To root cause this issue, I ran the previous command in a Python script with GDB, and found that the underlying problem was in the dtype of the image index variable (index_im) within the col2im op's kernel in im2col.h. This image index variable used the int dtype when it should have been using index_t to properly handle long int indices. I switched this variable to index_t in the kernel and, after rebuilding, the previous input command displayed the correct output:

INFO:root:Begin Benchmark - col2im
INFO:root:Complete Benchmark - col2im
[{'col2im': [{'inputs': {'data': (1, 1073741824, 4), 'output_size': (2, 2, 1), 'kernel': (1, 1, 1)}, 'max_storage_mem_alloc_cpu/0': 33285996.0, 'avg_time_forward_col2im': 1290522.25, 'avg_time_backward_col2im': 1291033.5}]}]

Note: I also confirmed that, with my revisions, the op works with a large tensor (>= 2^32) output size, as the following command passes without errors.

run_performance_test(nd.col2im, run_backward=True, inputs=[{'data': (1,2**32,1), 'output_size': (1,2**32), 'kernel': (1,2**32)}], warmup=1, runs=1)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • M src/operator/nn/im2col.h
  • M tests/nightly/test_large_array.py

Comments

Tested on r5dn.24xl-ubuntu 16.04 and p2.16xl-ubuntu 16.04 with

  1. Individual op run
  2. Full OpPerf run

Results

The key difference between CPU and GPU tests was the instance type (r5dn.24xl for CPU, p2.16xl for GPU). All relevant build flags remain the same, and both were tested using CPU context.

Single operator test - col2im op (GPU)
Single operator test - col2im op (CPU)

Full OpPerf test (GPU)
Full OpPerf test (CPU)

@apeforest @access2rohit

@connorgoggins
Copy link
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Feb 18, 2020
Copy link
Contributor

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@apeforest
Copy link
Contributor

@connorgoggins Could you add a test to our nightly if it is not already there? Thanks!

Copy link
Contributor

@ChaiBapchya ChaiBapchya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but do add the test to tests/nightly/test_large_array.py

@apeforest apeforest merged commit f9b2a63 into apache:master Feb 24, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
* Changed dtype for index_im

* Added nightly test for col2im
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants