Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase riscv implementation in DepthwiseConvKernel #127867

Closed
wants to merge 1 commit into from

Conversation

zhangfeiv0
Copy link
Contributor

@zhangfeiv0 zhangfeiv0 commented Jun 4, 2024

Summary:

Increase riscv implementation in DepthwiseConvKernel.

Compile:

export USE_CUDA=0
export USE_DISTRIBUTED=0
export USE_MKLDNN=0
export MAX_JOBS=4
export CMAKE_CXX_COMPILER=clang++
export CMAKE_C_COMPILER=clang
export CMAKE_C_FLAGS=-march=rv64gcv
export CMAKE_CXX_FLAGS=-march=rv64gcv
python3 setup.py develop --cmake

Test Plan:

Correctness - Check the results of the run before and after test_convolution.py

python3 test/run_test.py --include nn/test_convolution --keep-going

Before:
===== 9 passed, 13 skipped, 564 deselected in 46.55s =====
The following tests failed consistently:
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types
test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size
test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d
test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_complex64
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_float32

After:
===== 9 passed, 13 skipped, 564 deselected in 48.13s =====
The following tests failed consistently:
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types
test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size
test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d
test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_complex64
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_float32

Performance - Compare the results before and after mobilenet_v2

python3 run.py mobilenet_v2 -d cpu -t eval

Before:
Running eval method from mobilenet_v2 on cpu in eager mode with input batch size 16 and precision fp32.
CPU Wall Time per batch: 19590.647 milliseconds
CPU Wall Time: 19590.647 milliseconds
Time to first batch: 5271.3518 ms
CPU Peak Memory: 0.3809 GB

After:
Running eval method from mobilenet_v2 on cpu in eager mode with input batch size 16 and precision fp32.
CPU Wall Time per batch: 13523.530 milliseconds
CPU Wall Time: 13523.530 milliseconds
Time to first batch: 2696.0304 ms
CPU Peak Memory: 0.3408 GB

Versions:
Clang version: 17.0.2
Platform: CanMV-K230
Architecture: riscv64
OS: Ubuntu 23.10

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

Copy link

linux-foundation-easycla bot commented Jun 4, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jun 4, 2024
Copy link

pytorch-bot bot commented Jun 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127867

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit 10f8b73 with merge base 7ef7c26 (image):

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 5, 2024
@wangfanv0
Copy link

This is our attempt at implementing RVV. Based on the code implementation, it does not negatively impact the original PyTorch build. Additionally, we observed significant performance improvements when performing inference on MobileNet_v2. With input data of [1, 3, 224, 224], we saw a 3x performance increase.
Has anyone followed up on this PR?
@mikaylagawarecki @ezyang

@ezyang
Copy link
Contributor

ezyang commented Jun 10, 2024

This would be the very first riscv kernel we'd add to PyTorch. I'm not sure if we want to commit to maintaining all of these kernels. @malfet for more comments.

@zhangfeiv0
Copy link
Contributor Author

@malfet do you have any suggestions?

@zhangfeiv0
Copy link
Contributor Author

@malfet @ezyang We are committed to continuously optimizing and maintaining the performance of PyTorch on the RISC-V architecture. We hope to have someone review our code.

@ezyang
Copy link
Contributor

ezyang commented Jun 27, 2024

cc @albanD

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not nested by CI, but sure, why not (considering that NEON code is there)
But one should probably start with vec sublibary if they are serious about RISC-V performance

@malfet
Copy link
Contributor

malfet commented Jul 1, 2024

@pytorchbot merge -f "Lint is green, the rest is not compiled"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request Jul 2, 2024
**Summary:**

Increase riscv implementation in DepthwiseConvKernel.

**Compile:**

export USE_CUDA=0
export USE_DISTRIBUTED=0
export USE_MKLDNN=0
export MAX_JOBS=4
export CMAKE_CXX_COMPILER=clang++
export CMAKE_C_COMPILER=clang
export CMAKE_C_FLAGS=-march=rv64gcv
export CMAKE_CXX_FLAGS=-march=rv64gcv
python3 setup.py develop --cmake

**Test Plan:**

**Correctness** - Check the results of the run before and after test_convolution.py

python3 test/run_test.py --include nn/test_convolution --keep-going

**Before:**
===== 9 passed, 13 skipped, 564 deselected in 46.55s =====
The following tests failed consistently:
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types
test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size
test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d
test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_complex64
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_float32

**After:**
===== 9 passed, 13 skipped, 564 deselected in 48.13s =====
The following tests failed consistently:
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice
test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types
test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size
test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d
test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d
test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_complex64
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCPU::test_conv_empty_channel_cpu_float32

**Performance** - Compare the results before and after mobilenet_v2

python3 run.py mobilenet_v2  -d cpu -t eval

**Before:**
Running eval method from mobilenet_v2 on cpu in eager mode with input batch size 16 and precision fp32.
CPU Wall Time per batch: 19590.647 milliseconds
CPU Wall Time:       19590.647 milliseconds
Time to first batch:         5271.3518 ms
CPU Peak Memory:                0.3809 GB

**After:**
Running eval method from mobilenet_v2 on cpu in eager mode with input batch size 16 and precision fp32.
CPU Wall Time per batch: 13523.530 milliseconds
CPU Wall Time:       13523.530 milliseconds
Time to first batch:         2696.0304 ms
CPU Peak Memory:                0.3408 GB

**Versions:**
Clang version: 17.0.2
Platform: CanMV-K230
Architecture: riscv64
OS: Ubuntu 23.10

Pull Request resolved: pytorch#127867
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

7 participants