
Fix numerical instability in vector_norm when receiving large size tensor #123416

Closed
wants to merge 1 commit

Conversation


@maybeLee maybeLee commented Apr 5, 2024

I made this pull request because I encountered numerical instability in torch.linalg.vector_norm when using it to process a tensor with a large shape such as (1, 32, 224, 224, 160).

Here is the code to reproduce the issue:

import numpy as np
import torch
x1 = np.ones((1, 32, 224, 224, 160))
ord = 2
print(np.size(x1))  # 256901120
res1 = torch.linalg.vector_norm(torch.tensor(x1, dtype=torch.float32), ord=ord)
res2 = torch.linalg.vector_norm(torch.tensor(x1, dtype=torch.float64), ord=ord)

print(res1, res2)  # tensor(11585.2373) tensor(16028.1353, dtype=torch.float64)
print(f"Expected result: {np.sqrt(np.size(x1))}")  # 16028.135262718493

When checking the code, I found that the issue may lie here:

acc_vec += data_vec * data_vec;

Precision loss occurs during the accumulation into acc_vec: once acc_vec grows large while each data_vec * data_vec increment stays relatively small, the low-order bits of the increment fall below the accumulator's precision and are rounded away.
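For intuition, here is a minimal standalone C++ snippet (illustrative only, not PyTorch code) showing the absorption effect in float32: once an accumulator reaches 2^24 = 16777216, adding 1.0f no longer changes it. Notably, the observed result 11585.2373 equals sqrt(2^27), which is consistent with eight float accumulator lanes (e.g., one 256-bit AVX2 vector) each saturating at 2^24 while summing squares of ones.

#include <cstdio>

int main() {
  // In float32, 2^24 + 1 is not representable; the addition rounds
  // back down to 2^24, so the accumulator stops growing from here on.
  float acc = 16777216.0f;  // 2^24
  float bumped = acc + 1.0f;
  std::printf("%.1f\n", bumped);  // prints 16777216.0, not 16777217.0
  return 0;
}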

The fix I am applying is to use the Kahan summation algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
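For reference, a minimal scalar sketch of Kahan (compensated) summation applied to the sum of squares; the actual change targets the vectorized acc_vec path, and the function name here is illustrative only:

#include <cstddef>

// Scalar Kahan summation of squares: a compensation term carries the
// low-order bits that a plain float accumulator would discard.
// (Compile without -ffast-math, which may optimize the compensation away.)
float kahan_sum_of_squares(const float* data, std::size_t n) {
  float sum = 0.0f;  // running sum
  float c = 0.0f;    // compensation for lost low-order bits
  for (std::size_t i = 0; i < n; ++i) {
    float y = data[i] * data[i] - c;  // corrected increment
    float t = sum + y;                // low-order bits of y may be lost here
    c = (t - sum) - y;                // algebraically zero; captures the loss
    sum = t;                          // the next iteration corrects for it
  }
  return sum;
}

The extra subtractions per element are what account for the slowdown measured below.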

Please note that after this fix is applied, the average execution time rises from about 0.0882 s to 0.1504 s (roughly 1.7x) on the benchmark below.

Detailed code:

import numpy as np
import time
import torch

x1 = np.ones((1, 32, 224, 224, 160))
ord = 2
time_list = []
for i in range(10):
    s_time = time.time()
    res1 = torch.linalg.vector_norm(torch.tensor(x1, dtype=torch.float32), ord=ord)
    time_list.append(time.time() - s_time)
print(np.mean(time_list), np.std(time_list))

Please let me know if the fix can be improved.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Apr 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123416

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures

As of commit e55d019 with merge base 16cb5d4.

This comment was automatically generated by Dr. CI and updates every 15 minutes.


CLA Not Signed

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Apr 5, 2024
@ezyang ezyang requested a review from mingfeima April 10, 2024 11:36
@ezyang ezyang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 10, 2024
@mingfeima mingfeima requested a review from CaoE April 11, 2024 01:38
@CaoE
Collaborator

CaoE commented Apr 23, 2024

@maybeLee Thanks for your fix. If we perform a group reduction for such large sizes, accuracy should improve without introducing too much overhead.
For example (pseudocode):

acc_t acc_buffer[group_size];
for (int64_t g = 0; g < group_size; g++) {
  // Do the reduction for each group
  acc_buffer[g] = group_reduce(...);
}

// Do the final reduction across groups
double acc_value = reduce_finally(acc_buffer);
result = scalar_t(std::sqrt(acc_value));

@maybeLee
Author

Hi @CaoE, thanks for your reply. I think the original code already does such a group reduction?

for (; d < size - (size % Vec::size()); d += Vec::size()) {
  Vec data_vec = Vec::loadu(self_data + d);
  norm_two_reduce_step(acc_vec, data_vec);
}
acc_vec.store(buffer);
for (int j = 1; j < fVec::size(); j++) {
  buffer[0] = buffer[0] + buffer[j];
}

@CaoE
Copy link
Collaborator

CaoE commented Apr 23, 2024

@maybeLee This is also a group reduction, but it only divides the work by the vector size. Even with a vector size of 32, each group is still large: 256901120 / 32 = 8028160 elements. We can introduce a parameter, e.g., group_size = 32768, and further group the reduction externally by group_size (within each group, 32768 elements are reduced in the current way), as sketched below.
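As an illustration of this proposal, a hedged standalone C++ sketch (the function and parameter names are not PyTorch internals): fixed-size chunks are reduced separately, and the per-chunk partial sums are then combined at double precision, so each float partial stays small enough to avoid absorption.

#include <algorithm>
#include <cmath>
#include <cstddef>

// Externally grouped reduction: reduce fixed-size chunks separately,
// then combine the per-chunk partial sums at double precision.
float grouped_norm2(const float* data, std::size_t size,
                    std::size_t group_size = 32768) {
  double total = 0.0;
  for (std::size_t begin = 0; begin < size; begin += group_size) {
    std::size_t end = std::min(begin + group_size, size);
    float partial = 0.0f;  // stands in for the vectorized inner loop
    for (std::size_t i = begin; i < end; ++i) {
      partial += data[i] * data[i];
    }
    total += static_cast<double>(partial);
  }
  return static_cast<float>(std::sqrt(total));
}

With group_size = 32768, each partial sum over squared ones reaches only 32768, far below the 2^24 absorption threshold, while the outer loop adds just one extra accumulation per 32768 elements.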


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jun 22, 2024
@github-actions github-actions bot closed this Jul 22, 2024
Labels
module: cpu CPU specific problem (e.g., perf, algorithm) · open source · Stale · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module