Don't emit atomics for thread-safe assignment patterns #401

Infinoid · 2021-02-10T15:27:15Z

This attempts to fix #316.

It prevents a #pragma omp atomic from being emitted for the assignment to y_vals[i] in this case:

$ bin/taco 'y(i) = A(i, j) * x(j)' -f=A:ds -s='parallelize(i, CPUThread, Atomics)'
// Generated by the Tensor Algebra Compiler (tensor-compiler.org)

int compute(taco_tensor_t *y, taco_tensor_t *A, taco_tensor_t *x) {
  int y1_dimension = (int)(y->dimensions[0]);
  double* restrict y_vals = (double*)(y->vals);
  int A1_dimension = (int)(A->dimensions[0]);
  int* restrict A2_pos = (int*)(A->indices[1][0]);
  int* restrict A2_crd = (int*)(A->indices[1][1]);
  double* restrict A_vals = (double*)(A->vals);
  int x1_dimension = (int)(x->dimensions[0]);
  double* restrict x_vals = (double*)(x->vals);

  #pragma omp parallel for schedule(runtime)
  for (int32_t i = 0; i < A1_dimension; i++) {
    double tjy_val = 0.0;
    for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
      int32_t j = A2_crd[jA];
      tjy_val += A_vals[jA] * x_vals[j];
    }
    y_vals[i] = tjy_val;
  }
  return 0;
}

Because each thread gets its own value of i, no other thread will assign to that location, and recent compilers treat an atomic pragma as invalid there. See #316 for details.
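To make the distinction concrete, here is a minimal sketch in plain C with dense row-major arrays. It is not taco output: the function names `row_product` and `column_scatter`, and the scatter kernel itself, are illustrative assumptions rather than anything from #316. The first loop has exactly one writer per output element, so a plain store is enough. The second has many iterations updating the same element, so the update needs protection.

```c
/* Illustrative sketch only; names and dense layout are assumptions. */

/* y[i] = sum_j A[i][j] * x[j]. Iteration i is the only writer of y[i],
 * so the final store is a plain assignment and needs no atomic. */
void row_product(int n, const double *A, const double *x, double *y) {
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    double acc = 0.0;  /* private to the iteration, like tjy_val above */
    for (int j = 0; j < n; j++) {
      acc += A[i * n + j] * x[j];
    }
    y[i] = acc;
  }
}

/* y[j] += sum_i A[i][j] * x[i]. Every iteration i updates every y[j],
 * so concurrent read-modify-write updates can collide; the atomic
 * applies to a compound update, which is the form OpenMP expects.
 * The caller is expected to zero y first. */
void column_scatter(int n, const double *A, const double *x, double *y) {
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      #pragma omp atomic
      y[j] += A[i * n + j] * x[i];
    }
  }
}
```

Built without OpenMP support the pragmas are ignored and both functions run serially with the same results, which makes it easy to compare against a parallel build.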

Are there any other cases this doesn't cover? All tests pass for me when configuring with -DOPENMP=ON -DPYTHON=ON.

Fixes tensor-compiler#316

stephenchouca · 2021-02-11T21:28:07Z

This patch seems to generate incorrect code for expressions like 'y(j) = A(i,j,k) * x(k)' -s='parallelize(j, CPUThread, Atomics)', which still require atomics even after scalar promotion (the generated code omits the required atomic pragma).

More fundamentally though, I wonder if we should instead just fix the lowerer so that it won't emit unnecessary atomic pragmas for (non-reduction) assignments. That way, the bug would be fixed even if, for instance, a user decides to manually invoke the lowerer with concrete index notation that has an assignment inside a Forall statement synchronized with atomics. This should only require modifying a few lines in lowerAssignment; I can take a stab at it if you'd like.
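The rule proposed here could be sketched roughly as follows. This is a hypothetical illustration in C, not taco's actual lowerAssignment code: the `assign_kind` and `race_strategy` enums and the `needs_atomic` and `emit_store` helpers are invented for this example. The idea is that the atomic pragma is attached only when the statement is a reduction (compound update) and the enclosing loop was parallelized with the Atomics strategy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the lowerer's notion of a statement. */
typedef enum { ASSIGN_PLAIN, ASSIGN_REDUCE } assign_kind;
typedef enum { RACE_NONE, RACE_ATOMICS } race_strategy;

/* A plain (non-reduction) assignment never gets an atomic pragma. */
static bool needs_atomic(assign_kind kind, race_strategy strategy) {
  return strategy == RACE_ATOMICS && kind == ASSIGN_REDUCE;
}

/* Writes the generated statement text into buf, with the pragma only
 * where needs_atomic says so. Returns the snprintf result. */
static int emit_store(char *buf, size_t len, const char *lhs,
                      const char *rhs, assign_kind kind,
                      race_strategy strategy) {
  const char *pragma =
      needs_atomic(kind, strategy) ? "#pragma omp atomic\n" : "";
  const char *op = (kind == ASSIGN_REDUCE) ? "+=" : "=";
  return snprintf(buf, len, "%s%s %s %s;\n", pragma, lhs, op, rhs);
}
```

Under this sketch, the y_vals[i] = tjy_val store from the original report comes out without a pragma even when Atomics is requested, while a compound update to the same kind of location keeps it.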

Infinoid · 2021-02-11T22:33:07Z

> This should only require modifying a few lines in lowerAssignment; I can take a stab at it if you'd like.

Sure, please do. And I'll happily add your example as a test case.

stephenchouca · 2021-02-11T23:08:04Z

> Sure, please do. And I'll happily add your example as a test case.

That'd be great, thanks!

Infinoid · 2021-02-12T13:42:16Z

Your patch works, I added the test case in #402. Thanks!

Canceling PR in favor of 0a48d17.

Don't emit atomics for thread-safe assignment patterns

5e5ef41

Fixes tensor-compiler#316

Infinoid mentioned this pull request Feb 12, 2021

Add a SpTV+openmp+atomics test case for #316 #402

Merged

Infinoid closed this Feb 12, 2021

Infinoid deleted the fix-316 branch February 12, 2021 13:42
