
ktensor.full() improvements #299

Open

ghbrown opened this issue Jun 21, 2024 · 2 comments
ghbrown commented Jun 21, 2024

The current implementation of .full() for ktensors is extremely memory- and compute-intensive. It's easy to run out of memory with a tensor of modest size but high rank, since the current algorithm requires memory roughly proportional to the rank times the size of the full tensor.
I believe an einsum-based approach would be much more efficient.

Consider the timing results for the code below.

n = 1000:
  full tensor size: ~8 GB
  current .full() profiling: 107 s, 120 GB peak memory usage
  einsum based .full():      65 s,  8 GB peak memory usage

n = 100:  (as evidence that einsum outperforms for small tensors too)
  current .full() profiling: 0.1s
  einsum based .full():      0.06s
import pyttb
import numpy as np
import time

def einfull(w, fm_list):
    # contract the weight vector against all factor matrices in one shot;
    # subscripts are hardcoded for a 3-way tensor
    return np.einsum('r,ir,jr,kr->ijk', w, *fm_list)

n = 1000
r = 10
w = np.random.rand(r)
A = np.random.rand(n,r)
B = np.random.rand(n,r)
C = np.random.rand(n,r)
K = pyttb.ktensor()
K.weights = w
K.factor_matrices = [A,B,C]
t_0 = time.perf_counter()
T = K.full()
# T = einfull(w,[A,B,C])
t_1 = time.perf_counter()
print(t_1 - t_0)
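For what it's worth, the subscripts can also be built programmatically so the same einsum approach works for any number of modes. A minimal sketch (einfull_general is a hypothetical helper, not part of pyttb):

import numpy as np

def einfull_general(w, fm_list):
    # assign one subscript letter per mode (a, b, c, ...) and reserve
    # 'z' for the rank index; supports up to 25 modes
    mode_letters = [chr(ord('a') + i) for i in range(len(fm_list))]
    operands = ['z'] + [letter + 'z' for letter in mode_letters]
    subscripts = ','.join(operands) + '->' + ''.join(mode_letters)
    return np.einsum(subscripts, w, *fm_list)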

tgkolda commented Jun 22, 2024

@ghbrown This is not being done according to the efficient version from MATLAB (aka Version 2). If you translate that algorithm, it should be much better.
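For reference, a minimal Python sketch of that style of algorithm, assuming the efficient MATLAB version amounts to a single matricized product (scale the first factor matrix by the weights, multiply by the transposed Khatri-Rao product of the remaining factors, and reshape):

import numpy as np
import pyttb

def full_matricized(w, fm_list):
    # sketch of a mode-1-unfolding approach, assuming
    # T_(1) = (A_1 diag(w)) (A_d ⊙ ... ⊙ A_2)^T
    dims = tuple(fm.shape[0] for fm in fm_list)
    data = (fm_list[0] * w) @ pyttb.khatrirao(*fm_list[1:], reverse=True).T
    return pyttb.tensor(data, dims, copy=False)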


ghbrown commented Jun 25, 2024

Yes, thanks for the suggestions. I'll write a PR for this shortly, but for now here is a mock-up with benchmarks.

n = 1000:
  full tensor size: ~8 GB
  current .full() profiling: 107 s, 120 GB peak memory usage
       einsum based .full(): 65 s,  8 GB peak memory usage
        split based .full(): 1.6 s,  8 GB peak memory usage

Interestingly, in the case of two splits with the same memory footprint, it happens to be optimal to choose the one with the smaller row dimension (and so larger column dimension). I assume this is because the result of the outer product is laid out in row-major order, but I'm not positive.

Outer product timings:
  (1e3 x 10) @ (10 x 1e6): 1.4 s
  (1e6 x 10) @ (10 x 1e3): 5.6 s
import pyttb
import numpy as np
import time

def einfull(w, fm_list):
    # einsum-based full(); subscripts hardcoded for a 3-way tensor
    return np.einsum('r,ir,jr,kr->ijk', w, *fm_list)

def min_split_dims(dims):
    """
    solve
      min_{i in range(1,d)}  product(dims[:i]) + product(dims[i:])
    note: returning the first minimizer of sum_of_prods (argmin()
          default behavior) seems to reduce the runtime of full(),
          since forming a row-vector-like outer product is faster
          than forming the transposed column-vector-like outer
          product (due to memory layout of the outer product?)
    """
    sum_of_prods = [np.prod(dims[:i]) + np.prod(dims[i:]) for i in range(1, len(dims))]
    i_min = np.argmin(sum_of_prods) + 1  # note: range above starts at 1
    return i_min

def full2(w, fm_list):
    # split the modes into two groups, form the Khatri-Rao product of
    # each group, and combine them with a single weighted matrix product
    dims = [fm.shape[0] for fm in fm_list]
    i_split = min_split_dims(dims)
    data = ((pyttb.khatrirao(*fm_list[:i_split], reverse=True) * w) @
            pyttb.khatrirao(*fm_list[i_split:], reverse=True).T)
    return pyttb.tensor(data, tuple(dims), copy=False)

# benchmark short-fat vs tall-skinny outer products
A_test = np.random.rand(1000,10)
B_test = np.random.rand(1000**2,10)
w_test = np.random.rand(10)
t_0 = time.perf_counter()
C_test_1 = (A_test*w_test) @ B_test.T
t_1 = time.perf_counter()
print(f'row-vector-like time: {t_1 - t_0}')
t_0 = time.perf_counter()
C_test_2 = (B_test*w_test) @ A_test.T
t_1 = time.perf_counter()
print(f'column-vector-like time: {t_1 - t_0}')
del A_test, B_test, C_test_1, C_test_2, w_test  # free benchmark arrays

n = 1000
r = 10
w = np.random.rand(r)
A = np.random.rand(n,r)
B = np.random.rand(n,r)
C = np.random.rand(n,r)
K = pyttb.ktensor()
K.weights = w
K.factor_matrices = [A,B,C]
t_0 = time.perf_counter()
# T_full_1 = K.full()
# T_ein    = einfull(w,[A,B,C])
T_full_2 = full2(w,[A,B,C])
t_1 = time.perf_counter()
print(f'time to construct full tensor: {t_1 - t_0}')
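As a quick sanity check, the variants can be compared elementwise on a small problem (a sketch reusing the definitions above; all three should agree up to floating-point roundoff):

n_s, r_s = 20, 4
w_s = np.random.rand(r_s)
fms = [np.random.rand(n_s, r_s) for _ in range(3)]
K_s = pyttb.ktensor()
K_s.weights = w_s
K_s.factor_matrices = fms
assert np.allclose(K_s.full().data, einfull(w_s, fms))
assert np.allclose(full2(w_s, fms).data, einfull(w_s, fms))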
