ktensor.full() improvements #299
Comments
@ghbrown This is not being done according to the efficient version from MATLAB (aka Version 2). If you translate that algorithm, it should be much better.
Yes, thanks for the suggestions. I'll write a PR for this shortly, but for now here is a mock-up with benchmarks.
Interestingly, when two splits have the same memory footprint, it turns out to be optimal to choose the one with the smaller row dimension (and so the larger column dimension). I assume this is because the result of the outer product is laid out in row-major order, but I'm not positive.
```python
import pyttb
import numpy as np
import time

def einfull(w, fm_list):
    return np.einsum('r,ir,jr,kr->ijk', w, *fm_list)

def min_split_dims(dims):
    """
    Solve
        min_{i in range(1, d)} product(dims[:i]) + product(dims[i:])
    Note: returning the first minimizer in sum_of_prods (argmin()
    default behavior) seems to reduce the runtime of full(), since
    forming a row-vector-like outer product is faster than forming
    the transposed column-vector-like outer product (due to memory
    layout of the outer product?).
    """
    sum_of_prods = [np.prod(dims[:i]) + np.prod(dims[i:]) for i in range(1, len(dims))]
    i_min = np.argmin(sum_of_prods) + 1  # note: range above starts at 1
    return i_min

def full2(w, fm_list):
    dims = [fm.shape[0] for fm in fm_list]
    i_split = min_split_dims(dims)
    data = ((pyttb.khatrirao(*fm_list[:i_split], reverse=True) * w) @
            pyttb.khatrirao(*fm_list[i_split:], reverse=True).T)
    return pyttb.tensor(data, tuple(dims), copy=False)

# benchmark short-fat vs tall-skinny outer products
A_test = np.random.rand(1000, 10)
B_test = np.random.rand(1000**2, 10)
w_test = np.random.rand(10)

t_0 = time.perf_counter()
C_test_1 = (A_test * w_test) @ B_test.T
t_1 = time.perf_counter()
print(f'row-vector-like time: {t_1 - t_0}')

t_0 = time.perf_counter()
C_test_2 = (B_test * w_test) @ A_test.T
t_1 = time.perf_counter()
print(f'column-vector-like time: {t_1 - t_0}')

del A_test, B_test, C_test_1, C_test_2, w_test

n = 1000
r = 10
w = np.random.rand(r)
A = np.random.rand(n, r)
B = np.random.rand(n, r)
C = np.random.rand(n, r)
K = pyttb.ktensor()
K.weights = w
K.factor_matrices = [A, B, C]

t_0 = time.perf_counter()
# T_full_1 = K.full()
# T_ein = einfull(w, [A, B, C])
T_full_2 = full2(w, [A, B, C])
t_1 = time.perf_counter()
print(f'time to construct full tensor: {t_1 - t_0}')
```
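As a sanity check on the split idea, here is a NumPy-only sketch (no pyttb dependency) verifying that the "Khatri-Rao of the leading factors times the transposed trailing factor" reconstruction matches the `einsum` reference. The `kr` helper is hypothetical and uses plain left-to-right factor ordering, which may differ from pyttb's `khatrirao(..., reverse=True)` convention:

```python
import numpy as np

def kr(mats):
    # column-wise Kronecker (Khatri-Rao) product, left-to-right ordering
    out = mats[0]
    for m in mats[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, m.shape[1])
    return out

rng = np.random.default_rng(0)
n, r = 4, 3
A, B, C = (rng.random((n, r)) for _ in range(3))
w = rng.random(r)

# split-based reconstruction: ((A kr B) diag(w)) C^T, then reshape
data = (kr([A, B]) * w) @ C.T
T_split = data.reshape(n, n, n)

# reference construction via einsum
T_ein = np.einsum('r,ir,jr,kr->ijk', w, A, B, C)
print(np.allclose(T_split, T_ein))
```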
The current implementation of `.full()` for ktensors is extremely memory- and compute-intensive. It's easy to run out of memory with tensors of modest size but high rank, since the current algorithm requires memory proportional to the rank. I believe an `einsum`-based approach would be much more efficient. Consider the timing results for the code above.