Success stories using Scalene's optimization suggestions #554

emeryberger opened this issue Jan 23, 2023 · 4 comments

This thread is meant to document and collect successful optimizations suggested by Scalene (vs. the more general success stories given here: #58).

emeryberger commented Jan 23, 2023

2x speedup, memory consumption halved

Issue #31 includes the following code, which we turn into a function:

import numpy as np

def main1():
    x = np.array(range(10**7))
    y = np.array(np.random.uniform(0, 100, size=10**8))

Scalene (at commit 2c18f59) suggests the following optimization for this function:

# Proposed optimization:
# Vectorized operations are used to reduce the time and memory complexity of the code.
def main1():
    # After optimization
    x = np.arange(10**7) # np.arange is faster than np.array
    y = np.random.uniform(0, 100, size=10**8) # No need to wrap in np.array

This optimization speeds up the code by almost 2x and roughly halves memory consumption (from 1.5GB to 773MB).
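
For anyone who wants to reproduce the comparison, here is a minimal timing sketch (the harness below is mine, not part of the original report; it measures only wall-clock time, not the memory figures above):

import time
import numpy as np

def main1_original():
    x = np.array(range(10**7))                           # iterates over a Python range, converting element by element
    y = np.array(np.random.uniform(0, 100, size=10**8))  # redundant copy: uniform() already returns an ndarray

def main1_optimized():
    x = np.arange(10**7)                                 # allocates and fills the array directly in C
    y = np.random.uniform(0, 100, size=10**8)

for fn in (main1_original, main1_optimized):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")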

emeryberger commented Jan 23, 2023

90x speedup

#58 (comment) presents the following code:

for i in range(n_features):
    for n in range(n_samples):
        subgrad[i] += (- y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
    subgrad[i] += self.lambda1 * (-1 if w[i] < 0 else 1) + 2 * self.lambda2 * w[i]

Scalene proposes the following optimization:

# Vectorized operations to replace for loops                                                                                                    
subgrad[:-1] = np.sum(-y[:, None] * X * (y * (X.dot(w) + b) < 1)[:, None], axis=0)
subgrad[:-1] += self.lambda1 * np.sign(w) + 2 * self.lambda2 * w
subgrad[-1] = np.sum(-y * (y * (X.dot(w) + b) < 1))

Scalene's proposed optimization accelerates the original code by roughly 90x (from 89 seconds to 1 second when running 500 iterations) and takes full advantage of multiple cores.

Original code:
# original code
    def subgradient(self, wb, X, y):
        """Compute the subgradient of the objective function.                                                                                           
                                                                                                                                                        
        Arguments:                                                                                                                                      
            wb (ndarray, shape = (n_features+1,)):                                                                                                      
                concatenation of the weight vector with the bias wb=[w,b]                                                                               
            X (ndarray, shape = (n_samples, n_features)):                                                                                               
                Training input matrix where each row is a feature vector.                                                                               
                The data in X are passed in without a bias column!                                                                                      
            y (ndarray, shape = (n_samples,)):                                                                                                          
                Training target. Each entry is either -1 or 1.                                                                                          
                                                                                                                                                        
        Returns:                                                                                                                                        
            subgrad (ndarray, shape = (n_features+1,)):                                                                                                 
                subgradient of the objective function with respect to                                                                                   
                the coefficients wb=[w,b] of the linear model                                                                                           
        """
        n_samples = X.shape[0]
        n_features = X.shape[1]

        w = wb[:-1]
        b = wb[-1]

        subgrad = np.zeros(n_features + 1)
        for i in range(n_features):
            for n in range(n_samples):
                subgrad[i] += (- y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
            subgrad[i] += self.lambda1 * (-1 if w[i] < 0 else 1) + 2 * self.lambda2 * w[i]

        for n in range(n_samples):
            subgrad[-1] += - y[n] if y[n] * (np.dot(X[n], w) + b) < 1 else 0

        return subgrad

Optimized code:
# Proposed optimization:                                                                                                                                
# This code has been optimized by replacing the for loops with vectorized operations. This reduces the time complexity from O(n^2) to O(n), resulting in a substantial speedup.                                                                                                                                
    def subgradient(self, wb, X, y):
        """Compute the subgradient of the objective function.                                                                                           
        Arguments:                                                                                                                                      
            wb (ndarray, shape = (n_features+1,)):                                                                                                      
                concatenation of the weight vector with the bias wb=[w,b]                                                                               
            X (ndarray, shape = (n_samples, n_features)):                                                                                               
                Training input matrix where each row is a feature vector.                                                                               
                The data in X are passed in without a bias column!                                                                                      
            y (ndarray, shape = (n_samples,)):                                                                                                          
                Training target. Each entry is either -1 or 1.                                                                                          
        Returns:                                                                                                                                        
            subgrad (ndarray, shape = (n_features+1,)):                                                                                                 
                subgradient of the objective function with respect to                                                                                   
                the coefficients wb=[w,b] of the linear model                                                                                           
        """
        n_samples = X.shape[0]
        n_features = X.shape[1]
        w = wb[:-1]
        b = wb[-1]
        # Vectorized operations to replace for loops                                                                                                    
        subgrad = np.zeros(n_features + 1)
        subgrad[:-1] = np.sum(-y[:, None] * X * (y * (X.dot(w) + b) < 1)[:, None], axis=0)
        subgrad[:-1] += self.lambda1 * np.sign(w) + 2 * self.lambda2 * w
        subgrad[-1] = np.sum(-y * (y * (X.dot(w) + b) < 1))
        return subgrad
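
As a quick sanity check (this harness is mine, not from the thread; the data shapes and lambda values are arbitrary illustration choices), the loop-based and vectorized subgradients can be compared as standalone functions on random data:

import numpy as np

def subgrad_loops(wb, X, y, lambda1, lambda2):
    # Direct transcription of the original double loop.
    n_samples, n_features = X.shape
    w, b = wb[:-1], wb[-1]
    subgrad = np.zeros(n_features + 1)
    for i in range(n_features):
        for n in range(n_samples):
            subgrad[i] += (-y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
        subgrad[i] += lambda1 * (-1 if w[i] < 0 else 1) + 2 * lambda2 * w[i]
    for n in range(n_samples):
        subgrad[-1] += -y[n] if y[n] * (np.dot(X[n], w) + b) < 1 else 0
    return subgrad

def subgrad_vectorized(wb, X, y, lambda1, lambda2):
    # Scalene's proposed vectorization.
    w, b = wb[:-1], wb[-1]
    violated = (y * (X.dot(w) + b) < 1)          # boolean mask of margin violations
    subgrad = np.zeros(X.shape[1] + 1)
    subgrad[:-1] = np.sum(-y[:, None] * X * violated[:, None], axis=0)
    subgrad[:-1] += lambda1 * np.sign(w) + 2 * lambda2 * w
    subgrad[-1] = np.sum(-y * violated)
    return subgrad

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.choice([-1.0, 1.0], size=200)
wb = rng.normal(size=21)
g_loop = subgrad_loops(wb, X, y, 0.01, 0.01)
g_vec = subgrad_vectorized(wb, X, y, 0.01, 0.01)
print("match:", np.allclose(g_loop, g_vec))      # expected: True

One subtle difference worth noting: np.sign(w) is 0 where a weight is exactly zero, whereas the original loop contributes +1 there; for continuously valued weights this almost never matters.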

emeryberger commented Jan 24, 2023

18x speedup over original; 1.8x speedup over manual optimization

#58 (comment) presents the following code (with some edits to put the bottleneck code into a function do_it()):

import numpy as np
import pandas as pd

column_names_example = [i for i in range(10000)]
index = pd.MultiIndex.from_tuples([("left", c) for c in column_names_example] + [("right", c) for c in column_names_example])
df = pd.DataFrame(np.random.rand(1000, 20000), columns=index)

def keep_column(left_col, right_col):
  return left_col[left_col.first_valid_index()] > right_col[right_col.last_valid_index()]

def do_it():
  v = [c for c in column_names_example if keep_column(df["left"][c], df["right"][c])]

That code takes about 6.54 seconds on an M1 MacBook.

After about a dozen attempts (most of which did not run), Scalene produced a solid optimization:

# Proposed optimization: Replaced list comprehension with vectorized operation. 
def do_it():
    v = df["left"].loc[df["left"].first_valid_index()] > df["right"].loc[df["right"].last_valid_index()]
    return v.index[v].tolist()

That code runs in 0.37 seconds, corresponding to an almost 18x increase in performance. The author of the comment had previously produced the following optimization:

def do_it():
    df_l = df["left"]
    df_r = df["right"]
    return [c for c in column_names_example if keep_column(df_l[c], df_r[c])]

That code takes 0.67 seconds, making Scalene's optimization roughly 1.8x faster.
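
A self-contained timing harness (mine, not part of the original comment; the data setup simply repeats the DataFrame construction above) that compares the three variants side by side:

import time
import numpy as np
import pandas as pd

column_names_example = list(range(10000))
index = pd.MultiIndex.from_tuples(
    [("left", c) for c in column_names_example]
    + [("right", c) for c in column_names_example]
)
df = pd.DataFrame(np.random.rand(1000, 20000), columns=index)

def keep_column(left_col, right_col):
    return left_col[left_col.first_valid_index()] > right_col[right_col.last_valid_index()]

def do_it_original():
    return [c for c in column_names_example if keep_column(df["left"][c], df["right"][c])]

def do_it_manual():
    df_l, df_r = df["left"], df["right"]
    return [c for c in column_names_example if keep_column(df_l[c], df_r[c])]

def do_it_scalene():
    v = df["left"].loc[df["left"].first_valid_index()] > df["right"].loc[df["right"].last_valid_index()]
    return v.index[v].tolist()

for fn in (do_it_original, do_it_manual, do_it_scalene):
    start = time.perf_counter()
    result = fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s ({len(result)} columns kept)")

Since the random data contain no NaNs, all three variants should return the same list of columns.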

@ianozsvald commented:

@emeryberger you may be happy to hear that I've added Scalene to the Profiling chapter of the 3rd edition of our High Performance Python book, which is due for publication in 2025. Thanks!
